Robots.txt Question

bingodude asked 4 years ago
I am having the following issue:

site.com/article
site.com/home/article

I have a few pages where “home” is creating a duplicate page. Would adding
Disallow: /home/

to my robots.txt file be safe, or would it block everything, since “home” may be an important directory? Stupid Joomla; stuff like this should not really be an issue to start with.

Any other ideas would be great.

Thanks all

5 Answers
arkyt answered 4 years ago
Yes, adding Disallow: /home/ would block everything within that directory.

You might want to just block the specific files that are duplicated, e.g. “Disallow: /home/file.html” …
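For example, a robots.txt along these lines would block just the duplicate copies while leaving the rest of /home/ crawlable (the paths are placeholders based on the URLs in the question, not anything from your actual site):

User-agent: *
Disallow: /home/article
Disallow: /home/another-article

One thing to watch: Disallow matches by prefix, so /home/article would also block /home/article-two. The more specific the path, the safer the rule.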

bingoadvantage answered 4 years ago
There is a specific tag designed for this situation, the canonical link tag:

<link rel="canonical" href="http://www.site.com/article" />

This tells the search engines which is the correct page to index, but I am unsure if it stops the alternate versions from bleeding link juice or factoring in other ways.

Regarding the robots.txt: the problem is that the robots entry will stop the robots from visiting that page, but it will still factor into things in a number of other ways.

IMO your best solution, if possible, is to 301 redirect the /home/article pages to the /article pages. In theory you pass any PageRank and keyword relevance to the correct page this way as well.
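As a rough sketch of that 301 approach, assuming an Apache server with mod_rewrite enabled (Joomla ships an .htaccess for exactly this kind of thing; the rule below is illustrative, so test it before relying on it):

RewriteEngine On
# Permanently redirect anything under /home/ to the same path at the root
RewriteRule ^home/(.*)$ /$1 [R=301,L]

Any links pointing at the /home/ URLs then resolve to the canonical versions, which is what actually passes the link equity along.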

Rak answered 4 years ago
@bingoadvantage 235701 wrote:

There is a specific tag designed for this situation, the canonical link tag:

<link rel="canonical" href="http://www.site.com/article" />

This tells the search engines which is the correct page to index, but I am unsure if it stops the alternate versions from bleeding link juice or factoring in other ways.

Regarding the robots.txt: the problem is that the robots entry will stop the robots from visiting that page, but it will still factor into things in a number of other ways.

IMO your best solution, if possible, is to 301 redirect the /home/article pages to the /article pages. In theory you pass any PageRank and keyword relevance to the correct page this way as well.

Agree with bingoadvantage here. Go the canonical URL route. It allows all your pages to be spidered, and it lets your “main” page with the “duplicate” content still be the only version that appears in Google SERPs.

I ran into similar problems with WordPress a while back, where my category page was ranking higher than the actual article itself; canonical URLs made sure that the article was the page that ranked in SERPs.

arkyt answered 4 years ago
@Rak 235713 wrote:

I ran into similar problems with WordPress a while back, where my category page was ranking higher than the actual article itself; canonical URLs made sure that the article was the page that ranked in SERPs.

I had no clue how Joomla worked, so I wasn't sure why it was creating duplicate content; I have always used robots.txt to disallow crawling of certain pages.

When you mention the category page, which shows an excerpt, vs. the actual article, I can see how the issue might be the same with Joomla. I started using WP in Dec '11 and never even thought of that! So how exactly did you work that out in WP?

EDIT >

Just checked, and WP includes the canonical tag automatically on the post pages… maybe they didn't in an older version?

Rak answered 4 years ago
The latest versions of WP actually handle the canonical URL as part of the core install, so every post created gets a canonical URL. Back in the day, though, it wasn't part of the package… got to love WP, always helping everyone out!
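For anyone stuck on an older install, the usual workaround was to print the tag from the theme's header.php. A minimal sketch using standard WP template functions (the conditional and escaping choices here are illustrative, not a fixed recipe):

<?php
// In the theme's header.php, inside <head>:
// output a canonical link on single posts and pages only
if ( is_singular() ) : ?>
<link rel="canonical" href="<?php echo esc_url( get_permalink() ); ?>" />
<?php endif; ?>

Current WP versions do essentially this in core, via the rel_canonical() function hooked into wp_head.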