Beacon
Andrea Cole

Robots.txt “Disallow” and “No Index” Meta Tag: What’s the difference?

If you are an SEO or are familiar with search engine optimization, the terms “Robots.txt” and “No Index” are somewhere in your vocabulary. If not, the explanation is fairly simple. Robots.txt files and “No Index” meta tags are both ways to keep search engines from reading your content or from saving it to their database, which is known as their “index.” Why would you want to exclude pages from a search engine’s index? Another simple answer: to keep the engines from giving priority to unimportant pages at the expense of the good (i.e., converting) ones. So, let’s get into how the Robots.txt file and the “No Index” meta tag operate.

The Robots.txt “Disallow”

Robots.txt is a file that you upload to your site’s root directory. It is located at http://www.YourSite.com/robots.txt. This file contains directives for search engine crawlers. When the file has a “Disallow” directive for a certain page, the search engine knows not to read that page. By telling a search engine not to read a page, you are signaling that the page is not important, and the engine will skip it. For the most part, this keeps disallowed pages out of search results.
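Here is a small Python sketch of that check. The robots.txt rules and the URLs are hypothetical examples for the placeholder www.YourSite.com. The is_crawlable() helper is also made up for illustration. It wraps Python’s standard urllib.robotparser module, which answers the same question a well-behaved crawler asks before it reads a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for http://www.YourSite.com/robots.txt.
# The disallowed paths are made-up examples, not recommendations.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thank-you/
Disallow: /internal-search
"""

def is_crawlable(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if a crawler honoring robots_txt may read (fetch) url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for url in (
    "http://www.YourSite.com/products/",
    "http://www.YourSite.com/thank-you/",
    "http://www.YourSite.com/internal-search?q=shoes",
):
    print(url, "->", "read" if is_crawlable(url) else "skip (Disallow)")
```

“Disallow” paths match by prefix. That means “Disallow: /internal-search” also covers /internal-search?q=shoes.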

However, “Disallow” means “do not read,” not “do not see.” Disallowing does not make pages invisible; it makes them uncrawlable. If inbound links or citations point to a disallowed page, search engines will still know that the page exists. They simply will not know what is on it. And in the rare case that someone searches and there are no better results, a search engine may serve up a link to a disallowed page. The link will just be presented without a description.
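To make “do not read” versus “do not see” concrete, here is a toy crawler in Python. It is a sketch and does not describe how any real search engine works. The three-page site, its URLs and the crawl() helper are all hypothetical. The crawler follows links, but it only fetches pages that robots.txt allows. A disallowed page that is linked to still ends up in its list of known URLs, just with no description attached.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: url -> (page text, outbound links). Not a real site.
SITE = {
    "http://www.YourSite.com/": (
        "Home page",
        ["http://www.YourSite.com/about", "http://www.YourSite.com/private/report"],
    ),
    "http://www.YourSite.com/about": ("About us", []),
    "http://www.YourSite.com/private/report": ("Confidential report", []),
}

ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

def crawl(seed):
    """Toy breadth-first crawl. Returns {url: text}, where text is None for
    URLs that were discovered through links but never read (disallowed)."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    known = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in known:
            continue
        if not robots.can_fetch("*", url):
            known[url] = None  # seen via a link, content never read
            continue
        text, links = SITE[url]
        known[url] = text
        queue.extend(links)
    return known

for url, text in crawl("http://www.YourSite.com/").items():
    print(url, "-", text if text is not None else "(no description available)")
```

The words “Confidential report” never reach the crawler, but the URL /private/report is still on its list because the home page links to it.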

(Also, as a side note, some smaller search engines don’t follow the Robots.txt file, so disallowed pages may still be crawled and indexed by them.)

The “No Index” Meta Tag

The “No Index” meta tag is a piece of code that you put in the head section of a page on your website. Unlike a “Disallow,” the “No Index” tag allows a search engine to read the page, but explicitly instructs the engine to forget the page once it leaves. This instruction also applies to any links and citations pointing to a “No Index” page: forget they exist. Thus, the “No Index” meta tag keeps the page out of search engines’ indices in any form. Additionally, the major search engines all follow the “No Index” meta tag.
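In HTML, the tag is written as &lt;meta name="robots" content="noindex"&gt; inside the page’s &lt;head&gt;. The Python sketch below reads a page the way an engine would and checks for that directive, using the standard html.parser module. The sample page and the has_noindex() helper are hypothetical. The page uses “noindex, follow”, so it stays out of the index while its links can still be followed.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives in any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())

def has_noindex(html):
    """Return True if the page asks engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# Hypothetical page: readable by engines and kept out of the index,
# while its links may still be followed.
PAGE = """<html>
<head>
  <title>Thank You</title>
  <meta name="robots" content="noindex, follow">
</head>
<body><a href="/products/">Keep shopping</a></body>
</html>"""

print(has_noindex(PAGE))  # prints True
```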

When to Use Robots.txt “Disallow” and When to Use Meta “No Index” Tag

In my opinion, the “No Index” tag is a more secure way of keeping pages out of an index. However, this method can also be harder to manage and keep track of, since it’s applied on a page-by-page basis. The Robots.txt “Disallow,” on the other hand, is simpler to manage since it is one single file.

Every business should assess its own web needs, but for simplicity’s sake, the “No Index” meta tag is best used on pages that you need kept out of the index with 100% certainty, or that you are creating in secret from your competition. In all other cases, Robots.txt “Disallow” will do.
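The two methods interact in one important way, and it follows directly from “Disallow” meaning “do not read”. An engine only sees a “No Index” tag on pages it is allowed to read. If a page is both disallowed and tagged “No Index,” the tag is never seen, and the URL can still surface through links as described earlier. The Python sketch below captures that decision. The index_status() helper, the robots.txt rules and the URLs are hypothetical. It models a crawler that honors both signals and is not meant to describe any particular search engine.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class RobotsMeta(HTMLParser):
    """Collects tokens from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.tokens.update(
                t.strip().lower() for t in (a.get("content") or "").split(",")
            )

def index_status(url, robots_txt, html):
    """Hypothetical helper: how a crawler that honors both signals treats a page."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("*", url):
        # The page is never fetched, so a "noindex" tag in its HTML is never seen.
        return "disallowed"
    meta = RobotsMeta()
    meta.feed(html)
    return "noindex" if "noindex" in meta.tokens else "indexable"

ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
NOINDEX_PAGE = '<head><meta name="robots" content="noindex"></head>'

print(index_status("http://www.YourSite.com/landing/", ROBOTS_TXT, NOINDEX_PAGE))
print(index_status("http://www.YourSite.com/private/plan", ROBOTS_TXT, NOINDEX_PAGE))
```

The first call reports “noindex” because the page is read and its tag is honored. The second reports “disallowed” because the identical tag on the /private/ page is never read.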

Do Robots.txt “Disallow” and the “No Index” Meta Tag Consume Page Rank?

There has been much discussion over whether and how disallowed and “No Index’ed” pages consume Page Rank (“PR”). Here is the answer: if you disallow a page but leave the incoming links to it “do follow,” the disallowed page will still consume rank. And because the page is disallowed, engines do not read its outbound links, so that PR is wasted: it cannot be passed on. However, a “No Index” page can pass PR on if the links on it are “do follow,” since an engine reads a “No Index” page but simply does not index it.
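This toy Page Rank calculation in Python shows why a disallowed page “wastes” PR while a “No Index” page can pass it on. It uses the simplified textbook form of the PageRank formula with a damping factor of 0.85. As in the paragraph above, it assumes that a disallowed page’s outbound links are invisible to the engine, so rank that flows into the page is not passed on. The four-page site is hypothetical. Real search engines compute rank in ways they do not publish, so treat this as an illustration of the argument, not a measurement.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. Rank held by a page with
    no readable outbound links is not passed on (no redistribution), which
    models the "wasted" PR described above.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outs in links.items():
            for target in outs:
                new[target] += damping * rank[page] / len(outs)
        rank = new
    return rank

# Hypothetical four-page site; C is the page we want out of the index.
# "noindex, follow": engines read C, so its link to D still counts.
NOINDEX_FOLLOW = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["A"]}
# Disallowed: engines never read C, so its link to D is unseen.
DISALLOWED = {"A": ["B", "C"], "B": ["A"], "C": [], "D": ["A"]}

results = {
    "noindex, follow": pagerank(NOINDEX_FOLLOW),
    "disallowed": pagerank(DISALLOWED),
}
for label, rank in results.items():
    print(f"{label:16} total={sum(rank.values()):.3f}",
          {p: round(v, 3) for p, v in rank.items()})
```

In both versions C still receives rank from A. Only in the “No Index” version does that rank continue on to D. In the disallowed version it drops out of the total.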

To learn more about Page Rank and controlling it, read my post on sculpting Page Rank.

7 Comments

  1. cracker
    Posted April 20, 2010 at 6:12 am

    So, experiments with my download site http://astalavista.ms showed that rel=nofollow doesn’t prevent PR from being passed through links. As far as I remember, it was announced publicly in summer 2009.

  2. Posted April 2, 2011 at 12:40 am

    Yeah. My problem is how to disable the tags in my robots.txt. Thanks.

  3. Posted April 7, 2011 at 1:43 pm

    cracker:

    You make a good point. Even internally here at Beacon we’ve discussed this issue. To be cautious, I still use nofollow, but I use it more to keep unimportant links and pages out of the index rather than as a PR sculpting mechanism.

    Ganeshbabu:

    You will need access to your server in order to update and edit your robots.txt file. There are no “disabling” rules in a robots.txt, per se. You would simply add or remove disallows to give or revoke search engine permission to certain pages.

  4. javid
    Posted April 2, 2012 at 5:03 am

    I want to know: if category and tag pages create duplicates and you say they should not be indexed, why do all the big news sites and popular blogs index all of their tags?
    I can name http://technorati.com/blogs/top100 as an example. Why is their ranking in Google good? Is it useful for them too?
    Yours Sincerely

  5. Brandon
    Posted November 24, 2012 at 9:26 pm

    Many thanks!!

    I was just struggling with how to explain the difference between Disallow and the noindex value in the robots meta tag.

  6. Posted October 17, 2013 at 11:11 am

    THANKS for explaining this in a very no-nonsense way. But I do have a question… Is there any way to add the “noindex, nofollow” tag into the Robots.txt file along with the disallow statement? It just seems that this would be the best way to ensure that the page is ignored completely.

  7. Posted October 17, 2013 at 11:19 am

    Thanks for the comment, Johnny. Since the “noindex, nofollow” tag is a meta tag, it must be placed in the head section of the page itself. There’s no way for it to work in the robots.txt file.

One Trackback

  • By What is link juice (or link equity)? - Answered on February 13, 2014 at 2:46 am

    […] The nofollow tag is often used on internal links incorrectly in the desire to keep certain pages out of search results. This won’t work if an external link points to that site, or it’s in your sitemap. If this is what you are trying to achieve, you should add a no index tag to the page in question or add it to your robots.txt file. More on that here. […]
