If you are an SEO or are familiar with search engine optimization, the terms “Robots.txt” and “No Index” are somewhere in your vocabulary. If not, the explanation of these is fairly simple: both Robots.txt files and “No Index” meta tags are ways to keep search engines from reading and saving content to their database, known as their “index.” Why would you want to exclude pages from a search engine’s index? Another simple answer: To keep the engines from giving priority to unimportant pages at the cost of the good (i.e. converting) ones. So, let’s get into how the Robots.txt file and “No Index” meta tag operate.
The Robots.txt “Disallow”
Robots.txt is a file that you upload to your site’s root directory. It is located at http://www.YourSite.com/robots.txt. This file contains directions for search engines. When the file has a “Disallow” directive covering a certain page, the search engine knows not to read that page. By telling a search engine not to read a page, you are signaling that the page is not important, and the engine will skip it. For the most part, this will ensure that disallowed pages do not show up in search results.
However, “Disallow” means “do not read”, not “do not see.” Disallowing does not make pages invisible; it makes them uncrawlable. If inbound links or citations point to a disallowed page, search engines will still be aware that the page exists. They will simply be unaware of its content. And, in the rare case that someone does a search and there are no better results, a search engine will serve up a link to a disallowed page. The link will just be presented without a description.
(Also, as a side note, some smaller search engines don’t honor the Robots.txt file. Therefore, disallowed pages will be crawled and indexed by them.)
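To make the “Disallow” directive concrete, here is a minimal sketch that uses Python’s standard urllib.robotparser module to read a small Robots.txt the way a well-behaved crawler would. The file contents and the page paths (/thank-you.html, /internal/, /products.html) are made up for illustration; they are not from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thank-you.html
Disallow: /internal/
"""

# Parse the rules from the string instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


for url in (
    "http://www.YourSite.com/thank-you.html",
    "http://www.YourSite.com/internal/report.html",
    "http://www.YourSite.com/products.html",
):
    print(url, "crawlable" if is_crawlable(url) else "disallowed")
```

Note that this check only answers “may I fetch this page?” It says nothing about whether the URL can still appear in results, which is exactly the gap described above.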
The “No Index” Meta Tag
The “No Index” meta tag is a piece of code that you put in the head section of a page. Unlike a “Disallow”, the “No Index” tag allows a search engine to read and see the page, but states explicitly that the engine should forget it ever saw the page once it leaves. This instruction then also applies to any links and citations pointing to a “No Index” page: forget they exist. Thus, the “No Index” meta tag keeps the page out of the index in any form. Because an engine has to read the page to find the tag, a “No Index” page must not also be disallowed in Robots.txt. Additionally, the major search engines honor the “No Index” meta tag.
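For reference, the tag itself is a single line, <meta name="robots" content="noindex">, placed inside the page’s head. The sketch below shows one way to check a page for it using Python’s standard html.parser module. The sample page and the names RobotsMetaParser and has_noindex are hypothetical, and real crawlers may apply more rules than this simple check does.

```python
from html.parser import HTMLParser

# Hypothetical page used only for illustration; the tag sits inside <head>.
PAGE = """\
<html>
  <head>
    <title>Thank you</title>
    <meta name="robots" content="noindex">
  </head>
  <body>Thanks for your order.</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex, follow".
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex(PAGE))
```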
When to Use Robots.txt “Disallow” and When to Use Meta “No Index” Tag
In my opinion, the “No Index” tag is a more secure way of keeping pages out of an index. However, this method can also be harder to manage and keep track of since it’s applied on a page-by-page basis. The Robots.txt “Disallow,” on the other hand, is simpler to manage since everything lives in one single file.
Every business should assess its own web needs, but for simplicity’s sake, the “No Index” meta tag is best used on pages that must stay out of the index entirely or that you are building out of sight of your competition. In all other cases, Robots.txt “Disallow” will do.
Do Robots.txt “Disallow” and the “No Index” Meta Tag Consume Page Rank?
There has been much discussion over whether and how disallowed and “No Indexed” pages consume Page Rank (“PR”). Here is the answer: if you disallow a page but leave the links pointing to it “do follow”, the disallowed page will still consume rank. And, because the page is disallowed and its outbound links are not read by engines, that PR is wasted since it cannot be passed on. A “No Index” page, however, can pass PR on if the links on the page are “do follow,” since an engine reads a “No Index” page but simply does not index it.
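In practice, a “do follow” link is simply a link without rel="nofollow", and a “No Index” page that should keep passing rank is often written with content="noindex, follow". As a rough sketch of what an engine can still read on such a page, the Python below lists the links that are not marked nofollow. The sample page, its paths, and the names FollowedLinkParser and followed_links are hypothetical, and this says nothing about how any engine actually calculates PR.

```python
from html.parser import HTMLParser

# Hypothetical "No Index" page whose links are meant to stay followable.
PAGE = """\
<html>
  <head><meta name="robots" content="noindex, follow"></head>
  <body>
    <a href="/products.html">Products</a>
    <a href="/login.html" rel="nofollow">Log in</a>
  </body>
</html>
"""


class FollowedLinkParser(HTMLParser):
    """Collects href values of links that are not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel can hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_values:
            self.followed.append(href)


def followed_links(html: str) -> list:
    """Return the links on the page that a crawler is allowed to follow."""
    parser = FollowedLinkParser()
    parser.feed(html)
    return parser.followed


print(followed_links(PAGE))
```

On a disallowed page this list would never be built at all, because the engine does not read the page in the first place.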