How Search Engines Handle the Meta Robots Tag and Robots.txt
Many questions remain about the meta robots tag and robots.txt, even though both are familiar to virtually every search engine optimization professional. In this two-part series we will attempt to answer these questions scientifically by running an experiment and collecting data.
These are not simple questions. For example, do other search engines behave the same way as Google when they crawl a page tagged with <meta name="robots" content="noindex, nofollow">? In other words, do they also keep it out of their index? And does Googlebot follow a hyperlink on a page blocked by robots.txt?
These are just a few teaser questions. Each seems to need only “common sense” to answer, yet the common-sense answer may not hold in reality. Expert opinions push professionals to think like philosophers, so if someone asks “Is there any chance of indexing a page blocked by robots.txt?” the common-sense answer is no. But is this a fact? Is it true in all scenarios? Sometimes a fact is hard to believe, yet it remains a fact and not just an opinion. Facts come from actual testing.
Another difficult question that common sense appears to answer, perhaps wrongly, is this: “Does Googlebot follow a link on a <META NAME="ROBOTS" CONTENT="NOINDEX"> page?” The common-sense answer is no, since the page will not be indexed by Google. Again, this has not been thoroughly tested to determine whether it is true in all cases, especially across different search engine bots.
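To make the tag in question concrete, here is a minimal Python sketch of how a program might read the directives out of a meta robots tag. It is only an illustration built on the standard library's html.parser module; the class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for this example, and nothing here claims to reproduce how Googlebot or any other crawler actually parses a page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration; not any search engine's parser.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercase robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head></html>'
    print(robots_directives(page))
```

Note that a page tagged only with NOINDEX carries no nofollow directive, which is part of why the question about link following cannot be settled by reading the tag alone.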
The objective of this two-part tutorial is to show you how search engine bots currently behave under different meta robots tag and robots.txt conditions. Both are powerful tools for SEO and for web content security and privacy. They are used to prevent duplicate content, to hint at how search engine bots should crawl a site's pages, and to keep bots away from sensitive pages that are not suitable for indexing or for display in the search engine results pages.
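As a rough illustration of what "blocked by robots.txt" means, the sketch below uses Python's standard urllib.robotparser module to evaluate a small rule set. The robots.txt content, the example.com URLs, and the helper name is_crawl_allowed are all assumptions made up for this example. It shows only what the rules request of a compliant bot, not how any particular search engine actually behaves, which is the question the experiment is meant to answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/public/index.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

A Disallow rule asks bots not to crawl a path. Whether a blocked URL can still end up in a search index is a separate matter that the rule file itself does not answer.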
The approach to finding the answers is to conduct a controlled experiment to test the behavior of search engine bots.
More By Codex-M