Blocking Complicated URLs with Robots.txt - Folder Name
4. Blocking a particular part of the overall folder name
Examples of this include the following:
http://www.thisisasampledomain.com/(X(zjaksjjwsdjwjehrhejjdjhfhrhe))/folder/productinfo.aspx?id=201
http://www.thisisasampledomain.com/(X(tyntnrnendnfngnrnennwnswme))/folder/productinfo.aspx?id=205
http://www.thisisasampledomain.com/(X(yturnjfhdjwhdgdbvfvgcbdbsbae))/folder/productinfo.aspx?id=306
Depending on the site's purpose, there may be thousands of them, which makes it impractical to list them one by one in the robots.txt file. The better approach is to identify a pattern that all of them share.
Looking at the URLs above, one part of the path repeats in every one of them: /(X(
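Spotting the shared part by eye works for three URLs; with thousands, a short script can help. The following is a minimal Python sketch, assuming you already have a list of the affected URLs (for example, exported from your server logs). It extracts the longest common prefix of their paths. The function name `shared_path_prefix` is hypothetical, and `os.path.commonprefix` compares strings character by character, so the result should be checked against URLs you want to keep crawlable.

```python
from os.path import commonprefix
from urllib.parse import urlsplit

# The three example URLs from the article.
urls = [
    "http://www.thisisasampledomain.com/(X(zjaksjjwsdjwjehrhejjdjhfhrhe))/folder/productinfo.aspx?id=201",
    "http://www.thisisasampledomain.com/(X(tyntnrnendnfngnrnennwnswme))/folder/productinfo.aspx?id=205",
    "http://www.thisisasampledomain.com/(X(yturnjfhdjwhdgdbvfvgcbdbsbae))/folder/productinfo.aspx?id=306",
]


def shared_path_prefix(urls):
    """Return the longest string that every URL's path starts with."""
    # Keep only the path; the domain and query string are ignored.
    paths = [urlsplit(u).path for u in urls]
    # commonprefix works character by character on plain strings.
    return commonprefix(paths)


print(shared_path_prefix(urls))  # prints /(X(
```

For the three example URLs it returns /(X(, because the characters right after it (z, t and y) already differ.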
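However, the string that follows /(X( is different in every URL, and each URL leads to different pages and query strings, so the URLs cannot be listed individually. This means we once again rely on pattern matching. Strictly speaking, robots.txt does not support full regular expressions: Google and other major crawlers support only the * wildcard (any sequence of characters) and the $ end-of-URL anchor.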
Since we only want to block URLs whose path begins with /(X(, we can use the following rule:
User-agent: *
Disallow: /(X(*/
The rule above blocks every URL whose path begins with /(X( and contains a later slash, whatever session string or query string follows. This is a very useful approach for large dynamic websites that suffer from massive duplicate content.
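As a rough local check of what the rule matches, here is a minimal Python sketch that imitates Google-style matching. It assumes Google's documented semantics (`*` matches any sequence of characters, a trailing `$` anchors the end of the URL, and rules are matched from the start of the path). The helper names `rule_to_regex` and `is_blocked`, and the session-free comparison URL, are hypothetical. Python's standard `urllib.robotparser` does plain prefix matching and does not expand `*` inside a path, which is why the pattern is translated by hand here.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regular expression.

    Assumes Google-style semantics: '*' matches any sequence of characters,
    a trailing '$' marks the end of the URL, and matching starts at the
    beginning of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every character literally, then turn each escaped '*' into '.*'.
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url, rule):
    """Return True if the URL's path plus query string matches the rule."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match anchors only at the start, mirroring robots.txt prefix matching.
    return rule_to_regex(rule).match(target) is not None


rule = "/(X(*/"

# The three example URLs from the article: all should be blocked.
examples = [
    "http://www.thisisasampledomain.com/(X(zjaksjjwsdjwjehrhejjdjhfhrhe))/folder/productinfo.aspx?id=201",
    "http://www.thisisasampledomain.com/(X(tyntnrnendnfngnrnennwnswme))/folder/productinfo.aspx?id=205",
    "http://www.thisisasampledomain.com/(X(yturnjfhdjwhdgdbvfvgcbdbsbae))/folder/productinfo.aspx?id=306",
]

# Hypothetical session-free URL that should stay crawlable.
clean_url = "http://www.thisisasampledomain.com/folder/productinfo.aspx?id=201"

for url in examples:
    print(is_blocked(url, rule))  # True for each example
print(is_blocked(clean_url, rule))  # False
```

Escaping the rule with `re.escape` before restoring `*` keeps characters such as `(` and `.` literal, which matters here because `(` is a regex metacharacter; in robots.txt only `*` and a trailing `$` carry special meaning. A sketch like this only approximates a crawler's behavior, so the final check should still be done with Google's own testing tool.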
Important: Always test your robots.txt file with Google Webmaster Tools before uploading it to your root directory, to confirm that it blocks the URLs you intend to block and does not affect any other URLs.
DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.