Hiding Your Sensitive Data From Google and the World - Protect Your Data from Search Engines
(Page 3 of 4 )
Do Not Put Things on the Web You Wouldn't Like Your Competitors To See
This is a really simple tactic, and it is flawless. If you do not put sensitive data on a server that is connected to the internet, there is no way for Google to find it and index it. Your web server and online database are not the best hiding places for your sensitive data! This can be difficult if you have only one computer or if your web server is connected to your intranet, but the risk of exposing sensitive information is reason enough to buy a second computer or to physically separate your web server from the intranet. One piece of sensitive data that you should never put on a server that is connected to the internet is a file in which you store passwords and user names. As you can see from the previous section, putting files with passwords and user names on a web server is a costly, yet common, mistake.
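One way to check whether such files have already slipped onto a server is to scan the web root for suspicious file names. The sketch below is only an illustration under stated assumptions: the function name `find_suspicious_files` and the list of name fragments are hypothetical and not part of the original article, and a match on a file name is only a hint that the file deserves a closer look.

```python
from pathlib import Path

# Hypothetical name fragments that often indicate credential files.
# This list is an assumption; adjust it to your own naming habits.
SUSPICIOUS_FRAGMENTS = ("passwd", "password", "credentials", "secret")


def find_suspicious_files(web_root):
    """Return files under web_root whose names contain a suspicious fragment."""
    hits = []
    for path in Path(web_root).rglob("*"):
        name = path.name.lower()
        if path.is_file() and any(frag in name for frag in SUSPICIOUS_FRAGMENTS):
            hits.append(path)
    return sorted(hits)
```

Calling `find_suspicious_files` on the directory your web server publishes returns a list of paths to review by hand. Anything on that list that really holds passwords or user names belongs on a machine that is not reachable from the internet.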
Configure the robots.txt File
A properly configured robots.txt file is a very important tool for keeping your server's files and directories from being indexed by Google. Google respects the instructions in the robots.txt file, which cannot be said for some of the other search engines. The robots.txt file is somewhat controversial: by listing the files and directories you do not want indexed and publishing that list on your web server, you place information about what you do not want people to see right in the hands of hackers. Even so, it is much better to have a properly configured robots.txt file that tells the search engines what to exclude than to have no such file and leave the information open to anyone who can search for it.
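To make the idea concrete, here is a minimal sketch that feeds a sample robots.txt to Python's standard `urllib.robotparser` module and asks how a compliant crawler would treat two URLs. The directory names `/private/` and `/backup/` and the host `www.example.com` are placeholders assumed for illustration, not taken from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are placeholders.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /backup/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


basic_parser = build_parser(SAMPLE_ROBOTS_TXT)

# A crawler that honors robots.txt should skip the first URL and fetch the second.
print(basic_parser.can_fetch("Googlebot", "http://www.example.com/private/report.html"))
print(basic_parser.can_fetch("Googlebot", "http://www.example.com/index.html"))
```

Note that the file only expresses a request. A check like this shows what a well-behaved crawler would do; it does not prevent anyone from opening the listed paths directly.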
Tip: Traditionally, the robots.txt file is configured to exclude files or directories, and bots that harvest e-mail addresses are forgotten. Since e-mail addresses are private information and harvested addresses are most often used for spam, you may want to include an instruction telling mail-harvesting robots that they are not welcome.
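A rule set along the lines of this tip might look like the sketch below, again checked with Python's standard `urllib.robotparser`. The user-agent name `ExampleMailHarvester` is made up for illustration; real harvesters use their own names, and many of them simply ignore robots.txt, so treat this as a polite request and not as a barrier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "ExampleMailHarvester" is an invented bot name.
HARVESTER_RULES = """\
User-agent: ExampleMailHarvester
Disallow: /

User-agent: *
Disallow: /private/
"""

harvester_parser = RobotFileParser()
harvester_parser.parse(HARVESTER_RULES.splitlines())

contact_page = "http://www.example.com/contact.html"

# The named harvester is asked to stay away from the whole site...
print(harvester_parser.can_fetch("ExampleMailHarvester", contact_page))
# ...while other crawlers may still index the public contact page.
print(harvester_parser.can_fetch("Googlebot", contact_page))
```

Each bot you want to turn away needs its own `User-agent` block, because the catch-all `*` block only applies to crawlers that are not matched by a more specific entry.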
There are many aspects to consider when configuring the robots.txt file, and a detailed explanation of how to do it is outside the scope of this article. Two good places to find more information about the robots protocol, the robots.txt file and related topics are http://www.robotstxt.org/wc/robots.html and http://www.google.com/remove.html.
More By Tsvetanka Stoyanova