<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SEO Chat &#187; Search Engine Spiders Help</title>
	<atom:link href="http://www.seochat.com/c/b/search-engine-spiders-help/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.seochat.com</link>
	<description>Search Engine Optimization News and Talk</description>
	<lastBuildDate>Tue, 18 Jun 2013 16:47:58 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=2013</generator>
		<item>
		<title>The Number of Google Results Found: What It Really Means</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/the-number-of-google-results-found-what-it-really-means/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/the-number-of-google-results-found-what-it-really-means/#comments</comments>
		<pubDate>Tue, 19 Mar 2013 19:28:03 +0000</pubDate>
		<dc:creator>Ann Smarty</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Image Credits]]></category>
		<category><![CDATA[keyword]]></category>

		<guid isPermaLink="false">http://www.seochat.com/?p=1792</guid>
		<description><![CDATA[<p>You know the scenario well: you do a search on any engine, and in seconds you are given your results. You stay on the first couple of pages, probably within the first few websites offered. Because there is no way you are going to go filtering through the millions (or even billions) of websites that [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Ann Smarty' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=f8d69258525dec38624a29eb3d570d8c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong><a href="http://www.internetmarketingninjas.com/">Ann Smarty</a></strong></div><div class="ts-fab-description" style="margin-bottom:0.5em"><em><span>Community Manager</span> at <a href="http://www.internetmarketingninjas.com/"><span>Internet Marketing Ninjas</span></a></em></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"><a href="https://plus.google.com/103907915631843308004/?rel=author">Ann Smarty</a> is the pro blogger and guest blogger, social media enthusiast.</div><div class="ts-fab-footer"><a style="margin-right:1.25em" href="http://twitter.com/seosmarty">Twitter</a><a style="margin-right:1.25em" href="http://www.facebook.com/annsmarty">Facebook</a><a style="margin-right:1.25em" href="http://plus.google.com/103907915631843308004">Google+</a><a style="margin-right:1.25em" href="http://www.linkedin.com/in/annsmarty">LinkedIn</a></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/the-number-of-google-results-found-what-it-really-means/">The Number of Google Results Found: What It Really Means</a> appeared first on <a 
href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[<p>You know the scenario well: you do a search on any engine, and in seconds you are given your results. You stay on the first couple of pages, probably within the first few websites offered. Because there is no way you are going to go filtering through the millions (or even billions) of websites that it gave you to choose from. I doubt you have ever even gone past the third page, much less ventured to number ten and beyond. What would be the point?<span id="more-1792"></span></p>
<p>If you did start to click on later pages, you would find out that the results they say they found aren&#8217;t actually there. Instead, you get a message saying that they have omitted around 90-99% of the websites that came up, and that message will be repeated on every page you try after the initial results they chose to include. What&#8217;s going on? Does that mean they don&#8217;t really have those results?</p>
<h2>What Search Query Results Really Mean</h2>
<p><img src="http://www.seochat.com/wp-content/uploads/2013/03/search-engine-results-found-01_zpsa7f919b6.jpg" alt="Search Engine Results" /></p>
<p>One of the reasons that engines like Google or Bing can find so many results is that they don&#8217;t bother to collect them for your use. The chances that you will need them are slim to none, and even if you did by chance require a deeper web page, you wouldn&#8217;t be able to find it. There are just too many to sift through. Instead, you would have to go back and narrow down your search terms to get a better list of choices, something we are all pretty used to doing by now.</p>
<p>But those results still do exist. A search takes only milliseconds because the engine has already sorted the most relevant sites for your search terms. A ranking system puts those sites higher on the list, while other sites are left behind. If the engine were to offer you the billions of websites in its database that contain your search term, it would take a very long time to load them, indeed. Just imagine the less-than-one-second loading time for around 700 results. Now try to imagine applying that same time frame to <em>billions</em>.</p>
<p>Yeah, it wouldn&#8217;t work out very well for the average user who doesn&#8217;t want to spend months getting an entire internet&#8217;s worth of websites they will never need, never use and never be able to search through.</p>
<p>In other words, the &#8220;results found&#8221; number really tells you how many mentions of that keyword the engine has in its database, not how many results have been gathered for your personal use. That is something I am sure we can all be thankful for.</p>
<p>Image Credits: <a href="http://www.flickr.com/photos/89165847@N00/6486021745/" rel="external nofollow">1</a></p><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Ann Smarty' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=f8d69258525dec38624a29eb3d570d8c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong><a href="http://www.internetmarketingninjas.com/">Ann Smarty</a></strong></div><div class="ts-fab-description" style="margin-bottom:0.5em"><em><span>Community Manager</span> at <a href="http://www.internetmarketingninjas.com/"><span>Internet Marketing Ninjas</span></a></em></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"><a href="https://plus.google.com/103907915631843308004/?rel=author">Ann Smarty</a> is the pro blogger and guest blogger, social media enthusiast.</div><div class="ts-fab-footer"><a style="margin-right:1.25em" href="http://twitter.com/seosmarty">Twitter</a><a style="margin-right:1.25em" href="http://www.facebook.com/annsmarty">Facebook</a><a style="margin-right:1.25em" href="http://plus.google.com/103907915631843308004">Google+</a><a style="margin-right:1.25em" href="http://www.linkedin.com/in/annsmarty">LinkedIn</a></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/the-number-of-google-results-found-what-it-really-means/">The Number of Google Results Found: What It Really Means</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/the-number-of-google-results-found-what-it-really-means/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Get a Page Count for Your Website</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/get-a-page-count-for-your-website/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/get-a-page-count-for-your-website/#comments</comments>
		<pubDate>Thu, 23 Aug 2012 00:00:00 +0000</pubDate>
		<dc:creator>Terri</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Mike Moran]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/get-a-page-count-for-your-website/</guid>
		<description><![CDATA[<p>How big is your website? Okay, for some of you, that might be a rather personal question, but even if you&#8217;d rather not tell me, you&#8217;d better know for yourself how big it is – especially if you think Google is underestimating your size.Perhaps a better question to ask is this: how do you know [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Terri' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=b45ff58a165dd9c241f7fb37acf4641c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Terri</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/get-a-page-count-for-your-website/">Get a Page Count for Your Website</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[How big is your website? Okay, for some of you, that might be a rather personal question, but even if you&#8217;d rather not tell me, you&#8217;d better know for yourself how big it is – especially if you think Google is underestimating your size.<br /><span id="more-587"></span><br /><p>Perhaps a better question to ask is this: how do you know that Google has crawled and indexed all of your website&#8217;s pages? If you&#8217;re a little guy, that&#8217;s easy; you can count that high. But those with bigger sites can easily lose track, as <a href="http://www.searchengineguide.com/mike-moran/how-many-pages-are-on-your-web-site.php" target="_blank" rel="external nofollow"><font color="#0000ff">Mike Moran</font></a> points out. And if you don&#8217;t at least have some kind of an estimate for the number of pages on your website, you don&#8217;t know if the search engines left any of them out of their indexes – or, indeed, just how many didn&#8217;t make it in.</p>
<p>I&#8217;ve been working at SEO Chat now for nearly eight years, and I can&#8217;t begin to even guess how many pages we encompass. I&#8217;m not proud of this, especially since I&#8217;ve written so many articles during that time (and edited most of the rest). I know I&#8217;m not alone in facing this kind of problem. Fortunately, Moran mentions four ways to get an answer to this burning question.</p>
<p>Perhaps the easiest way is to ask your webmaster how many pages your website contains. The fine folks in IT are pretty much required to know this information because of what they do every day pertaining to your website. Indeed, they&#8217;ve probably been asked this before, and have the answer close to hand.</p>
<p>If you don&#8217;t want to go to your webmaster, you can try going to your content management system (CMS). Most of these will tell you how many pages they are currently handling. If you use more than one CMS, make sure you query them all. Moran notes that “Even a free CMS such as WordPress can do this,” leaving you with few excuses for not collecting this information.</p>
<p>But suppose for some reason you can&#8217;t do this. You still need to know how many pages your website includes. There are programs that can help you with this. They will spider your site, rather like the search engines spider your site for indexing. A program such as <a href="http://home.snafu.de/tilman/xenulink.html" target="_blank" rel="external nofollow"><font color="#0000ff">Xenu</font></a>&nbsp;will even turn up pages that you might otherwise overlook.</p>
<p>And if worse comes to worst? Well, you can take a guess. Before you stare at me in shock, remember that “it&#8217;s better to&nbsp; hazard a guess than to just throw up your hands,” according to Moran. Why? Well, I have to assume that you still care at least somewhat about your finances if you&#8217;re running a business with a website – right? So, don&#8217;t you want to know what you&#8217;re spending? Of course you do. Unless you&#8217;re running some kind of hobby site as a labor of love with free hosting and unlimited bandwidth (and even that isn&#8217;t truly “unlimited” these days), then every website page costs money. If you can estimate the number of pages your website includes, you can better understand how much money you&#8217;re spending on it. As Moran points out, “A guess is better than nothing.”&nbsp; </p><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Terri' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=b45ff58a165dd9c241f7fb37acf4641c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Terri</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/get-a-page-count-for-your-website/">Get a Page Count for Your Website</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/get-a-page-count-for-your-website/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search Engine Spiders: Why Don`t They Crawl My Site?</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/search-engine-spiders-why-dont-they-crawl-my-site/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/search-engine-spiders-why-dont-they-crawl-my-site/#comments</comments>
		<pubDate>Mon, 08 Nov 2010 09:00:13 +0000</pubDate>
		<dc:creator>Jamesp</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/search-engine-spiders-why-dont-they-crawl-my-site/</guid>
		<description><![CDATA[<p>So you have designed your site, created fresh content, checked off your search engine optimization checklist, and all bases seem to be covered. Yet when you check to see if your pages have actually made it into the search engines, you notice a few pages (or worse &#8212; all pages) are not there yet. Where [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Jamesp' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=708820d9e26e3480a6c19ed4a370c35f' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Jamesp</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/search-engine-spiders-why-dont-they-crawl-my-site/">Search Engine Spiders: Why Don`t They Crawl My Site?</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[So you have designed your site, created fresh content, checked off your search engine optimization checklist, and all bases seem to be covered. Yet when you check to see if your pages have actually made it into the search engines, you notice a few pages (or worse &#8212; all pages) are not there yet. Where did you go wrong? Never fear; this article will show you five common mistakes you can make when optimizing your site that might make spiders avoid it.<br /><span id="more-586"></span><br /><p>&nbsp;Below is a list of five reasons search engine spiders may not crawl your site:</p>
<ul>
<li>&nbsp; Flash or Java Links</li>
<li>&nbsp; robots.txt</li>
<li>&nbsp; Too Many Links on One Page</li>
<li>&nbsp; Links in Forms</li>
<li>&nbsp; Links in frames</li></ul>
<p><strong>Flash or Java Links</strong></p>
<p>While Google assures us that Flash is becoming more crawlable, for the time being, you will probably want to avoid relying on embedding links in Flash. If you do it, be sure to have a link outside of Flash on the page as well. The same goes for Java, Silverlight, and any other type of plug-in.</p>
<p><strong>Robots.txt</strong></p>
<p>I wrote an article a while back concerning the uses of robots.txt that may be helpful to check out if you are experiencing a lack of indexing. Basically, the robots.txt file tells the spiders that they may not access the information on a given page (i.e., crawl the page).</p>
<p>There can be many reasons why you would not want a page to be available to the engines, but still want it accessible to certain individuals. For instance, you may wish to allow only certain people to view your portfolio page via a direct link, and not have it available to anyone with access to Google.</p>
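<p>A minimal sketch of such a rule, assuming the portfolio lives at the hypothetical path /portfolio.html, might look like this in a robots.txt file placed in the site root:</p>
<p><font face="Courier">&nbsp; User-Agent: *<br />
&nbsp; Disallow: /portfolio.html</font></p>
<p>Keep in mind that this only asks well-behaved crawlers to stay away; it does not stop a person who has the link from opening the page.</p>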
<p><strong>Too Many Links on the Page</strong></p>
<p>For what should be obvious reasons, piling a million links on a page is a bad idea. For one, it is too confusing to the user. Second, it is usually a sign that you are a spammer or trying to put one over on the search engines. Spiders crawl maybe&nbsp; 100 or so links on any given page. Any more than that puts you at risk of not getting crawled.</p>
<p><strong>Links in Forms</strong></p>
<p>If you have users fill out a form prior to viewing a page, and that is the only way to access the page, be forewarned &#8212; it is not likely that the page will be crawled. The reason is that bots do not like to provide personal information, nor do they like to click on submit buttons. Call it bigotry if you like, but always be sure to include a link to your content that does not require a form to be submitted.</p>
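<p>As a rough illustration, with /signup and /whitepaper.html as made-up paths, a page can keep its form and still expose a plain link that a spider can follow:</p>
<p><font face="Courier">&nbsp; &lt;form action="/signup" method="post"&gt;<br />
&nbsp; &nbsp; &lt;input type="email" name="email"&gt;<br />
&nbsp; &nbsp; &lt;input type="submit" value="Send"&gt;<br />
&nbsp; &lt;/form&gt;<br />
&nbsp; &lt;a href="/whitepaper.html"&gt;Read the white paper&lt;/a&gt;</font></p>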
<p><strong>Links in Frames</strong></p>
<p>Sometimes when you use frames on your page, Google, Yahoo, and Bing get confused by the structure of your site, and that confusion can lead to a traffic jam. Besides ruining your SEO, frames, to me, are just hideous anyway. Avoid using them to avoid this common pitfall.</p><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Jamesp' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=708820d9e26e3480a6c19ed4a370c35f' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Jamesp</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/search-engine-spiders-why-dont-they-crawl-my-site/">Search Engine Spiders: Why Don`t They Crawl My Site?</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/search-engine-spiders-why-dont-they-crawl-my-site/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Yahoo SLURP Crawler</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/the-yahoo-slurp-crawler/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/the-yahoo-slurp-crawler/#comments</comments>
		<pubDate>Tue, 08 Aug 2006 09:00:46 +0000</pubDate>
		<dc:creator>AkinolaAkintomide</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>
		<category><![CDATA[Dynamic Page Indexing]]></category>
		<category><![CDATA[Getting Framed]]></category>
		<category><![CDATA[SLURP]]></category>
		<category><![CDATA[Yahoo Companion Toolbar]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/the-yahoo-slurp-crawler/</guid>
		<description><![CDATA[<p>As SEOs and webmasters, we&#8217;re always looking for ways to get the search engine spiders to crawl our sites, and the deeper, the better. This article shows you how to target Yahoo&#8217;s crawler and convince it to stop by regularly.The search engine wars are fought with strategies, alliances, and robots. As Yahoo! primes itself to [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='AkinolaAkintomide' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=ab256d91cb8f36be7c5eb1143d9d1a2a' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>AkinolaAkintomide</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/the-yahoo-slurp-crawler/">The Yahoo SLURP Crawler</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[As SEOs and webmasters, we&#8217;re always looking for ways to get the search engine spiders to crawl our sites, and the deeper, the better. This article shows you how to target Yahoo&#8217;s crawler and convince it to stop by regularly.<br /><span id="more-585"></span><br /><p>The search engine wars are fought with strategies, alliances, and robots. As Yahoo! primes itself to be the number one contender for market share after Google, websites that want to optimize for Yahoo must study how Yahoo ranks pages and how it indexes pages.&nbsp;The Yahoo web crawler SLURP should be studied; your site server logs should have recorded visits from various robots, including SLURP. If you do not have records of SLURP visiting your site, then this article will give tips on how to get SLURP to crawl (hopefully deep crawl) your site.</p>
<p><strong>The Preamble</strong></p>
<p>Yahoo SLURP is an upgrade of Inktomi’s SLURP robot. Yahoo used Inktomi’s search engine to replace Google, which used to supply its search results. This officially triggered&nbsp;the second search engine wars (the first was won by Google without it declaring hostilities).</p>
<p>Yahoo has at least 130 million registered users on its network. Granted,&nbsp;Google is the&nbsp;definitive search engine, but Yahoo is large enough that it should not be ignored.</p>
<p>SLURP crawls websites, scans their contents and meta tags, and travels down the links contained on the page. It then brings back information for the search engine to index. Yahoo SLURP 2.0 stores the full text of each page it crawls in its memory and then returns it to Yahoo’s searchable database. This is one of the semi-unique points of Yahoo SLURP;&nbsp;not all search engine crawlers store the entire text of the pages they crawl.</p>
<p>While SLURP has some features unique to it, it also obeys the robots.txt file. This file is very important since it gives you control over which pages the crawler searches and indexes. This lets you keep the crawler away from sensitive pages, pages which contain information you would rather not have in the hands of hackers (who regularly try to infiltrate search engines&#8217; databases), or pages which you don’t want indexed at all (for whatever reason).</p>
<p>Another good thing about the robots.txt file is that it enables you to exclude specific robots, so you can block Googlebot from a particular page but let SLURP crawl it. This can be useful if you have optimized different pages for separate search engines. Doing so gives you flexibility, but a search engine that sees both versions may think you have duplicate pages and penalize you. So careful use of the robots.txt file should definitely be on your list of ways to make your website more search engine friendly. So how do you use the robots.txt file? Open Notepad and type in the following lines:</p>
<p><font face="Courier">&nbsp; User-Agent: Slurp<br />
&nbsp; Disallow: /whatsisname.html<br />
&nbsp; Disallow: /page_optimized_for_google.html<br />
&nbsp; Disallow: /credit_card_list.html<br />
&nbsp; Disallow: /whatnot.html</font></p>
<p>Save it as robots.txt and upload it into your root directory. You can disallow as many pages for each crawler robot as you want, but to disallow certain pages for another crawler, you start a new record with its own User-Agent line.</p>
<p><font face="Courier">&nbsp; User-Agent: Slurp<br />
&nbsp; Disallow: /whatsisname.html<br />
&nbsp; Disallow: /page_optimized_for_google.html<br />
&nbsp; Disallow: /credit_card_list.html<br />
&nbsp; Disallow: /whatnot.html<br />
<br />
&nbsp; User-Agent: Googlebot<br />
&nbsp; Disallow: /page_optimized_for_yahoo.html<br />
&nbsp; Disallow: /credit_card_list.html<br />
&nbsp; Disallow: /whatnot.html</font></p>
<p>If you want to disallow all crawlers, you replace the name of the user agent with the wildcard character (*).</p>
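<p>As a minimal sketch of that wildcard form, the following record asks every compliant crawler to stay out of the whole site. The single slash after Disallow stands for the site root and is chosen here purely for illustration; most sites will want to list specific paths instead:</p>
<p><font face="Courier">&nbsp; User-Agent: *<br />
&nbsp; Disallow: /</font></p>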
<p>Robots.txt is useful for not getting banned on search engines and can also be used to pinpoint crawlers when they come&nbsp;calling.&nbsp;As a rule, only crawlers request robots.txt, and these requests show up in the server logs.</p>
<p>Another way of shutting out SLURP is by using the noindex meta tag. Yahoo SLURP obeys this tag when it appears in the document&#8217;s head; the code to insert between the head tags of your document is:</p>
<p><font face="Courier">&nbsp; &lt;META NAME="robots" CONTENT="noindex"&gt;</font></p>
<p>This snippet will ensure that Yahoo SLURP does not index the document in the search engine database. Another useful command is the nofollow meta tag. The code inserted is:</p>
<p><font face="Courier">&nbsp; &lt;META NAME="robots" CONTENT="nofollow"&gt;</font></p>
<p>This snippet ensures that the links on the page are not followed.</p>
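<p>If you want both effects on the same page, the two values can usually be combined in one tag, separated by a comma. This is a sketch of the general robots meta tag syntax rather than anything specific to SLURP:</p>
<p><font face="Courier">&nbsp; &lt;META NAME="robots" CONTENT="noindex, nofollow"&gt;</font></p>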
<p><strong>Dynamic Page Indexing</strong></p>
<p>This is the real charm of SLURP. Most search engine crawlers don’t bother crawling and indexing dynamic pages (.php, .asp, .jsp) since their content is subject to rapid change, which makes the process of&nbsp;indexing useless. Yahoo SLURP, however,&nbsp;does daily crawls in order to&nbsp;refresh the content of its indexed dynamic pages. It also does&nbsp;bi-weekly crawls, which enable the search engine to discover new content and add it to its index incrementally. This enables a complex site&#8217;s URLs, generated by forms and content management software, to be indexed.</p>
<p>These frequent crawls show up in your server logs as frequent download requests, as the crawler moves, stops, and restarts.&nbsp;Yahoo says that these&nbsp;frequent download requests should not be a cause for alarm.</p>
<p>SLURP&#8217;s ability to index dynamic pages and to constantly refresh its content is a great relief to web designers (like me) who like having dynamic pages to enable fast loading and rapid updating. Websites which were not search engine friendly are&nbsp;suddenly&nbsp;in contention to be ranked number one.</p>
<p>However, the down side to this is that SLURP may never deliberately crawl your dynamic pages, unless you trigger the crawler via techniques which Yahoo encourages (to the benefit of their bottom line). </p>
<p><strong>Getting Framed</strong></p>
<p>Yahoo SLURP also has the ability to support frames, although it will not follow the SRC links to stand-alone framesets; it only follows HREF links (as all good crawlers do).</p>
<p>After having said all this about&nbsp;Yahoo SLURP, there is now the little issue of getting your site crawled by this particular search engine spider. There are some ways to go about this task, and here we begin to see the inklings of what would be the order of the day in a search engine market dominated by Yahoo! (who seems to be very, very concerned about its bottom line).</p>
<p><strong>Linking</strong></p>
<p>The first strategy is good old linking; just get a link on a site which Yahoo! regularly crawls, and voila: you have SLURP knocking on your door. This can be done by corresponding with a site which ranks well on Yahoo, or by submitting your website to directories which SLURP regularly crawls (you can find these by searching for “directories” on Yahoo). If SLURP deep crawls (crawls lots of pages instead of just one or two) your site regularly, you&nbsp;have a high chance of getting a good ranking on the keyword or topic for which you have optimized your site.</p>
<p><strong>Yahoo Companion Toolbar</strong></p>
<p>This is supposed to trigger the SLURP robot to crawl your site. And it also enables searchers to search within your site,&nbsp;offering value for your audience and attracting&nbsp;Yahoo SLURP as well.</p>
<p><strong>Sitematch</strong></p>
<p>This is done by paying Yahoo&#8217;s&nbsp;fees and submitting your site.&nbsp;This guarantees you will be added to the index (at a price) but is no guarantee of your website&#8217;s ranking in the SERPs.</p>
<p>This is a scary service, and some reviewers speculate that it is a foretaste of what site owners would face in a market dominated by Yahoo. It is carried over from Overture (which Yahoo purchased) and involves an annual fee for submitted pages. The URLs are&nbsp;submitted into Yahoo’s index and are then crawled by SLURP every 48 hours. </p>
<p>However, apart from the one-off fee, there is a cost-per-click fee charged for each lead driven to your site (so you had better have deep pockets).</p>
<p>Apart from&nbsp;SLURP visiting every two days, you also get listed on searches done on about.com, Excite, Overture and other Yahoo partners. However, there is no guarantee of a high ranking, and frankly I do not like this method (because I absolutely love free stuff).</p>
<p>There is a way to submit your site for free; however, Yahoo does not&nbsp;guarantee that websites submitted through such means will ever be crawled by SLURP.</p>
<p>By now you should know enough about SLURP to spot it,&nbsp;track it,&nbsp;attract it, and prevent it from crawling specific pages of your site.</p><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='AkinolaAkintomide' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=ab256d91cb8f36be7c5eb1143d9d1a2a' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>AkinolaAkintomide</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/the-yahoo-slurp-crawler/">The Yahoo SLURP Crawler</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/the-yahoo-slurp-crawler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How Search Engines Work (and Sometimes Don’t)</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/how-search-engines-work-and-sometimes-dont/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/how-search-engines-work-and-sometimes-dont/#comments</comments>
		<pubDate>Mon, 26 Dec 2005 09:00:46 +0000</pubDate>
		<dc:creator>Terri</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>
		<category><![CDATA[Great Site]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[MSN]]></category>
		<category><![CDATA[URL]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/how-search-engines-work-and-sometimes-dont/</guid>
		<description><![CDATA[<p>You know how important it is to score high in the SERPs. But your site isn&#8217;t reaching the first three pages, and you don&#8217;t understand why. It could be that you&#8217;re confusing the web crawlers that are trying to index it. How can you find out? Keep reading.You have a masterful website, with lots of [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Terri' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=b45ff58a165dd9c241f7fb37acf4641c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Terri</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/how-search-engines-work-and-sometimes-dont/">How Search Engines Work (and Sometimes Don’t)</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[You know how important it is to score high in the SERPs. But your site isn&#8217;t reaching the first three pages, and you don&#8217;t understand why. It could be that you&#8217;re confusing the web crawlers that are trying to index it. How can you find out? Keep reading.<br /><span id="more-584"></span><br /><p>You have a masterful website, with lots of relevant content, but it isn’t coming up high in the search engine results pages (SERPs). You know that if your site isn’t on those early pages, searchers probably won’t find you. You can’t understand why you’re apparently invisible to Google and the other major search engines. Your rivals hold higher spots in the SERPs, and their sites aren’t nearly as nice as yours. </p>
<p>Search engines aren’t people. In order to handle the tens of billions of web pages that comprise the World Wide Web, search engine companies have almost completely automated their processes. A software program isn’t going to look at your site with the same “eyes” as a human being. This doesn’t mean that you can’t have a website that is a joy to behold for your visitors. But it does mean that you need to be aware of the ways in which search engines “see” your site differently, and plan around them.</p>
<p>Despite the complexity of the web and the need to handle all that data at speed, search engines actually perform only a short list of operations in order to return relevant results to their users. Each of these operations can go awry in certain ways. It isn’t so much that the search engine itself has malfunctioned; it may simply have encountered something that it was not programmed to deal with. Or the way it was programmed to deal with whatever it encountered led to less than desirable results.</p>
<p>Understanding how search engines operate will help you understand what can go wrong. All search engines perform the following four tasks:</p>
<ul>
<li><strong>Web crawling.</strong> Search engines send out automated programs, sometimes called “bots” or “spiders,” which use the web’s hyperlink structure to “crawl” its pages. According to some of our best estimates, search engine spiders have crawled maybe half of the pages that exist on the Internet.<br />
<br />
</li>
<li><strong>Document indexing.</strong> After spiders crawl a page, its content needs to be put into a format that makes it easy to retrieve when a user queries the search engine. Thus, pages are stored in a giant, tightly managed database that makes up the search engine’s index. These indexes contain billions of documents, which are delivered to users in mere fractions of a second.<br />
<br />
</li>
<li><strong>Query processing.</strong> When a user queries a search engine, which happens hundreds of millions of times each day, the engine examines its index to find documents that match. Queries that look superficially the same can yield very different results. For example, searching for the phrase “field and stream magazine,” without quotes around it, yields more than four million results in Google. Do the same search with the quote marks, and Google returns only 19,600 results. This is just one of many modifiers a searcher can use to give the database a better idea of what should count as a relevant result.<br />
<br />
</li>
<li><strong>Ranking results.</strong> Google isn’t going to show you all 19,600 results on the same page – and even if it did, it needs some way to decide which ones should show up first. Thus, the search engine runs an algorithm on the results to calculate which ones are most relevant to the query. These are shown first, with all the others in descending order of relevance.</li></ul>
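<p>The four tasks above can be illustrated with a toy sketch. The Python below is only an illustration under stated assumptions, not how any real engine works: the three “pages,” the function names (build_index, search) and the ranking rule (count how many query terms a page contains) are all made up for this example. It shows why the quoted phrase query returns fewer results than the unquoted one.</p>

```python
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> page text (made up for illustration).
PAGES = {
    "a.example/1": "field and stream magazine subscription offers",
    "b.example/2": "a stream runs through the field behind the magazine office",
    "c.example/3": "stream fishing tips",
}

def build_index(pages):
    """Document indexing: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(query, pages, index):
    """Query processing plus a naive ranking step."""
    is_phrase = query.startswith('"') and query.endswith('"')
    terms = query.strip('"').lower().split()
    if is_phrase:
        # Quoted query: the words must appear together, in order
        # (a simple substring test is enough for this sketch).
        phrase = " ".join(terms)
        hits = [url for url, text in pages.items() if phrase in text.lower()]
    else:
        # Unquoted query: any page containing at least one term matches.
        found = set()
        for term in terms:
            found |= index.get(term, set())
        hits = list(found)

    # Ranking: pages containing more of the query terms come first.
    def score(url):
        return sum(1 for term in terms if url in index.get(term, set()))

    return sorted(hits, key=lambda url: (-score(url), url))

INDEX = build_index(PAGES)
loose = search("field and stream magazine", PAGES, INDEX)
exact = search('"field and stream magazine"', PAGES, INDEX)
print(len(loose), "unquoted results;", len(exact), "quoted results")
```

<p>With these made-up pages, the unquoted query matches all three documents, while the quoted one matches only the page that contains the exact phrase.</p>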
<p>Now that you have some idea of the processes involved, it’s time to take a closer look at each one. This should help you understand how things go right, and how and why these tasks can go “wrong.” This article will focus on web crawling, while a later article will cover the remaining processes.</p>
<p>You’re probably thinking chiefly of your human visitors when you set up your website’s navigation, as well you should. But certain kinds of navigation structures will trip up spiders, making it less likely that those visitors will find your site in the first place. As an added bonus, many of the things that make it easier for a spider to find content will also make it easier for visitors to navigate your site.</p>
<p>It’s worth keeping in mind, by the way, that you might not want spiders to be able to index everything on your site. If you own a site with content that users pay a fee to access, you probably don’t want a Google bot to grab that content and show it to anyone who enters the right keywords. There are ways to deliberately block spiders from such content. In keeping with the rest of this article, which is intended mainly as an introduction, they will only be mentioned briefly here.</p>
<p>Dynamic URLs are one of the biggest stumbling blocks for search engine spiders. In particular, pages with two or more dynamic parameters will give a spider fits. You know a dynamic URL when you see it; it usually has a lot of “garbage” in it such as question marks, equal signs, ampersands (&amp;) and percent signs. These pages are great for human users, who usually get to them by setting certain parameters on a page. For example, typing a zip code into a box at weather.com will return a page that describes the weather for a particular area of the US – and a dynamic URL as the page location.</p>
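<p>If you want a rough way to spot such URLs on your own site, a sketch like the following may help. It uses Python’s standard urllib.parse module; the helper names, the example URLs and the threshold of two parameters (taken from the paragraph above) are assumptions for illustration, not a rule published by any search engine.</p>

```python
from urllib.parse import urlsplit, parse_qsl

def dynamic_parameters(url):
    """Return the names of the query-string parameters in a URL."""
    query = urlsplit(url).query
    return [name for name, _value in parse_qsl(query, keep_blank_values=True)]

def may_trouble_spiders(url, limit=2):
    """Flag URLs with `limit` or more dynamic parameters."""
    return len(dynamic_parameters(url)) >= limit

# Hypothetical URLs, made up for illustration.
for url in (
    "http://www.example.com/weather/90210.html",
    "http://www.example.com/forecast?zip=90210",
    "http://www.example.com/forecast?zip=90210&units=f&lang=en",
):
    print(url, dynamic_parameters(url), may_trouble_spiders(url))
```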
<p>There are other ways in which spiders don’t like complexity. For example, a page with more than 100 unique links to other pages on the same site can overwhelm a spider, which may not follow every link. If you are trying to build a site map, there are better ways to organize it.</p>
<p>Pages that are buried more than three clicks from your website’s home page also might not be crawled. Spiders don’t like to go that deep. For that matter, many humans can get “lost” on a website with that many levels of links if there isn’t some kind of navigational guidance.</p>
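<p>Both rules of thumb above (no more than 100 links on a page, no more than three clicks from the home page) are easy to check once you have a map of which pages link to which. The sketch below assumes a tiny hypothetical site stored as a Python dictionary; the page paths and the function names are made up, and the two limits are simply the figures quoted above.</p>

```python
from collections import deque

# Hypothetical site structure: page -> pages it links to.
SITE = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
    "/products/widgets/blue/specs": [],
    "/about": [],
}

def click_depths(site, home="/"):
    """Breadth-first search from the home page; depth = fewest clicks needed."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def audit(site, max_depth=3, max_links=100):
    """List pages buried too deep, pages with too many links, and orphans."""
    depths = click_depths(site)
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    too_many = sorted(p for p, links in site.items() if len(set(links)) > max_links)
    unreachable = sorted(p for p in site if p not in depths)
    return too_deep, too_many, unreachable

too_deep, too_many, unreachable = audit(SITE)
print("More than three clicks deep:", too_deep)
print("More than 100 links:", too_many)
print("Not reachable from the home page:", unreachable)
```

<p>In this made-up site, only the “specs” page sits four clicks from the home page, so it is the one a spider might skip.</p>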
<p>Pages that require a “Session ID” or cookie to enable navigation also might not be spidered. Spiders aren’t browsers, and don’t have the same capabilities. They may not be able to retain these forms of identification.</p>
<p>Another stumbling block for spiders is pages that are split into “frames.” Many web designers like frames, which allow them to keep page navigation in one place even when a user scrolls through content. But spiders find pages with frames confusing. To them, content is content, and they have no way of knowing which pages should go in the search results. Frankly, many users don’t like pages with frames either; rather than providing a cleaner interface, such pages often look cluttered.</p>
<p>Most of the stumbling blocks above are ones you may have accidentally put in the way of spiders. This next set of stumbling blocks includes some that website owners might use on purpose to block a search engine spider. While I mentioned one of the most obvious reasons for blocking a spider above (content that users must pay to see), there are certainly others: the content itself might be free, but should not be easily available to everyone, for example.</p>
<p>Pages that can be accessed only after filling out a form and hitting “Submit” might as well be closed doors to spiders. Think of spiders as unable to push buttons or type. Likewise, pages that can be reached only through a drop-down menu might not be spidered, and the same holds true for documents that can only be accessed via a search box.</p>
<p>Documents that are purposefully blocked will usually not be spidered. This can be handled with a robots meta tag or robots.txt file. You can find other articles that discuss the robots.txt file on SEO Chat.</p>
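<p>To get a feel for how a well-behaved spider reads a robots.txt file, you can try Python’s standard urllib.robotparser module. This is only a sketch: the robots.txt contents, the example.com paths and the “ExampleBot” user agent below are all made up for illustration.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="ExampleBot"):
    """True if a well-behaved spider identifying as `agent` may fetch `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

for path in ("/", "/articles/spiders.html", "/members/report.pdf"):
    print(path, "allowed" if allowed(path) else "blocked")
```

<p>Keep in mind that robots.txt is a request, not a lock. It only keeps out spiders that choose to honor it, so paid content still needs a real login in front of it.</p>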
<p>Pages that require a login block search engine spiders. Remember the “spiders can’t type” observation above. Just how are they going to log in to get to the page? </p>
<p>Finally, I’d like to make a special note of pages that redirect before showing content. Not only will that not get your page indexed, it could get your site banned. Search engines refer to this tactic as “cloaking” or “bait-and-switch.” You can check Google’s guidelines for webmasters (<a href="http://www.google.com/intl/en/webmasters/guidelines.html">http://www.google.com/intl/en/webmasters/guidelines.html</a>) if you have any questions about what is considered legitimate and what isn’t.</p>
<p>Now that you know what will make spiders choke, how do you encourage them to go where you want them to? The key is to provide direct HTML links to each page you want the spiders to visit. Also, give them a shallow pool to play in. Spiders usually start on your home page; if any part of your site cannot be accessed from there, chances are the spider won’t see it. This is where use of a site map can be invaluable.</p>
<p>I’ll assume that you are all reasonably familiar with HTML. If you have ever looked at the source code for an HTML page, you probably noticed text like this wherever a hyperlink appeared:</p>
<p><img title="" height="47" alt="" src="http://images.devshed.com/sc/stories/How_Search_Engines_Work/imageseo1.jpg" width="367" /></p>
<p>When a web browser reads this, it knows that the text “SEO Chat” should be hyperlinked to the web page http://www.seochat.com. Incidentally, “SEO Chat” in this case is the “anchor text” of the link. When a spider reads this text, it thinks, “Okay, the page http://www.seochat.com is relevant to the text on this page, and very relevant to the term ‘SEO Chat.’” </p>
<p>Let’s get a little more complicated.</p>
<p><img title="" height="75" alt="" src="http://images.devshed.com/sc/stories/How_Search_Engines_Work/imageseo2.jpg" width="269" /></p>
<p>Now what? The anchor text hasn’t changed, so the link will still look the same when the web browser displays it. But a spider will think, “Okay, not only is this page relevant to the term ‘SEO Chat,’ it is also relevant to the phrase ‘Great Site for SEO Info.’ And hey, there’s a relationship between the page I’m crawling now and this hyperlink! It says that this link doesn’t count as a ‘vote’ for the page being linked to. Okay, so it won’t add to the page rank.” </p>
<p>That last point, about the link not counting as a vote for the page being linked to, is what the rel="nofollow" attribute does. It evolved to address the problem of people submitting linked comments to blogs that said things like &#8220;Visit my pharmaceuticals site!&#8221; That kind of comment is an attempt by the commenter to raise his own website&#8217;s position in the search engine rankings. It&#8217;s called comment spam, by the way; most major search engines don&#8217;t like comment spam because it skews their results, making them less relevant. As you may have guessed, then, the “nofollow” value in the “rel” attribute is specifically for search engines; it really isn&#8217;t there to be noticed by anyone else. Yahoo!, MSN, and Google recognize it, but AskJeeves does not; its crawler simply ignores nofollow.</p>
<p>In some cases, a link may be assigned to an image. The hyperlink would then include the name of the image, and might include some alternate text in an “alt” attribute, which can be helpful for voice-based browsers used by the blind. It also helps spiders, because it gives them another clue for what the page is about.</p>
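<p>Here is a rough sketch of how a program might pull those clues out of a page, using Python’s standard html.parser module. The sample markup is my guess at what the two examples above look like in text form (I am assuming the extra phrase sits in a “title” attribute), plus a hypothetical image link; the class name and the logo.gif file are made up. Real spiders are far more elaborate than this.</p>

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect each link's URL, anchor text, title and nofollow flag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            rel = (attrs.get("rel") or "").lower().split()
            self._current = {
                "href": attrs["href"],
                "text": "",
                "title": attrs.get("title"),
                "nofollow": "nofollow" in rel,
            }
        elif tag == "img" and self._current is not None:
            # For image links, the alt text stands in for anchor text.
            self._current["text"] += attrs.get("alt") or ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None

# Assumed markup: a plain link, a link with title and rel, and an image link.
SAMPLE = (
    '<a href="http://www.seochat.com">SEO Chat</a> '
    '<a href="http://www.seochat.com" title="Great Site for SEO Info" '
    'rel="nofollow">SEO Chat</a> '
    '<a href="http://www.seochat.com"><img src="logo.gif" alt="SEO Chat logo"></a>'
)

collector = LinkCollector()
collector.feed(SAMPLE)
collector.close()
for link in collector.links:
    print(link)
```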
<p>Hyperlinks may take other forms on the web, but by and large those forms do not pass ranking or spidering value. In general, the closer a link is to the classic &lt;a href=”URL”>text&lt;/a>, the easier it is for a spider to follow a link, and vice versa. <br />
</p><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Terri' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=b45ff58a165dd9c241f7fb37acf4641c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Terri</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/how-search-engines-work-and-sometimes-dont/">How Search Engines Work (and Sometimes Don’t)</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/how-search-engines-work-and-sometimes-dont/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spider Guts</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/spider-guts/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/spider-guts/#comments</comments>
		<pubDate>Tue, 12 Oct 2004 09:00:56 +0000</pubDate>
		<dc:creator>DavidFells</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>
		<category><![CDATA[Inbound Links]]></category>
		<category><![CDATA[Link Popularity]]></category>
		<category><![CDATA[Title Tag]]></category>
		<category><![CDATA[URL]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/spider-guts/</guid>
		<description><![CDATA[<p>What&#8217;s inside the spiders? To get a good ranking in search engines, a good understanding of the fundamentals of SEO and how search robots crawl web pages is essential. The author includes valuable information such as a list of core elements considered by a typical search engine when calculating page relevance.In the quest for that [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='DavidFells' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=3b4c10383c7650df4b0233dcea26b1c8' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>DavidFells</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/spider-guts/">Spider Guts</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[What&#8217;s inside the spiders? To get a good ranking in search engines, a good understanding of the fundamentals of SEO and how search robots crawl web pages is essential. The author includes valuable information such as a list of core elements considered by a typical search engine when calculating page relevance.<br /><span id="more-583"></span><br /><P>In the quest for that elusive nirvana of search engine friendliness, we frequently find ourselves searching for &#8220;instant fix&#8221; ways to improve a page&#8217;s ranking without considering the big picture; that is, without looking at the problem of optimizing a web page as a whole and instead looking at several separate optimization steps as part of routine markup development or copy writing. While SEO experts do not tend to fit this mold, the average web developer certainly does. How many web pages have you &#8220;optimized&#8221; by simply adding keyword and description meta tags, and stopped right there? I imagine a hand count at this point would supply a fairly substantial number. An even better question at this point might be, &#8220;How many of you have tried to provide SEO for a web page without having even a basic understanding of search robot logic or what it expects to see in your pages?&#8221; Once again, I suspect we would have a healthy hand count. 
<P>The steps to optimize a page are well known to the SEO community, and many articles by authors far more knowledgeable than myself on the subject are available to web developers. So with all this knowledge out there, why are there so many developers who lack a big picture understanding of the subject? One word: fundamentals. It is crucial to know how the technology behind the scenes works, but like any other skill, the bulk of people attempting to learn that skill do not start at the bottom. They start somewhere that makes sense in trying to solve a particular problem and then they build up from that point. 
<P>If a developer held a greater understanding of the fundamentals of SEO and how search robots went about crawling web pages, they would in turn have a greater understanding of how to populate those alt attributes and meta tags. The objective of this article is to provide a general overview of how search robots (also called spiders) go about crawling and indexing web pages. 
<P> 
<P>
<P>There are a number of things that a spider expects to see when it looks at a web page, many of which are optional but still important in the big picture. The following is a list that describes the core elements considered by a typical search engine when calculating page relevance.</P>
<OL>
<LI><STRONG>Title Tag</STRONG> &#8211; The title tag should contain a title relevant to the page, not just &#8220;Home Page&#8221; or &#8220;Contact Us&#8221;. The title should contain up to five keywords. <BR><BR>
<LI><STRONG>Headings</STRONG> &#8211; Search engines view heading tags (&lt;h1&gt; through &lt;h6&gt;) as terms of emphasis, meaning additional weight is given to terms that appear inside them. Keywords should appear in heading tags. <BR><BR>
<LI><STRONG>Bold</STRONG> &#8211; Also viewed as terms of emphasis, but with less weight than headings. <BR><BR>
<LI><STRONG>Alt Text</STRONG> &#8211; Brief descriptive sentences should be used in image alt attributes. At least one keyword should appear in each alt attribute. <BR><BR>
<LI><STRONG>Keyword Meta Tag</STRONG> &#8211; Some engines use the keyword meta tags directly, some use them as part of a validation process ensuring that the keywords closely match the page content. The latter is the more typical scenario for modern engines. Keywords should be chosen carefully and be specific to the page they appear on. <BR><BR>
<LI><STRONG>Description Meta Tag</STRONG> &#8211; Most search engines use this tag in much the same way as the keyword tag. Each page should have a unique description. The description should contain a few keywords and briefly summarize the content that appears on the page with a high degree of accuracy. <BR><BR>
<LI><STRONG>Keyword Placement</STRONG> &#8211; Terms that are higher up on a page are more heavily weighted. <BR><BR>
<LI><STRONG>Keyword Proximity</STRONG> &#8211; Terms that are close together are probably related, and thus the site will show up in searches for those terms. <BR><BR>
<LI><STRONG>Comment Tags</STRONG> &#8211; Some search engines use comment tags for content, particularly in graphics rich/text poor sites.<BR><BR>
<LI><STRONG>Page Structure Validation</STRONG> &#8211; Properly coded pages are likely to be of better overall quality, and are thus rewarded. <BR><BR>
<LI><STRONG>Traffic/Visitors</STRONG> &#8211; Search engines keep track of how many people follow their links. The more a link is followed for a given search, the more relevant the link is assumed to be. <BR><BR>
<LI><STRONG>Link Popularity</STRONG> &#8211; Known at Google as PageRank, this is a measure of how many web pages on the Internet link to your site and how relevant those pages are to the page they link to. The popularity of the linking site is also evaluated.<BR><BR>
<LI><STRONG>Anchor Text for Inbound Links</STRONG> &#8211; This is a measure of the relevance of the anchor text from the referring site. <BR><BR>
<LI><STRONG>Page Last Modified</STRONG> &#8211; Newer content is regarded as &#8220;fresh&#8221; and is treated as more relevant. <BR><BR>
<LI><STRONG>Page Size</STRONG> &#8211; Engines tend to weigh content at the start of a document more than content further down. If a page is too long, typically more than 50k in markup only, then it should be broken up into multiple pages. <BR><BR>
<LI><STRONG>Keywords in URL</STRONG> &#8211; URLs are considered important by engines. Using hyphens rather than underscores in filenames, and using keywords in filenames and directories, improves a page&#8217;s potential relevance. </LI></OL>
<P>These elements are all poured into an algorithm by the search engine that produces a very specific result: a relevance score for a page based on a given set of keywords. Evaluating page relevance is a constant reciprocal process that involves crawling around all pages indexed by a particular engine and evaluating the relevance of their content and the relevance of references to that content. The items listed above are things search engines expect to find in a page as well as factors that are not necessarily expected, but are considered if available (such as inbound links).</P>
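<P>To make the idea of a relevance score concrete, here is a deliberately simplified sketch in Python. It is not how any real engine scores pages: the weights, the class and function names, and the sample page are all hypothetical, and it covers only four of the on-page elements listed above (title, headings, bold text and alt text) plus ordinary body text. Off-page factors such as inbound links, and positional factors such as keyword placement, are left out entirely.</P>

```python
from html.parser import HTMLParser

# Hypothetical weights; real engines do not publish theirs.
WEIGHTS = {"title": 5.0, "heading": 3.0, "bold": 2.0, "alt": 1.5, "body": 1.0}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BOLD = {"b", "strong"}

class TermCollector(HTMLParser):
    """Bucket the words of a page by the kind of element they appear in."""

    def __init__(self):
        super().__init__()
        self.buckets = {name: [] for name in WEIGHTS}
        self._stack = []  # currently open tags we care about

    def _bucket(self):
        # The innermost open tag of interest decides the bucket.
        for tag in reversed(self._stack):
            if tag == "title":
                return "title"
            if tag in HEADINGS:
                return "heading"
            if tag in BOLD:
                return "bold"
        return "body"

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt") or ""
            self.buckets["alt"].extend(alt.lower().split())
        elif tag == "title" or tag in HEADINGS or tag in BOLD:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Drop the most recent matching open tag.
            idx = len(self._stack) - 1 - self._stack[::-1].index(tag)
            del self._stack[idx]

    def handle_data(self, data):
        self.buckets[self._bucket()].extend(data.lower().split())

def relevance(html, keyword):
    """Weighted count of how often `keyword` appears in each kind of element."""
    collector = TermCollector()
    collector.feed(html)
    collector.close()
    keyword = keyword.lower()
    return sum(
        WEIGHTS[name] * words.count(keyword)
        for name, words in collector.buckets.items()
    )

# Hypothetical page, made up for illustration.
PAGE = (
    "<html><head><title>Blue widgets</title></head><body>"
    "<h1>Widgets for sale</h1>"
    "<p>We sell <b>widgets</b> of every size. Our widgets ship fast.</p>"
    '<img src="box.jpg" alt="Box of widgets">'
    "</body></html>"
)
score = relevance(PAGE, "widgets")
print(score)
```

<P>The point of the sketch is the shape of the calculation: the same keyword counts for more in a title or heading than in body text, which is why stopping after one optimization step leaves most of the score on the table.</P>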
<P>The next set of variables weighed by search engines are negatives. These will negatively affect the performance of a page on a search engine without exception. Avoiding these items is crucial to reaching and, more importantly, maintaining a high rank on a search engine.</P>
<OL>
<LI><STRONG>Broken Links</STRONG> &#8211; Whether the broken links are internal or outgoing, search engines do not view pages that contain them as pages with fresh content, and such pages are going to be scored as less relevant for their keywords. <BR><BR>
<LI><STRONG>Spam</STRONG> &#8211; This refers to any attempt to trick a search engine, such as using irrelevant keywords to draw extra hits, placing invisible content on the page to boost keyword density, and using meta refreshes (often in combination with irrelevant keywords) to draw a user in for an irrelevant search and then direct them to the page you want them to see. These techniques can result in a ban if the search engines detect them. <BR><BR>
<LI><STRONG>Excessive Search Engine Submittal</STRONG> &#8211; Oversubmitting a site to a search engine will likely result in a ban. Submit no more than once every three months, according to Google. <BR><BR>
<LI><STRONG>Empty Alt Attributes</STRONG> &#8211; Empty alt attributes are a major accessibility issue as well as just poor coding, and will affect a page negatively. <BR><BR>
<LI><STRONG>Excessive Punctuation</STRONG> &#8211; Excess punctuation in the Title and Description tags wastes valuable space and may cause a problem with the engine. <BR></LI></OL>
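<P>Several of these negatives can be caught with a simple automated check before a spider ever sees the page. The following Python sketch is one possible approach, not a standard tool: the class and function names, the sample page and the list of known paths are hypothetical, and the broken-link test only compares internal links against a list of pages you supply rather than fetching anything over the network.</P>

```python
import re
from html.parser import HTMLParser

class PageAudit(HTMLParser):
    """Collect the title, images lacking alt text and link targets of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images_without_alt = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_without_alt.append(attrs.get("src"))
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def audit_page(html, known_paths):
    """Report some of the issues listed above for one page."""
    page = PageAudit()
    page.feed(html)
    page.close()
    return {
        # Internal links whose target is not among the site's known pages.
        "broken_links": [h for h in page.links
                         if h.startswith("/") and h not in known_paths],
        "images_without_alt": page.images_without_alt,
        # Two or more punctuation marks in a row, e.g. "!!!" or "***".
        "noisy_title": bool(re.search(r"[!?*~.\-]{2,}", page.title)),
    }

# Hypothetical page and site map, made up for illustration.
PAGE = (
    "<title>Best Widgets!!! Buy Now</title>"
    '<a href="/widgets.html">Widgets</a> <a href="/old-page.html">Old</a>'
    '<img src="a.gif" alt=""> <img src="b.gif" alt="Blue widget">'
)
report = audit_page(PAGE, known_paths={"/", "/widgets.html"})
print(report)
```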
<P>These negative factors could greatly affect an otherwise relevant page; of course, some of them, particularly spam, preclude the page actually being relevant. The biggest pitfalls for an otherwise optimized page are simple typographical errors, broken links (usually due to stale content) and oversights in markup. Simple mistakes could mean the difference between top ten and top fifty for a search on an engine, a difference that could mean thousands of dollars per day in lost revenue for many websites. </P>
<P>Imagine if a site like Amazon.com failed to use alt attributes and stopped using &lt;h&gt; tags (replaced by images, for example). Searches that would typically show the site as a number one result could start bringing the site up as a number fifty result.</P>
<P><STRONG>Conclusion</STRONG></P>
<P>Approaching SEO as a holistic process rather than simply a combination of steps is critical. It is simply not enough to use an effective Title tag on every page and stop, or to use keyword relevant URLs and then stop. To achieve and maintain top rating on all pages for the appropriate sets of keywords, a page must be optimized completely for the way search engines weigh content relevance, and that involves taking everything discussed earlier in the article seriously.</P><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='DavidFells' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=3b4c10383c7650df4b0233dcea26b1c8' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>DavidFells</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/spider-guts/">Spider Guts</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/spider-guts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Score One for the Spiders?</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/score-one-for-the-spiders/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/score-one-for-the-spiders/#comments</comments>
		<pubDate>Mon, 10 May 2004 00:00:00 +0000</pubDate>
		<dc:creator>Terri</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/score-one-for-the-spiders/</guid>
		<description><![CDATA[<p>Spiders. Those creepy, crawlies of data mining that scour the web for bits and pieces of information have a habit of getting into trouble. Just ask eBay and Boats.com, who recently had to resort to some legal bug spray in order to get rid of the little pests. Is your data scavenging in danger?Call them [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Terri' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=b45ff58a165dd9c241f7fb37acf4641c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Terri</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/score-one-for-the-spiders/">Score One for the Spiders?</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[Spiders. Those creepy, crawlies of data mining that scour the web for bits and pieces of information have a habit of getting into trouble. Just ask eBay and Boats.com, who recently had to resort to some legal bug spray in order to get rid of the little pests. Is your data scavenging in danger?<br /><span id="more-582"></span><br /><P>Call them spiders. Call them robots. Call them bargain hunters (or one heck of a nuisance); they&#8217;re software programs with a mission: to hunt down information and bring it back. Many search engines couldn&#8217;t live without these electronic assistants to help them keep track of the proverbially explosive growth of the Web. Certainly they can save a lot of time when you&#8217;re trying to comparison shop online &#8212; just let a spider do the hunting and bring back the results. This is all well and good, unless the owner of the site doesn&#8217;t take kindly to spiders. eBay won an injunction against Bidder&#8217;s Edge that forced BE to stop sending spiders to eBay&#8217;s Web site in search of auctions. But a somewhat similar-looking court case was just recently decided in favor of the spider-wrangling plaintiff. Does this have wider implications for information aggregators who use spiders? </P>
<P>First, let me give my disclaimer: I am not a lawyer, nor do I play one on TV. So check with someone who eats, drinks, and breathes this stuff before you do anything drastic.&nbsp; That said, let&#8217;s take a look at the cases at hand.</P>
<P>The more recent case was decided early in April, in a district court in Florida. It involves two companies with Web sites that list yachts for sale&#8230;which may in part explain why this case did not attract the kind of attention that the earlier eBay case did. (There are more people interested in buying Beanie Babies than buying big boats.) Anyway, the older Web site in this case is owned by Boats.com, which, for the past nine years, has owned and operated Yachtworld.com, a Web site on which yacht brokers can post information about the big boats they have for sale &#8212; sort of like electronic classified ads, with more interactivity. Enter Nautical Solutions Marketing, in 2001, with its Web site, Yachtbroker.com &#8212; and two services that, Boats.com complained, blew it right out of the water.</P>
<P>The first service is the more interesting one for our purposes (though they&#8217;re both relevant) because it&#8217;s that service which directly involves the use of a spider. Cleverly named &#8220;Boat Rover,&#8221; this software program would connect to a targeted Web site, extract specific facts about a yacht for sale from a public yacht listing, collect those facts, and enter them in a searchable database. Boat Rover did not hold onto the HTML used in the listing; it copied it just long enough to get the facts, then discarded it. Yes, this matters &#8212; because this case was dealing with a question of copyright infringement. Specifically, did NSM infringe any copyrights when it sent Boat Rover to Yachtworld.com repeatedly between November, 2001, and April, 2002, to collect information which it then posted on its own Yachtbroker.com Web site?</P>
<P>Not according to Judge Merryday, who presided over the case. I&#8217;m sure you&#8217;d love to hear that it was a simple &#8220;no,&#8221; thus reaching a decision once and for all in favor of spiders running rampant, but anyone who&#8217;s ever had reason to consult a lawyer knows how unlikely that would be. Legal cases involving high technology, especially the Internet, tend to be less cut-and-dried. In this case, two important points were raised in consideration of whether any copyrights were infringed. First, what kind of information was Boat Rover collecting? In this case, it was just the facts: manufacturer, model, length, year of manufacture, price, location, and the URL of the Web page that contained the yacht listing. Well, fortunately for NSM (and Joe Friday), there&#8217;s an existing precedent that states that facts cannot be copyrighted; facts are part of the public domain, and thus there can be no question of copyright infringement. There&#8217;s no copyright to infringe!</P>
<P>The second aspect of this service that might have opened NSM up to charges of copyright infringement involved displaying these listings on its Web site. Remember when I mentioned that Boat Rover&#8217;s discarding the HTML information was important? That meant that NSM had to code the information without using Boats.com&#8217;s coding as a template &#8212; in theory, at least. And in fact, Judge Merryday found enough differences between the &#8220;look and feel&#8221; of a Yachtworld.com listing and a Yachtbroker.com listing to keep NSM in the clear over possible copyright infringement. The judge wouldn&#8217;t even grant Boats.com a copyright on its use of descriptive headings like &#8220;electrical,&#8221; &#8220;accommodations,&#8221; and &#8220;galley&#8221; to describe specific features of a yacht &#8212; because the terms were industry standards, and at least two other yacht brokering Web sites were using those terms in the very same way. The legal reasoning seems to go something like this: ideas themselves do not receive copyright protection. Expression of a particular idea does. However, in some cases, the ways to express a particular idea are so limited that the expression doesn&#8217;t receive copyright protection, because that would be just like protecting the idea. In this case, there&#8217;s only so many headings you can use for various areas of a yacht when listing it for sale, so &#8212; like facts &#8212; they don&#8217;t receive copyright protection.</P>
<P>The second service that NSM offered its customers was a &#8220;valet service.&#8221; With the permission of a yacht broker using the service, NSM would move, delete, or modify the yacht broker&#8217;s listing. Boats.com complained that NSM&#8217;s people were copying and pasting listings from Yachtworld.com over to Yachtbroker.com. The court found strong enough evidence that the only items being cut and pasted were descriptions and pictures, not HTML or anything that Boats.com could claim a copyright to. In fact, the court found (and both NSM and Boats.com agreed) that the yacht brokers themselves owned the copyrights to their own listings &#8212; and, since NSM was doing its thing with their permission, they weren&#8217;t infringing any copyrights.</P>
<P>Interestingly, this case was a &#8220;pre-emptive strike;&#8221; NSM was the plaintiff. They brought the suit seeking a declaration that they did not infringe any copyright owned by Boats.com. And they won, too.</P>
<P>I imagine you can all see how this has relevance for spiders and those who aggregate information about specific products&#8230;or even auctions. So, does this mean that spiders can go out and disregard those software-based warnings from Web sites telling them that they&#8217;re not welcome? Does this take eBay&#8217;s winning of an injunction against Bidder&#8217;s Edge and turn it on its head? </P>
<P>You&#8217;d think it would. On the face of it, the two cases look similar. Bidder&#8217;s Edge, an auction aggregation site, was sending spiders to eBay to collect information on eBay&#8217;s auctions. BE would then aggregate eBay&#8217;s auction information along with auction info from many other auction Web sites, making it a &#8220;one-stop shop&#8221; for that kind of information. eBay wanted them to stop &#8212; and, in fact, was able to legally force them to stop.</P>
<P>That is where the similarities between the two cases end. You see, eBay didn&#8217;t claim that Bidder&#8217;s Edge was infringing its copyright &#8212; or, at least, the court didn&#8217;t grant the injunction based on a claim of copyright infringement. Oh no. eBay claimed that Bidder&#8217;s Edge was trespassing! Bidder&#8217;s Edge admitted that it was sending 80,000 to 100,000 queries a day to eBay&#8217;s Web site, and eBay argued that that was akin to sending 80,000 to 100,000 robots into a bricks-and-mortar business looking for prices and not buying anything. Well, the court was skeptical of that argument, but not of certain other arguments. Specifically, eBay was able to point out that Bidder&#8217;s Edge&#8217;s spiders were using up eBay&#8217;s computer capacity, after eBay had told BE more than once that it was not welcome to use its spiders on eBay&#8217;s Web site. At that time, it was only using less than two percent of eBay&#8217;s capacity &#8212; but the principle in this case wasn&#8217;t the amount of capacity it was using up, but that it was using that capacity after eBay told it not to. A Web site might not actually be real estate in the same way that land is &#8212; but there is no question that computers and servers are property that can be owned. And Bidder&#8217;s Edge was using eBay&#8217;s computers in ways that eBay specifically told it that it wasn&#8217;t allowed to do. What would you do if you told someone not to use your computer in a particular way and they kept doing it? Uh-huh, that&#8217;s what I thought. In eBay&#8217;s case, they were granted an injunction against Bidder&#8217;s Edge in late May 2000.</P>
<P>So what does Nautical Solutions Marketing vs. Boats.com add to the sometimes-controversial issue of spiders? Well, first off, just because something was copied from a Web site &#8212; even by a robot &#8212; doesn&#8217;t mean that the person or company doing it is committing copyright infringement. Especially if it&#8217;s essentially factual information. On the other hand, that also doesn&#8217;t mean that they&#8217;re not in hot water. If you&#8217;re a spider wrangler, make sure your spiders check to see whether they&#8217;re welcome on the Web sites they visit &#8212; and don&#8217;t try to force the issue. It&#8217;s a big Web out there; there&#8217;s plenty of cyberspace for spiders to crawl without looking for trouble.&nbsp; </P><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='Terri' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=b45ff58a165dd9c241f7fb37acf4641c' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>Terri</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/score-one-for-the-spiders/">Score One for the Spiders?</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/score-one-for-the-spiders/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Protect Against Invaders by SPAM-Proofing Your Website</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/protect-against-invaders-by-spam-proofing-your-website/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/protect-against-invaders-by-spam-proofing-your-website/#comments</comments>
		<pubDate>Wed, 05 May 2004 00:00:00 +0000</pubDate>
		<dc:creator>seo_admin</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/protect-against-invaders-by-spam-proofing-your-website/</guid>
		<description><![CDATA[<p>Benjamin Pfeiffer discusses how to SPAM-proof your website. He explains how to use Javascript and mod_rewrite to stop SPAMbots and Spybots from finding email addresses on your website. He also talks about how to find&#160;and set up the .htaccess file and gives examples of robots and how to block them.Despite recent improvement in tools and [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='seo_admin' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=d55928ffc41dcd6c921e3265b9fc4cf4' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>seo_admin</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/protect-against-invaders-by-spam-proofing-your-website/">Protect Against Invaders by SPAM-Proofing Your Website</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[Benjamin Pfeiffer discusses how to SPAM-proof your website. He explains how to use Javascript and mod_rewrite to stop SPAMbots and Spybots from finding email addresses on your website. He also talks about how to find&nbsp;and set up the .htaccess file and gives examples of robots and how to block them.<br /><span id="more-581"></span><br /><P>Despite recent improvement in tools and programs in the battle against SPAM, most of us cannot escape the menace that plagues most of our inboxes on a regular basis. Each day most of us probably receive more SPAM than actual real email, and with Spammers getting more and more creative in their ways to circumvent traditional anti-SPAM tactics, it&#8217;s&nbsp;vital webmasters empower themselves with some anti-SPAM tactics for their own websites. </P>
<P>In this article I will discuss a&nbsp;few ways to SPAM-proof your website against malicious SPAM robots that harvest your email address so that it can be sold by the thousands to Spammers worldwide.&nbsp; These tactics can be effective enough that within a month of implementing them, you should see a dramatic drop in the amount of SPAM that makes it through to your website email addresses, not to mention a decrease in bandwidth use.</P>
<P><STRONG>How to Stop SPAMbots Dead in Their Tracks</STRONG></P>
<P><STRONG>1. Using JavaScript<BR>2. Using Mod_Rewrite</STRONG></P>
<P>Both of these techniques are effective in blocking SPAMbots and Spybots from finding your email address or other personal information on your website. While JavaScript is an easier solution, using <FONT face="Verdana, Arial, Helvetica, sans-serif">mod_rewrite</FONT> to block SPAMbots is more technical and requires knowledge of editing your <FONT face="courier new, courier, mono">.htaccess</FONT> file. It&#8217;s best to try the JavaScript method first, and then venture into using mod_rewrite to further block SPAMbots from hitting your website. </P>
<P><STRONG>Using JavaScript</STRONG></P>
<P>To understand how to use JavaScript to block SPAMbots from harvesting your email, let&#8217;s examine the ways that they find your email in the first place.</P>
<P><STRONG>1. Mailto: Links</STRONG> &#8211; these are common links placed in the HTML code of a website, offering a potential visitor the ability to send an email to the webmaster of the site.&nbsp; A visitor clicks on the email link and it opens an email client with the To: field already filled in with the address specified in the code.&nbsp; These links are the prime target of SPAMbots harvesting your email address, and simple use of JavaScript can cut down on email harvesters hitting your inbox with SPAM.&nbsp; The main objective with using JavaScript is to change the appearance of your email address so that email harvesters do not recognize your email, but still retain complete functionality for legitimate visitors to send you an email.</P>
<P><STRONG>2. Contact Forms</STRONG> -&nbsp;this is&nbsp;another prime location for SPAMbots to leave their tracks, steal your email address and be gone, ready to report back with fresh email addresses.&nbsp; These forms are another common feature on websites, and the following is what most often causes SPAMbots to find your email. </P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono">&lt;input type=&#8221;hidden&#8221; name=&#8221;recipient&#8221; value=&#8221;support@example.com&#8221;&gt;</FONT></P></BLOCKQUOTE>
<P>The following are examples of JavaScript that you can use to make your email address appear different in the code but still perform the same function as if it were regularly coded in HTML (ie: mailto:support@example.com).&nbsp; To use these examples, just copy and paste the code into your HTML document and replace the required field(s) with your email address.</P>
<P><STRONG>1. Basic Email Script</STRONG></P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono">&lt;script language=JavaScript&gt;<BR>&lt;!&#8211;<BR>document.write(&#8220;support&#8221; + &#8220;@&#8221; + &#8220;example.com&#8221;);<BR>//&#8211;&gt;<BR>&lt;/script&gt;</FONT></P></BLOCKQUOTE>
<P>Result:&nbsp; support@example.com</P>
<P><STRONG>2. Basic Mailto: Email Script with Link Text</STRONG></P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono">&lt;script language=JavaScript&gt;<BR>&lt;!&#8211;<BR>var username = &#8220;support&#8221;;<BR>var hostname = &#8220;example.com&#8221;;<BR>var linktext = username + &#8220;@&#8221; + hostname;<BR>document.write(&#8220;&lt;a href=&#8221; + &#8220;mail&#8221; + &#8220;to:&#8221; + username +<BR>&#8220;@&#8221; + hostname + &#8220;&gt;&#8221; + linktext + &#8220;&lt;/a&gt;&#8221;);<BR>//&#8211;&gt;<BR>&lt;/script&gt;</FONT></P></BLOCKQUOTE>
<P>Result: support@example.com</P>
<P><STRONG>3. Inline JavaScript</STRONG></P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono">&lt;a href=&#8221;#&#8221; onclick=&#8221;JavaScript:window.location=&#8217;mailto:&#8217;+'support&#8217;+'@&#8217;+'example&#8217;+&#8217;.com&#8217;&#8221;&gt;Link Text&lt;/a&gt;</FONT></P></BLOCKQUOTE>
<P>Result: Link Text</P>
<P>The three script options above should give you some flexibility in how you choose to use these on your website.&nbsp; Remember to insert your own email address into the fields where the support@example.com email address is located.</P>
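<P>If you need the same trick in several places on a page, the three scripts above could be folded into a small reusable helper. The following is only a sketch: the function names <FONT face="courier new, courier, mono">buildMailto</FONT> and <FONT face="courier new, courier, mono">buildMailLink</FONT> are made up for this illustration and are not part of any library.</P>

```javascript
// Hypothetical helper: assemble the address from parts at run time,
// so the complete address never appears as one string in the page source.
function buildMailto(username, hostname) {
  return "mail" + "to:" + username + "@" + hostname;
}

// Build the link markup the same way script 2 does, but as a reusable function.
// If no link text is given, the assembled address is used as the text.
function buildMailLink(username, hostname, linkText) {
  var text = linkText || (username + "@" + hostname);
  return '<a href="' + buildMailto(username, hostname) + '">' + text + "</a>";
}

// In a page you would write the result out, for example:
// document.write(buildMailLink("support", "example.com"));
```

<P>Keeping the username and hostname as separate arguments is the whole point of the design: the page source only ever contains the pieces, never the finished address.</P>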
<P><STRONG>Problems Associated with JavaScript</STRONG></P>
<P>There don&#8217;t appear to be many problems with using the above scripts in the HTML code of your documents.&nbsp; The biggest issues are likely to be coding the scripts incorrectly and older browsers that do not support JavaScript. One last issue that may yet arise is harvester programmers learning to find email addresses within JavaScript code.&nbsp; While this may become a reality sooner than we expect, for now JavaScript should be SPAM-proof enough to block most malicious SPAMbots.</P>
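<P>To see why splitting the address helps, and why it is not a guarantee, it may be useful to look at the problem from the harvester&#8217;s side. The sketch below assumes a very simple harvester that scans raw page source with a single pattern; real harvesters vary, and the pattern and the <FONT face="courier new, courier, mono">harvest</FONT> function are illustrative assumptions only.</P>

```javascript
// A deliberately simple address pattern, similar in spirit to what a basic
// harvester might use. This is an assumption for illustration only.
var EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Return every substring of the page source that looks like an email address.
function harvest(pageSource) {
  return pageSource.match(EMAIL_PATTERN) || [];
}

// A plain mailto: link exposes the full address in the source.
var plainSource = '<a href="mailto:support@example.com">Email us</a>';

// The scripted version only contains the separate pieces.
var scriptedSource = 'document.write("support" + "@" + "example.com");';

var foundInPlain = harvest(plainSource);       // one address found
var foundInScripted = harvest(scriptedSource); // nothing found
```

<P>A harvester that actually runs the JavaScript, or that is written to recognize this concatenation pattern, would still recover the address, which is the weakness described above.</P>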
<P>Using mod_rewrite is very effective at blocking the SPAMbots and other spybots that visit your website with a mission to either steal your email address or grab information from your website without your permission. Consider this method a step above using JavaScript, because it stops bots before they ever read the webpage itself.&nbsp; So if you are thinking of using JavaScript on the page to keep bots from finding your email, consider mod_rewrite as your primary defense against SPAM and other malicious robots.</P>
<P>One note to readers: The use of mod_rewrite requires that you have it installed on your server, and you have the ability to edit the <FONT face="courier new, courier, mono">.htaccess</FONT> file.&nbsp; Below is a simple way to locate the <FONT face="courier new, courier, mono">.htaccess</FONT> file while using a program such as CuteFTP (or a similar FTP client that performs the same functions).&nbsp; If you are unsure whether you have mod_rewrite installed, you should first consult the server administrator with your primary hosting company.&nbsp; Ask them if you have mod_rewrite and permissions to edit the <FONT face="courier new, courier, mono">.htaccess</FONT> file.</P>
<P><STRONG>How to Find .htaccess in a Common FTP Client</STRONG></P>
<P>To locate the <FONT face="courier new, courier, mono">.htaccess</FONT> file, most often you need to display all hidden files present when connecting to your hosting account.</P>
<P>To enable your FTP client to display all hidden files <FONT face="courier new, courier, mono">(.htaccess</FONT> and many other files not normally seen by the user), follow these steps:</P>
<OL>
<LI>First locate your saved site properties.<BR><BR>
<LI>Right click on the profile of the website you want to display hidden files. This is most often located in the &#8220;FTP Sites&#8221; section of most clients.<BR><BR>
<LI>Once you right click on the FTP site, select &#8220;SITE PROPERTIES&#8221; from the menu.<BR><BR>
<LI>An option box will load up displaying the site properties of your site. Look for a tab called &#8220;ACTIONS&#8221; and click on it.<BR><BR>
<LI>It will display the actions of the site. Locate a gray box called &#8220;FILTERS&#8221; and click on it.<BR><BR>
<LI>This will display the &#8220;Filters&#8221; properties of the site.<BR><BR>
<LI>Locate the &#8220;Enable Filtering&#8221; from the options available. Make sure this box is checked.<BR><BR>
<LI>Once you have checked the enable filtering box, a small box at the bottom of the options will be displayed. <BR><BR>
<LI>It should say something similar to &#8220;Enable Server Side Filtering&#8221;. Make sure this box is checked as well.<BR><BR>
<LI>Now enter the following into the &#8220;Remote Filter&#8221; box: -a<BR></LI></OL>
<P>Once you have entered in the filtering options, make sure to click &#8220;Ok&#8221; or &#8220;Apply&#8221; in order to save your changes.&nbsp; You should now be able to see all hidden files on the server.&nbsp; Make sure you start a new connection to view all files.&nbsp; If you are still having trouble viewing all your files and can&#8217;t seem to locate the <FONT face="courier new, courier, mono">.htaccess</FONT> file, don&#8217;t give up, but consult the system administrator of your hosting account to assist.</P>
<P><STRONG>How to Set Up Your .htaccess File</STRONG></P>
<P>Once you have confirmed that you do have a <FONT face="courier new, courier, mono">.htaccess</FONT> file, and mod_rewrite is turned on, add the following lines to your <FONT face="courier new, courier, mono">.htaccess</FONT> file:</P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono">Options +FollowSymlinks<BR>RewriteEngine On<BR>RewriteBase /</FONT></P></BLOCKQUOTE>
<P>The robots that you will want to block will depend on your preferences, as well as any bots that frequent your website on a regular basis.&nbsp; Cutting down on bandwidth costs, preventing robots from collecting your email address, and preventing robots from collecting information from you or your website are all good reasons to block a potential robot. </P>
<P>The best method of deciding which robots to block is to do some quick research about the robots that like to take up residence on your site.&nbsp; If you cannot find reliable information about a robot, or you find that it is used for something you would not approve of, simply block the robot by using a robots.txt file.&nbsp; If you find that a robot does not obey the robots.txt file, pull out the big guns and use mod_rewrite to stop it dead in its tracks. </P>
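<P>As a rough sketch of the first step, the robots.txt entries for a list of unwanted robots could be generated rather than typed by hand. The function name <FONT face="courier new, courier, mono">buildRobotsTxt</FONT> and the robot names <FONT face="courier new, courier, mono">ExampleBot</FONT> and <FONT face="courier new, courier, mono">AnotherBot</FONT> are placeholders invented for this example; substitute the user-agent names you actually find in your logs.</P>

```javascript
// Build robots.txt text that asks each named robot to stay out of the whole site.
// Each robot gets its own "User-agent" record followed by "Disallow: /".
function buildRobotsTxt(botNames) {
  return botNames
    .map(function (name) {
      return "User-agent: " + name + "\nDisallow: /\n";
    })
    .join("\n");
}

// Placeholder names, not real robots.
var robotsTxt = buildRobotsTxt(["ExampleBot", "AnotherBot"]);
```

<P>Save the resulting text as robots.txt in the root directory of your site. Remember that this is only a request: a robot that ignores robots.txt will not be stopped by it.</P>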
<P><STRONG>Example Robots</STRONG></P>
<P>There are several common bots that one might run into frequently such as &#8220;<STRONG>Microsoft URL Control</STRONG>&#8221; which is a robot that ignores the robots.txt file and fetches as many pages as it can before leaving the site.&nbsp; This SPAMbot is used by many different people all using the same name.&nbsp;</P>
<P>The second robot that frequents websites is the <STRONG>NameProtect (NPbot)</STRONG> robot. This robot&#8217;s job is to collect information about websites that are potentially violating the brand names of NameProtect&#8217;s clients.&nbsp; This robot does not obey the robots.txt file or respond to emails sent to the NameProtect company, and serves no good purpose as far as we have determined. </P>
<P><STRONG>To Block the Microsoft URL Control Robot by User Agent:</STRONG></P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono">RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control"<BR>RewriteRule .* - [F,L]</FONT></P></BLOCKQUOTE>
<P><STRONG>To Block the Nameprotect Robot by User Agent:</STRONG></P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono">RewriteCond %{HTTP_USER_AGENT} "NPbot"<BR>RewriteRule .* - [F,L]</FONT></P></BLOCKQUOTE>
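<P>To make the two rule pairs above easier to reason about, here is a simplified model of what they do, written in JavaScript. It is a sketch, not how Apache is implemented: the <FONT face="courier new, courier, mono">statusFor</FONT> function and the sample user-agent strings are assumptions for illustration. The pattern is treated as a regular expression tested against the User-Agent header, and the match is case-sensitive, as it is in mod_rewrite unless the <FONT face="courier new, courier, mono">[NC]</FONT> flag is added.</P>

```javascript
// Simplified model of the rule pairs: each RewriteCond pattern is a regular
// expression tested against the User-Agent header. A match leads to the
// [F] response (403 Forbidden); otherwise the request proceeds, shown as 200.
function statusFor(userAgent, blockedPatterns) {
  var blocked = blockedPatterns.some(function (pattern) {
    return new RegExp(pattern).test(userAgent);
  });
  return blocked ? 403 : 200;
}

var patterns = ["Microsoft URL Control", "NPbot"];

// Sample user-agent strings below are made up for the example.
var a = statusFor("Microsoft URL Control - 6.00", patterns); // blocked
var b = statusFor("NPbot crawler", patterns);                // blocked
var c = statusFor("npbot crawler", patterns);                // not blocked: case differs
var d = statusFor("Mozilla/5.0", patterns);                  // not blocked
```

<P>The third case is worth checking against your own logs: if a robot reports its name with different capitalization than your pattern, the rule will not match unless you add <FONT face="courier new, courier, mono">[NC]</FONT> to the RewriteCond line.</P>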
<P>Furthermore, once you establish a good number of bots that you would like to block using mod_rewrite, you can compile a list and add comments as well, like so:</P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT face="courier new, courier, mono"># bad bot<BR>RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR]<BR>RewriteCond %{HTTP_USER_AGENT} "NPbot"<BR>RewriteRule .* - [F,L]</FONT></P></BLOCKQUOTE>
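<P>Once the list grows, it is easy to forget an <FONT face="courier new, courier, mono">[OR]</FONT> or to leave one on the last condition. One way to avoid that is to generate the block from a plain list of names. The sketch below assumes the patterns contain no double quotes or regular-expression special characters; <FONT face="courier new, courier, mono">buildBlockRules</FONT> is a hypothetical helper name.</P>

```javascript
// Turn a list of user-agent patterns into one mod_rewrite block:
// every condition except the last carries [OR], then a single rule
// forbids the request. An empty list returns an empty string so that
// a lone RewriteRule (which would block every visitor) is never produced.
function buildBlockRules(patterns) {
  if (patterns.length === 0) {
    return "";
  }
  var lines = patterns.map(function (pattern, index) {
    var isLast = index === patterns.length - 1;
    return 'RewriteCond %{HTTP_USER_AGENT} "' + pattern + '"' + (isLast ? "" : " [OR]");
  });
  lines.push("RewriteRule .* - [F,L]");
  return lines.join("\n");
}

var rules = buildBlockRules(["Microsoft URL Control", "NPbot"]);
```

<P>Paste the generated text into your <FONT face="courier new, courier, mono">.htaccess</FONT> file after the <FONT face="courier new, courier, mono">RewriteEngine On</FONT> line, and keep any comments on lines of their own.</P>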
<P>One thing to note about the examples here: make sure you know how to add the rules to your .htaccess file correctly, and that you place them where mod_rewrite requires for this technique to be effective.&nbsp; Additionally, mod_rewrite rules are not an ultimate solution to SPAM and malicious bot problems. You can, however, effectively block a good majority of bots out there and dramatically cut down on the amount of SPAM you receive. If you use both the JavaScript methods and mod_rewrite, not only will your website be heavily guarded against SPAM, but you may actually enjoy downloading all your email messages to find them SPAM free.<BR></P><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='seo_admin' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=d55928ffc41dcd6c921e3265b9fc4cf4' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>seo_admin</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/protect-against-invaders-by-spam-proofing-your-website/">Protect Against Invaders by SPAM-Proofing Your Website</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/protect-against-invaders-by-spam-proofing-your-website/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ROBOTS.TXT Primer</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/robots-txt-primer/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/robots-txt-primer/#comments</comments>
		<pubDate>Mon, 29 Sep 2003 00:00:00 +0000</pubDate>
		<dc:creator>seo_admin</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/robots-txt-primer/</guid>
		<description><![CDATA[<p>There is often confusion as to the role and usage of the robots.txt file. I thought it would be a good idea to dispel some myths and highlight what robots.txt files are all about.There is often confusion as to the role and usage of the robots.txt file. I thought it would be a good idea [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='seo_admin' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=d55928ffc41dcd6c921e3265b9fc4cf4' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>seo_admin</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/robots-txt-primer/">ROBOTS.TXT Primer</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[There is often confusion as to the role and usage of the robots.txt file. I thought it would be a good idea to dispel some myths and highlight what robots.txt files are all about.<br /><span id="more-580"></span><br />Firstly, a robots.txt file is NOT there to tell search engine robots and other crawlers which pages they are allowed to spider (enter); it is primarily there to tell them which pages (and directories) they can NOT spider.<br /><br />The majority of websites do not have a robots.txt file, and do not suffer from not having one. The robots.txt file does not influence ranking in any way. Its goal is to keep certain spiders from visiting pages you do not want them to visit and take back with them.<br /><br />User-agent: EmailCollector<br />Disallow: /<br /><br />If you were to copy and paste the above into Notepad, save the file as robots.txt, and then upload it to the root directory of your server (where you will find your home page), you would have told a nasty email collector to keep out of your website. That is good news, as it may mean less spam!<br /><br />I do not have the space here for a fully fledged robots.txt tutorial; however, there is a good one at <br />http://www.robotstxt.org/wc/exclusion-admin.html <br /><br />Or simply use the robotsbeispiel.txt I have uploaded for you. Simply copy and paste it into Notepad, save it as robots.txt and upload it to your server root directory. 
<br />http://www.abakus-internet-marketing.de/robotsbeispiel.txt <br /><h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='seo_admin' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=d55928ffc41dcd6c921e3265b9fc4cf4' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>seo_admin</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/robots-txt-primer/">ROBOTS.TXT Primer</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/robots-txt-primer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Designing Websites For Humans  In A World Of Robots!!</title>
		<link>http://www.seochat.com/c/a/search-engine-spiders-help/designing-websites-for-humans-in-a-world-of-robots/</link>
		<comments>http://www.seochat.com/c/a/search-engine-spiders-help/designing-websites-for-humans-in-a-world-of-robots/#comments</comments>
		<pubDate>Fri, 11 Jul 2003 00:00:00 +0000</pubDate>
		<dc:creator>seo_admin</dc:creator>
				<category><![CDATA[Search Engine Spiders Help]]></category>

		<guid isPermaLink="false">http://www.seochat.com/c/a/search-engine-spiders-help/designing-websites-for-humans-in-a-world-of-robots/</guid>
		<description><![CDATA[<p>In this day and age, it can be easy to forget the basics of why your website is online. Crawlers/Robots, they come, they go, but they never pay. Thats where your visitors come in.With the ever increasing number of web pages &#038; documents available on the internet, it has become difficult to find information fast [...]<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='seo_admin' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=d55928ffc41dcd6c921e3265b9fc4cf4' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>seo_admin</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --></p><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/designing-websites-for-humans-in-a-world-of-robots/">Designing Websites For Humans  In A World Of Robots!!</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></description>
			<content:encoded><![CDATA[In this day and age, it can be easy to forget the basics of why your website is online. Crawlers/Robots, they come, they go, but they never pay. That&#8217;s where your visitors come in.<br /><span id="more-579"></span><br />With the ever increasing number of web pages &#038; documents available on the internet, it has become difficult to find information fast and without having hundreds of advertisements thrown into our faces (most of which will have no relevancy to the information we are seeking). There is quite simply no realistic method of finding material on the internet other than using search engines. At this point, everyone should realize that search engines use &#8220;robots&#8221; in order to &#8220;crawl&#8221; through the internet and collect web pages &#038; other documents. The search engine will then use these documents to make up the engine&#8217;s &#8220;index&#8221; or &#8220;database&#8221;. This in itself is not a problem, but to every action there is an equal &#038; opposite reaction (Newton&#8217;s 3rd law of motion!).<br /><br />With this expansion of information on the web, which has driven more people to use search engines on a daily basis, it has become a requirement for the search engines to become more active in order to keep their databases up to date. This means crawling more web pages at a greater frequency. Website owners have indeed noticed this increase in activity, and they have not stared at it blankly; they have reacted. They now realize that these search engines are producing significant percentages of their traffic (up to 90% in some cases). So what to do?<br /><br />Again, with the expansion of the web, there has also come more competition in essentially every industry, from computers, travel, and food right down to buying pets online. 
This competition is healthy in that it has pushed prices lower, but this very same competition has indirectly lowered the overall satisfaction level of website visitors. Let me explain more.<br /><br />Website Designing Guidelines:<br />Designing websites for humans is a far wider topic than can be covered in the scope of this brief article, and the details will vary by website. That said, you might find these general guidelines helpful:<br /><br />1. Provide a header on each page so that your visitors can clearly see what the page they have loaded relates to. This header should be the largest text on the page. <br />2. Content text should be no more than 1/3 of the size of the header; this ensures that the page is not too &#8220;monotone&#8221;. <br />3. Navigation menus should be very clear and easy to use. They should be presented on every page, and the user should not have to rely on another source (such as a framed page) for navigation. <br />4. Pages should be as light on images as possible, as many people out there still use 56k modems, and loading images takes time at 56k. <br />5. Pages should be no more than 40k in size (HTML code size) unless they are papers, e.g. 
technical studies / technical papers / specification sheets.<h3>Author information</h3><div class="ts-fab-wrapper" style="overflow:hidden"><div class="ts-fab-photo" style="float:left;width:64px"><img alt='seo_admin' src='http://forums.seochat.com/vbsso/vbsso.php?do=avatar&id=d55928ffc41dcd6c921e3265b9fc4cf4' class='avatar avatar-64 photo' height='64' width='64' /></div><!-- /.ts-fab-photo --><div class="ts-fab-text" style="margin-left:74px"><div class="ts-fab-header"><div style="font-size: 1.25em;margin-bottom:0"><strong>seo_admin</strong></div></div><!-- /.ts-fab-header --><div class="ts-fab-content" style="margin-bottom:0.5em"></div><div class="ts-fab-footer"></div><!-- /.ts-fab-footer --></div><!-- /.ts-fab-text --></div><!-- /.ts-fab-wrapper --><p>The post <a href="http://www.seochat.com/c/a/search-engine-spiders-help/designing-websites-for-humans-in-a-world-of-robots/">Designing Websites For Humans  In A World Of Robots!!</a> appeared first on <a href="http://www.seochat.com">SEO Chat</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://www.seochat.com/c/a/search-engine-spiders-help/designing-websites-for-humans-in-a-world-of-robots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
