Search Engine Optimization, Website Development and Search Engine Spiders

Do you have some experience with HTML and web site development, but haven’t really gotten your feet wet with search engine optimization? Keep reading, because you’re in luck. In this article we’ll cover some of the more important aspects of SEO, focusing on the changes that you should find easiest to make to your web site. After you read this, you should know where to concentrate your efforts now and going forward.

HTML Tags and SEO

The <title> tag is the most important tag on the page. It is what visitors see as the clickable headline in search results, and it should contain your targeted keywords. The title tag can make your page rank or make it disappear from search results. Use this format for web page title tags:


Keywords | Website Name or Brand


It’s okay to include only your keywords without the site name, especially if you aren’t building your brand through other Internet marketing methods.
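For example, here’s what that format might look like in a page’s <head>; the keyword and brand name are placeholders, not recommendations for your market:

<title>Blue Widgets | Example Widget Co.</title>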


Your <h1> tag is very important. As with the web page title, you should include your targeted keywords in this tag.

Your <h2> tag is next in importance. You should put keyword variations in this tag. It’s okay to be more natural here.

Your <h3> tag is less important, but still very useful. Promote keyword variations or subheadings with it. You can relax a bit here and mention keywords only once in a while.
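Pulling those together, here’s a sketch of how the heading tags might be arranged on a single page; the keywords are placeholders:

<h1>Blue Widgets</h1>
<h2>Affordable Blue Widgets for Your Home</h2>
<h3>Widget Care and Maintenance Tips</h3>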

Now that we’ve talked about the title and heading tags, let’s briefly touch on links so you can see another place where keywords matter. An incoming link to your pages is typically formed like so:

Link: <a href="http://www.site.com/page.htm" title="keyword">anchor text keywords</a>

It’s very important to include targeted keywords in anchor text in inbound links. You should also use targeted keywords in internal links, including navigation. The search engines take strong notice of keywords in these positions.

This is true for links that are "followed." But what about links that are designated "nofollow"? Many blogs and other sites make any links in comments automatically nofollow.

Nofollowed link: <a rel="nofollow" href="http://www.site.com/page.htm" title="keyword">anchor text keywords</a>

Here’s how different search engines treat nofollow:

  • Google’s spider doesn’t follow the link, so the linked content gets no credit and won’t be indexed through it.

  • Yahoo may follow the link to discover content, but gives no credit to the link itself (no PageRank flows, just as with Google).

  • MSN may or may not follow; it’s unknown at this time.

  • Ask.com does not officially support nofollow. It follows “nofollow” links and passes PageRank.



Finally, let’s take a look at the image tag and how to use keywords with it. Here’s an example:

Image tag: <img src="your-keyword.jpg" alt="your keyword" />

Include your keywords in the image file name and in the image’s ALT text. When an image serves as a link, search engines treat its ALT text much like anchor text.
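For instance, here’s a rough sketch of an image used as a link, where the ALT text plays the anchor text role; the URL and keyword are placeholders:

Image link: <a href="http://www.site.com/blue-widgets.htm"><img src="blue-widgets.jpg" alt="blue widgets" /></a>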

Here are a few points to keep in mind as you put your web pages together.

  • Keep the HTML page size under 200KB, not counting images, CSS and JavaScript.

  • Keep the title tag under 70 characters. Anything longer gets truncated in search results.


  • It’s okay to ignore the meta keywords tag.

  • Use the meta description tag (a sketch follows this list). Content from that tag shows in search results under your main link, so it can be very helpful in increasing your click-through rate. Keep your meta description under 150 characters, and include enticing calls to action that set you apart from competitors on the results page. A well-written description has been shown to increase click-through rate from search engines.

  • Don’t bury your content too deep in nested folders. Google doesn’t have any trouble discovering deep pages, but other, smaller search engines may stumble: www.site.com/folder/folder/page.htm is better than www.site.com/folder/folder/folder/folder/folder/page.htm. Folders are great for organizing content, but if you can fit 50 pages into one folder without maintenance trouble, do it.
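As promised above, here’s a sketch of what a meta description tag might look like; the copy is a placeholder, so write your own call to action:

<meta name="description" content="Shop affordable blue widgets with free shipping. Browse our full catalog today." />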

Now that we’ve looked at some general rules, let’s examine something that can raise canonical issues.

Different Home Pages

http://www.site.com/

http://site.com/

http://www.site.com/index.html

http://site.com/index.html


The above examples lead to the same page, but may be treated as different addresses by search engines. Google resolves this issue on its own, but other search engines may still have trouble. When you do internal linking, pick the format you prefer and stick to it. For example, if you’re linking to the home page, make sure all pages link to http://www.site.com/ instead of random variations like http://site.com/ and http://www.site.com/index.html.

Use a 301 redirect to point all addresses to the same variation. For example, if you selected http://www.site.com/, then 301 redirect http://site.com/, http://www.site.com/index.html and http://site.com/index.html to it. This way link power passes without losses, and you fix some older URLs that might have been overlooked.


The 301 redirect is a friendly way to point search engines and visitors to the pages you want them to see. The 301 redirect passes link power from the old address to the new one.

For Apache servers only:

To implement a 301 redirect, you need to put an .htaccess file in the root directory where all the pages are stored. If you don’t have an .htaccess file yet, you can create one with any plain text editor, such as Notepad.

Put the following in the .htaccess file:


redirect 301 /old/old.htm http://www.newsite.com/new.html

Redirects a single page or a site directory to a new location.

redirect 301 / http://www.site.com/

Redirects an old site to a new domain, passing link power from the old domain.

RewriteEngine on

RewriteCond %{HTTP_HOST} ^site\.com$ [NC]

RewriteRule (.*) http://www.site.com/$1 [L,R=301]

Redirects site.com to www.site.com.
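If you also want to clean up the index.html variations, here’s a sketch that sends direct requests for index.html back to the root. This assumes Apache’s mod_rewrite; checking THE_REQUEST makes the rule fire only on requests typed by visitors or followed by spiders, not on Apache’s own internal index.html lookup:

RewriteEngine on

RewriteCond %{THE_REQUEST} ^GET\ /index\.html

RewriteRule ^index\.html$ http://www.site.com/ [L,R=301]

Redirects http://www.site.com/index.html to http://www.site.com/.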


If you’re redirecting an old domain to a new one, make sure to keep renewing the old domain. If it expires, all the links pointing to it will lose their value for your new domain. Renewals are very cheap, but regaining that lost link juice can cost several grand and a load of time.

Search Engine Spiders

You can see the spiders that have visited your pages by looking at your server log files. This list will help you identify who’s who in the spider world.

Google Spiders

  • Google Search: googlebot

Microsoft Spiders

  • MSN Search: msnbot/x.xx, MSNBOT/0.xx

  • MSN Media Search: msnbot-media/1.0

  • Windows Live Product Search: msnbot-Products/1.0

  • Microsoft Mobile Search: MSNBOT_Mobile MSMOBOT Mozilla/2.0

Yahoo Spiders

  • Yahoo Search: SLURP

  • Yahoo Blog Search: Yahoo-Blogs/v3.9

  • Yahoo Multimedia Search: Yahoo-MMAudVid/1.0

  • Yahoo Product Search: YahooSeeker/1.0, YahooSeeker/1.1

Ask.com Spiders

  • Ask Search: Teoma MP

Alexa Spiders

  • Alexa / The Internet Archive: ia_archiver, ia_archiver-web.archive.org, ia_archiver/1.6

Look for those robot names in your log files. You can find more robot names at http://www.user-agents.org/.

Robots.txt is the file that instructs search engine robots how to crawl your website. You can block search engine spiders from indexing your entire website, or exclude specific folders and pages.

Even if you want search engines to spider your entire site, including a robots.txt file will keep your error log clear of failed robots.txt requests.

Example:


User-agent: *
Disallow:

Everything is allowed for all robots

User-agent: [botname]

Disallow: /

Everything is disallowed for a specific bot

User-agent: *

Disallow: /

Everything is disallowed for all bots

User-agent: *

Disallow: /cgi-bin/

Disallow: /tmp/

Disallow: /private

The listed directories are disallowed for all bots

User-agent: googlebot

Disallow: /cgi-bin/

Disallow: /tmp/

Disallow: /private

The listed directories are disallowed for Googlebot

User-agent: *

Disallow: /Folder/some-content.html

Page disallowed for all search engines

User-agent: SLURP

Disallow: /Folder/some-content.html

Page disallowed for Yahoo’s bot only.


SEO Chat has an automatic robots.txt generator that lets you specify the bots you’d like to exclude, or paste in a specific URL that should be kept private.

Robot-Specific Meta Commands

Apart from robots.txt, you can give commands to search engine spiders on each individual page.

  • Noindex – the page is not indexed by Google, Yahoo, MSN and Ask.

  • Nofollow – all links on the page are nofollowed by Google, Yahoo, MSN and Ask.

  • Noarchive – the page is not cached by Google, Yahoo, MSN and Ask.

  • Noodp – stops Google, Yahoo and MSN from pulling the page description from DMOZ.

  • Noydir – stops Yahoo from pulling a page description from Yahoo Directory.

  • Nosnippet – stops Google from showing a snippet (text excerpt) for the page in search results.


Add the above commands to the code of each individual web page in this format:


<meta name="ROBOT NAME" content="Noindex" />


If you want all robots to obey at once, enter:


<meta name="robots" content="nofollow"/>


Search Engine Pitfalls

Spiders do not like:

  • Session IDs

  • Frames

  • Logins

  • Forms


Google can fill out forms, make selections and click on some buttons in order to see results past the form.

“…when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML…” – Google Webmaster Central

Google has a free feature in its Webmaster Tools called “Sitemaps.”

XML sitemaps notify Google whenever you add a new page to your website, and Google in turn schedules Googlebot to crawl the new page.

Sitemaps are especially useful if:

  • Your site has dynamic pages.

  • Your site pages are tricky for Google to crawl (AJAX, Flash).

  • You have a load of poorly interlinked content that you want G-bot to eat.

Sitemaps document a lot of cool data (see the sketch after this list), including:

  • How often you change your pages (daily, hourly, yearly, etc.)

  • The date each page was modified.
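To give you an idea of what Google receives, here’s a minimal sitemap sketch with a single entry; the URL and date are placeholders reused from earlier examples:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.site.com/page.htm</loc>
    <lastmod>2008-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>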

Google doesn’t guarantee the inclusion of new pages in the index. Exclusion is usually due to low trust from Google, which is normal for new websites. Sometimes pages will only be included in the supplemental index until a site gains more authority links.

The supplemental index is sort of a junkyard for pages that are okay but don’t have enough trust to be featured in the main search results. If there aren’t enough results in the main index, pages from the supplemental index come to help. As for you, the webmaster, being in the supplemental index sucks, but there’s little you can do if the site is new.

To create a sitemap, go to the “Sitemaps” section of Google Webmaster Tools and follow the instructions.

There’s also software available on the web that does the exact same thing as Google Sitemaps. Some vendors even charge money for it, which is weird; who pays when they can get it for free?

Conclusion

This stuff is very straightforward. You will learn a lot more once you actually get down to doing SEO. The sooner you start doing it, the better. In fact, I suggest you stop reading and actually do some optimization. You may be clueless at first, but that’s what the SEO chat forums are for. Ask and get answers!

Check “The Web Developer’s SEO Cheat Sheet” for more tips and a review of what I discussed above.

Good luck.
