SEO: An Overview

Developer Shed CEO Jonathan Caputo recently gave a presentation at iDate, a social networking conference in Miami. He covered introductory search engine optimization for an audience composed of people who own and work for online dating and social networking sites. If you know someone who needs a good introduction to basic SEO tactics, you’ll find that this article, based on that presentation, is a good start.

In many respects search engine optimization is considered to be a black art. So you should keep in mind that, while we at Developer Shed have implemented these concepts and achieved top rankings in Google, we can’t guarantee that what worked for us will be as effective for you. Also remember that many of the concepts we discuss will focus on optimizing for Google, because that search engine owns such a large share of the search market. I’d like to make one final point before we dive in: effective SEO is truly an organic concept; it must be pursued on a continual basis because search engines quite frequently update their algorithms.

With those points out of the way, here are the topics we will be covering:

• Server Response Codes (301s, 404s, etc.).
• Content Topics
• URLs
• Sitemaps
• Link Optimization
• Redundancy

Let’s start with server response codes. A server response code is an identifier that your web server sends out in response to each request it receives. Most of the time requests are processed just fine, and web surfers won’t see the codes, but there are two particular codes you may need to set up at certain times for specific SEO-related situations. These are the 301 and 404 response codes.

A 301 response code tells the requester that the web page he’s looking for has permanently moved to a new location or URL. It’s helpful for you to set this up because it lets search engines know that content no longer exists at the old location, but has now moved to a new home. You should use a 301 redirect any time you need to reorganize your content or change its URL in any way. This will also prevent the search engines from penalizing you for what they might perceive as duplicate content (we’ll be returning to the theme of duplicate content many times).
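As a concrete illustration, on an Apache server a 301 redirect can be set up with one line in an .htaccess file; the paths and domain below are hypothetical:

```apache
# Permanently redirect a page that has moved (hypothetical paths)
Redirect 301 /old-page.html http://www.example.com/new-page.html
```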

A 404 response code tells the requester that the content he is looking for can’t be found. This is the code you use when you delete content from your web site and want it removed from the search engines as well. Many web sites redirect 404s to a search page; this is not a good idea, because it could fool the search engines into thinking that the page is still valid. It may also incur penalties.
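If you want to show visitors a friendly error page without misleading the search engines, configure the server to serve that page with a genuine 404 status instead of redirecting. On Apache, for example (the file name is hypothetical):

```apache
# Serve a custom error page while still returning the 404 status code
ErrorDocument 404 /not-found.html
```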

{mospagebreak title=All About Content}

I’d like to cover five topics here:

• Duplicate content.
• Content relevancy.
• Content aging.
• Content theft.
• Content length.

Duplicate content is one of the leading causes of search engine penalties. There are two kinds of duplicate content: internal and external. Internal duplicate content is easier to address. If one or more of your web pages is identical to others on your site, you need to rewrite the page or rethink whether your site even needs it. External duplicate content I’ll cover in more detail when I discuss content theft. It’s incumbent upon you to make sure your site is cleanly organized and your content is original and not duplicated in any way. It doesn’t hurt to spend some time searching the web regularly to make sure your content remains your own and no one is stealing it.

Content relevancy is becoming more and more important with search engines. For example, if you’re an online dating site, search engines do not expect you to have content about mortgages. Search engines often employ humans to spot check their results. If your content is not relevant to your topic, it will count against you not only in the search engines, but with your visitors, who came to your site expecting one thing and wind up finding something else.

How old is your content? If you want to gain and keep a good position on the search engine results pages, you need to have the search engines index your site frequently. The only way you’re going to do that is by frequently updating your site with fresh, new content. Search engines keep track of how often a site updates its content, and if you do not add to your content frequently, the search engines will send their spiders to crawl your site less often.

Content theft is a big problem, and can lead to duplicate content issues. What can happen is that a thief will send a “scraper bot” to steal someone else’s content and then build a site made up entirely of content that has been scraped from elsewhere. The thief will then put up lots of AdSense ads and make money from the site that way. Google has no idea who had the content first, so it may well penalize both of you for duplicate content. When we find that our content has been stolen, we go through a number of steps to get it removed. We start by emailing the perpetrators directly, and then check the site’s WHOIS file and contact their ISP. If they have forums we will also post a cease and desist there. Remember, your intellectual property is yours, and it needs to be protected.

Finally, consider the length of your content. Search engines will process only so much information for a page. If you want to keep your content indexed, split it up into smaller pieces across several HTML pages so that both the search engines and your visitors have an easier time digesting it.

{mospagebreak title=URLs and Sitemaps}

You might not expect it, but the URLs of your content can actually make a big difference in how the search engines index your site. Search engine friendly URLs use keywords from your content inside the URL itself. That gives you a better chance of inching up higher on the search engine results pages.

Many sites, particularly ones whose content is fed from a database, operate with dynamic URLs. This is a URL that specifies a page and passes parameters to identify which content should be shown. The problem with this kind of URL is that it doesn’t index very well. Dynamic URLs look something like this:
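A typical dynamic URL has this general shape (the site and parameters below are hypothetical):

```
http://www.example.com/index.php?option=content&task=view&id=1734
```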

As you can see, the dynamic URL calls a generic file, and passes parameters to load the content. You’re much better off with a URL such as the following, which includes valuable keywords that appear in the content AND the URL:
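For comparison, a keyword-rich URL for the same hypothetical page might be:

```
http://www.example.com/online-dating/profile-writing-tips.html
```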

When you’re ready to change over to search engine friendly URLs, you need to make sure the old dynamic ones are no longer visible or you risk receiving a duplicate content penalty. Either use a 404 to kill the old links, or – preferably – a 301 redirect to help the search engines find the new URLs.
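On Apache, mod_rewrite can issue the 301s that map an old dynamic URL to its new keyword-rich equivalent. This is a sketch assuming a hypothetical id parameter and page name:

```apache
RewriteEngine On
# Send requests for the old dynamic URL to the new friendly one
# (hypothetical mapping; the trailing ? drops the old query string)
RewriteCond %{QUERY_STRING} ^id=1734$
RewriteRule ^index\.php$ http://www.example.com/online-dating/profile-writing-tips.html? [R=301,L]
```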

There is one more thing you need to be careful about with your URLs: whether or not they end in a slash. Believe it or not, that makes a world of difference to a search engine. If you choose to use slashes at the end of your URLs, do the same thing with all your pages. If a search engine sees two identical URLs, one with a slash and one without, it will index both of them and assume they are duplicate content. This is especially true with home page URLs.
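One way to enforce a consistent convention, again assuming Apache, is a rewrite rule that 301-redirects slash-less URLs to their slash-terminated form (a sketch, not the only approach):

```apache
RewriteEngine On
# Add a trailing slash to directory-style URLs that lack one,
# skipping real files and anything with a file extension
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^.]+[^/])$ /$1/ [R=301,L]
```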

Now I’d like to move on to sitemaps. Sitemaps are a valuable tool that allows you to have your web site indexed more thoroughly. There are two kinds of sitemaps: search engine submitted sitemaps and hosted sitemaps. The first kind is a special file you put together and submit to a search engine in the hope that the engine will process it and index all of the pages in the map. Google, Yahoo, and MSN each have specific formats for sitemaps; in fact, these big three have joined forces in an open sitemap initiative, about which you can find more information at sitemaps.org.
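A minimal sitemap file in the joint protocol’s XML format looks like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/online-dating/profile-writing-tips.html</loc>
    <lastmod>2008-01-15</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```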

A hosted sitemap is an HTML page on your web site that links to all the content that you would like to have indexed by the search engines. At one time it was common practice to link to an archive, or more “search engine friendly” copy of the content, so the search engine would have an easier time locating the relevant parts of your page. If you use this type of sitemap, it must contain ONLY the links to the actual content and not two versions of the same content – in other words, you do not want to have one page for your visitors and another page for the search engines. Since many popular forum packages come equipped with this archiving functionality, you need to be aware of it; otherwise, you could face possible duplicate content penalties.

{mospagebreak title=Link Optimization and Redundancy}

There are several points to consider when optimizing your links; I’m going to focus mainly on optimizing your outbound links. First, you want to limit the number of outbound links that exist on a single page on your site. If you have too many you start looking like a link farm, which Google penalizes; also, too many outbound links on one page can be confusing to your visitors.

When you do link to outbound sites, make sure the links are relevant. For example, if you’re operating an online dating site, why would you link to a site selling ephedra? Putting links on your site that have no relevance to your particular focus can move your site down in the search engine results and, again, confuse your visitors.

On the subject of link exchanges, link farms, and link swapping, you want to avoid them (especially link farms) at all cost. Search engines spot them easily. They can result in your site being completely blacklisted by Google.

I know that selling links is a popular practice, and that many large sites and publicly traded companies do this to make extra money – typically by selling blocks of text links to unrelated casino, pharmacy, or gambling sites.

You should avoid doing this on your site. While links to casinos, games, gift sites, etc. may be lucrative, they will only hurt you in the long run when Google spots all of these links that are unrelated to your content. Your competitors may even report your site for link spamming, which can of course result in your site being completely banned from a particular search engine. This practice also cheapens the reputation of your site with your visitors.

To finish up this topic, I’d like to discuss nofollow links. Using nofollow on outbound links tells Google to ignore the link and not follow it for indexing purposes. If you nofollow most of your external links, your Google PageRank should remain intact. But don’t use nofollow on internal links; you want PageRank to spread properly among your own pages.
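In HTML, nofollow is a value of the link’s rel attribute; the destination below is a placeholder:

```html
<!-- Tells search engines not to count this link for ranking purposes -->
<a href="http://www.example.com/" rel="nofollow">A paid or untrusted link</a>
```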

One of the ways Google decides whether a web site is relevant to a particular query is by looking at the frequency with which the query’s key words are repeated on that site. This kind of redundancy is also known as keyword density. If you look at just about any page from SEO Chat, you will see the same keywords repeated in many places on that page: the page title, the breadcrumb navigation, an H1 title on the page, the URL itself, and the meta tags. Putting your keywords in these places sends a clear and concise message to the search engine highlighting what a particular page is all about. This in turn should enhance your ranking.
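As a rough illustration of the idea, keyword density is simply the share of a page’s words taken up by the keyword. Here is a minimal sketch in Python; the sample text is made up, and real engines weigh placement (titles, headings, URLs) far more subtly than a flat count:

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Fraction of words in `text` that match `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# Hypothetical page copy: 3 of its 12 words are "dating"
page = "Online dating tips: our dating guide helps you succeed at online dating."
print(round(keyword_density(page, "dating"), 2))  # → 0.25
```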

There is a lot more to SEO than I can cover in one half-hour presentation. Many people have spent years studying SEO and have made it a full-time profession. This is just a general foundation to build upon. I recommend joining an SEO community such as SEO Chat, reading the articles there, and posting questions on our forums – it’s free and our members are very knowledgeable and helpful.
