Search Optimization and 404 Errors

Webmasters often overlook the search engine optimization (SEO) benefits of fixing 404 errors on a website. Paying closer attention to 404 errors can help your site rank better in search engines, and it also makes for a better user experience.

This tutorial is all about managing 404 errors for search engine optimization.

Tip #1: Make a well-designed 404 page

A 404 page is the page a visitor lands on when the requested URL is not found on your server. You will most often see this error as "404 Not Found." The first step is to create a user-friendly 404 page.

How do you create a user-friendly 404 page? Keep these four guidelines in mind:

1. Explain why the visitor has landed on this page instead of the page they were trying to reach.

2. Include the website's search box and invite the visitor to search with relevant keywords to find related posts.

3. Keep the page as simple as possible. In particular, do NOT show technical server error messages.

4. Include the website's usual navigation: the header, sidebar and footer, so the visitor can easily move on to other pages of your site.
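If your site is built with PHP, here is one way the four guidelines could translate into markup. It's a minimal sketch, not a drop-in template: the function name render_404_page(), the /search form action, the q parameter and the navigation links are all hypothetical placeholders for your own site's search and navigation.

```php
<?php
// Hypothetical helper: builds the HTML for a 404 page that follows the four
// guidelines above. The nav links and the /search URL are placeholders.
function render_404_page(string $requestedPath, string $searchAction = '/search'): string
{
    // Escape anything taken from the request before echoing it back into HTML.
    $path = htmlspecialchars($requestedPath, ENT_QUOTES, 'UTF-8');
    $action = htmlspecialchars($searchAction, ENT_QUOTES, 'UTF-8');

    // Guideline 4: keep the site's usual header, sidebar and footer navigation.
    $header = '<header><a href="/">Home</a> | <a href="/blog/">Blog</a></header>';
    $sidebar = '<aside><a href="/popular/">Popular posts</a></aside>';
    $footer = '<footer><a href="/contact/">Contact</a> | <a href="/sitemap/">Sitemap</a></footer>';

    // Guideline 1: explain in plain words why the visitor ended up here.
    $message = "<h1>Sorry, we couldn't find that page</h1>\n"
        . "<p>The address <code>{$path}</code> does not exist on this site. "
        . "It may have been moved, deleted, or mistyped.</p>";

    // Guideline 2: offer the site's search box so the visitor can look for related posts.
    $search = "<form action=\"{$action}\" method=\"get\">"
        . '<input type="text" name="q" placeholder="Search this site">'
        . '<button type="submit">Search</button></form>';

    // Guideline 3: keep it simple; no stack traces or server error details.
    return "<!DOCTYPE html>\n<html><head><title>Page not found</title></head><body>\n"
        . implode("\n", [$header, $message, $search, $sidebar, $footer])
        . "\n</body></html>\n";
}
```

In error-404.php you could then call http_response_code(404); before echo render_404_page($_SERVER['REQUEST_URI']);, so the page both looks right and sends the correct status (see Tip #2).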

Below is an example of a user-friendly 404:

If you're a beginner, you may be wondering how to see your own 404 error page. It's simple: open a browser, type your domain in the address bar, and add some random letters after it. Because that page does not exist on your server, you should get your 404 page. For example, if your domain is http://www.example.com/, type this into the address bar:

http://www.example.com/fdfdfd

Tip #2: Make sure your 404 error pages really provide a 404 header status

Not every 404 error page actually returns a 404 header status. This status tells search engine bots and other crawlers that the page was not found. Google, for example, relies on this status to drop outdated and deleted pages from its index. Suppose you delete 30 URLs, so those 30 URLs now return a 404 header status. When Googlebot re-crawls them and sees the 404 Not Found response, it is only a matter of time before Google removes them from its index, without you having to submit URL removal requests in Google Webmaster Tools (Source: http://www.mattcutts.com/blog/overdoing-url-removals/).

Okay, so how can you be sure that your 404 error page will really return a 404 header status? Follow the steps below:

1. Go to this server header status checker: http://www.seoconsultants.com/tools/headers

2. Enter the URL of your 404 page (or any URL that should return a 404).

3. Click the “Check Server headers” button.

4. You should see "404" as the header status. For example, see the screenshot below:

The screenshot above confirms that the URL http://www.php-developer.org/dfdsfdf does not exist and returns the 404 Not Found header status.
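If you'd rather check from a script than from a website, PHP's built-in get_headers() returns a URL's response headers, status line included. The sketch below assumes that approach; final_status_code() and random_missing_url() are hypothetical helper names, and random_missing_url() simply automates the "type random letters after your domain" trick from Tip #1.

```php
<?php
// Returns the final HTTP status code found in a get_headers() result.
// get_headers() follows redirects by default, so the array can hold several
// status lines ("HTTP/1.1 301 ...", then "HTTP/1.1 404 ..."); the last one is
// the response the visitor finally receives.
function final_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $key => $line) {
        // Status lines have numeric keys and start with "HTTP/".
        if (is_int($key) && is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: builds a URL on your domain that almost certainly
// does not exist, like typing random letters after the domain.
function random_missing_url(string $baseUrl): string
{
    return rtrim($baseUrl, '/') . '/' . bin2hex(random_bytes(6));
}
```

For example, get_headers(random_missing_url('http://www.example.com/')) fetches a made-up URL on your domain; pass the result to final_status_code() and you should get 404. get_headers() returns false if the request itself fails, so check for that before calling the helper.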

What should you do if your 404 Not Found page does not return a 404 header status?

You can fix this with either .htaccess or PHP:

1. Using .htaccess, you can define your 404 error document:

ErrorDocument 404 /error-404.php

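Note: error-404.php is your custom 404 page. Use a local path like this one rather than a full URL such as http://www.example.com/error-404.php; with a full URL, Apache sends the visitor a redirect instead of a 404 header status.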

2. Using PHP, you can send the 404 Not Found header status from within a page:

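<?php
// Send the 404 status line; this must run before any HTML is output.
header("HTTP/1.1 404 Not Found");
// Show the custom 404 page (error-404.php) so visitors don't get a blank screen.
include "error-404.php";
exit;
?>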
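On a site where PHP decides which page to show, you can send the 404 status only for requests that don't match a real page. Here is a minimal sketch of that idea, assuming a single front-controller script; resolve_page(), the $pages map and the file names in it (home.php, about.php, error-404.php) are hypothetical.

```php
<?php
// Hypothetical front controller: map a request to a page, or to a real 404.
function resolve_page(string $requestUri, array $pages): array
{
    // Ignore the query string: "/about?ref=x" should match "/about".
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (is_string($path) && isset($pages[$path])) {
        return [200, $pages[$path]];
    }
    // Unknown path: answer with a genuine 404 status and the custom error page.
    return [404, 'error-404.php'];
}

// Hypothetical list of pages that actually exist on the site.
$pages = [
    '/' => 'home.php',
    '/about' => 'about.php',
];

// Under a web server, send the status before any output, then show the page.
if (PHP_SAPI !== 'cli') {
    [$status, $file] = resolve_page($_SERVER['REQUEST_URI'] ?? '/', $pages);
    http_response_code($status);
    include $file;
}
```

The status goes out through http_response_code() before any HTML is included, for the same reason header() has to come first in the snippet above.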

If neither method works, contact your web developer or your web hosting provider to fix the 404 errors; the cause is often a server misconfiguration.

What happens if your 404 error pages do not return the 404 Not Found header status? Search engines run into problems. Suppose you delete thousands of URLs and expect each of them to return a clean 404. Search engines treat a page as missing only when the server's header status is actually 404. If the server returns something else, such as 200 OK, the error page looks like a normal page (Google calls these "soft 404s"), and the deleted URLs cannot be removed from Google's index.
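Once you have the header status of each URL you deleted (from a header checker or PHP's get_headers()), a quick filter shows which of them would still block removal. A small sketch; urls_blocking_removal() is a hypothetical name, and the input is assumed to be an array mapping each URL to its status code.

```php
<?php
// Hypothetical audit: which deleted URLs do NOT answer with a clean 404?
function urls_blocking_removal(array $statusByUrl): array
{
    $problems = [];
    foreach ($statusByUrl as $url => $status) {
        // 200 tells search engines the page still exists; 301/302 says it moved.
        // Either way the URL is not treated as "not found".
        if ($status !== 404) {
            $problems[$url] = $status;
        }
    }
    return $problems;
}
```

An empty result means every deleted URL returns 404 and can drop out of the index on its next crawl.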

Tip #3: Know ALL of the requests to your server that return 404

For this, your site must be registered and verified in Google Webmaster Tools. Once it is verified, follow the steps below:

1. Log in to your Google Webmaster Tools account: www.google.com/webmasters/tools/

2. Under “Sites,” click the domain for which you would like to see the 404 errors.

3. Under “Crawl errors,” you should see the 404 errors found by Googlebot on your site under “Not found.” Click this link. For example, this is a sample 404 error:

These are only the 404 errors that Googlebot has discovered. For a far more complete list, examine your server's access log. If your host uses cPanel, follow the steps below:

  1. Log in to your cPanel hosting account.

  2. Under "Logs," click "Raw Access Log."

  3. Click the domain whose 404 errors you want to review.

  4. The domain's raw access log will be downloaded to your computer as a compressed file (usually .gz).

  5. Extract it; the log inside is a plain text file. Open it in a text editor.

  6. To find the 404 errors, press Ctrl+F and search for 404. For example, below is an entry from a raw access log:

118.96.133.139 - - [25/May/2011:17:52:27 -0500] "PUT /indonesia.htm HTTP/1.1" 404 5115 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; InfoPath.2)"
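Ctrl+F works for a small log, but it also matches "404" inside URLs, byte counts and browser strings. A short PHP script can match the status field instead. This sketch assumes the log uses Apache's combined log format, like the sample line above; parse_404s() and LOG_PATTERN are hypothetical names.

```php
<?php
// Combined log format: IP, identity, user, [time], "METHOD PATH PROTOCOL",
// status, bytes, then an optional "referer".
const LOG_PATTERN = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+(?: "([^"]*)")?/';

// Collect every request whose status field is exactly 404.
function parse_404s(iterable $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        if (!preg_match(LOG_PATTERN, $line, $m) || $m[5] !== '404') {
            continue; // not a combined-format line, or not a 404
        }
        $hits[] = [
            'ip'      => $m[1],
            'time'    => $m[2],
            'method'  => $m[3],
            'path'    => $m[4],
            'referer' => $m[6] ?? '-',
        ];
    }
    return $hits;
}
```

To read a downloaded cPanel log directly, you could pass gzfile('example.com-May-2011.gz') (a hypothetical file name; gzfile() requires PHP's zlib extension) and print each hit's path and ip.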

Why is it important to know all requests that return 404 errors?

You need to know every request that returns a 404 error for two reasons. First, attackers often guess URLs, such as the address of your admin area. Each wrong guess returns a 404, so the attempts show up in your raw access log. If you see a burst of 404 requests probing for your admin URL, you are most likely looking at a hacking attempt. Below are some admin URLs that attackers commonly test:

/admin.php
/login.php
/administrator
/login
/admin

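The access log also records the attacker's IP address, which you can block in .htaccess by adding this line:

Deny from xxx.xxx.xxx.xxx

Replace xxx.xxx.xxx.xxx with the IP address you want to block. (This is Apache 2.2 syntax; Apache 2.4 accepts it only when mod_access_compat is enabled.)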
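To spot this kind of probing without reading the whole log, you can count admin-URL 404s per IP address. A rough sketch, assuming the combined log format shown earlier; the probe list is the one above, while the threshold of 3 hits and the function names suspicious_ips() and deny_lines() are arbitrary, hypothetical choices.

```php
<?php
// Admin/login paths from the list above.
const PROBE_PATHS = ['/admin.php', '/login.php', '/administrator', '/login', '/admin'];

// Count 404s on probe paths per IP; keep IPs with at least $threshold hits.
function suspicious_ips(iterable $lines, int $threshold = 3): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Capture IP, request path and status from a combined-format log line.
        if (!preg_match('/^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) /', $line, $m)) {
            continue;
        }
        [, $ip, $path, $status] = $m;
        // Drop any query string and trailing slash before comparing.
        $path = rtrim(explode('?', $path, 2)[0], '/');
        if ($status === '404' && in_array($path, PROBE_PATHS, true)) {
            $counts[$ip] = ($counts[$ip] ?? 0) + 1;
        }
    }
    $flagged = array_filter($counts, function (int $n) use ($threshold) {
        return $n >= $threshold;
    });
    arsort($flagged); // most active first
    return $flagged;
}

// Turn flagged IPs into .htaccess lines in the format shown above.
function deny_lines(array $flagged): string
{
    $out = '';
    foreach (array_keys($flagged) as $ip) {
        $out .= "Deny from {$ip}\n";
    }
    return $out;
}
```

Check each flagged address before pasting the generated lines into .htaccess; a real visitor who mistyped /login three times would be flagged too.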

Second, some requests come from visitors who click a link on another website that points to your site. If that link is malformed or uses the wrong URL, the request returns a 404 error. The next section covers this in detail.
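The raw access log also records the referring page (the quoted field right after the byte count), so broken inbound links can be found there as well as in Google Webmaster Tools. A sketch under the same combined-log-format assumption; broken_inbound_links() is a hypothetical name, and $ownHost must match the host name your own pages use (www.example.com and example.com count as different hosts here).

```php
<?php
// Count 404s that arrived via a link on another host, keyed "referer -> path".
function broken_inbound_links(iterable $lines, string $ownHost): array
{
    $links = [];
    foreach ($lines as $line) {
        // Capture request path, status and referer from a combined-format line.
        if (!preg_match('/^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"/', $line, $m)) {
            continue;
        }
        [, $path, $status, $referer] = $m;
        // parse_url() returns null for "-" (no referer), so those are skipped.
        $host = parse_url($referer, PHP_URL_HOST);
        if ($status === '404' && is_string($host) && strcasecmp($host, $ownHost) !== 0) {
            $key = $referer . ' -> ' . $path;
            $links[$key] = ($links[$key] ?? 0) + 1;
        }
    }
    arsort($links); // most frequent broken links first
    return $links;
}
```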

Tip #4: Transform a 404 into a link building opportunity

Many webmasters who link to your site use the correct URL, so visitors (including search engine bots) land on the intended page, which returns a 200 OK status. Some webmasters, however, link to your site with a misspelled URL.

As a result, those visitors do not reach the intended page and land on your 404 page instead. So how can you turn these 404 errors into a link building opportunity? Follow the steps below:

1. Log in to Google Webmaster Tools.

2. Go to Crawl errors and examine all “Not found” pages reported by Google.

3. Some of these "Not found" pages are linked from other domains. To see where a link comes from, click the pages under "Linked From."

4. Visit the page of origin that linked to your site, as reported by “Linked from.” Examine the URL. If it is misspelled, do the following:

Set up a 301 redirect from each misspelled 404 URL to the correct URL on your website. In .htaccess, use this syntax:

redirect 301 "/thisisthemispelledurl.php" http://www.example.com/thisisthecorrecturl.php
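When the list of misspelled URLs grows long, a script can propose the redirect lines for you. This sketch swaps in a technique not described above: PHP's levenshtein() edit distance, used to pick the closest real URL for each misspelled path. closest_path(), redirect_rules(), the list of valid paths and the distance limit of 3 are all assumptions, so review every suggested line before adding it to .htaccess.

```php
<?php
// Suggest the real path closest to a misspelled one, or null if none is close.
function closest_path(string $badPath, array $validPaths, int $maxDistance = 3): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($validPaths as $candidate) {
        // levenshtein() returns -1 for over-long strings on PHP < 8; skip those.
        $d = levenshtein($badPath, $candidate);
        if ($d >= 0 && $d < $bestDistance) {
            $best = $candidate;
            $bestDistance = $d;
        }
    }
    // Too far from every real page: probably not a typo, so no suggestion.
    return $bestDistance <= $maxDistance ? $best : null;
}

// Build .htaccess lines in the same form as the example above.
function redirect_rules(array $badPaths, array $validPaths, string $baseUrl): string
{
    $rules = '';
    foreach ($badPaths as $bad) {
        $good = closest_path($bad, $validPaths);
        if ($good !== null) {
            $rules .= 'Redirect 301 "' . $bad . '" ' . rtrim($baseUrl, '/') . $good . "\n";
        }
    }
    return $rules;
}
```

Paths with no close match are left out on purpose; those usually need a human decision rather than an automatic redirect.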

Note: This is not limited to links from other domains; your own internal links may also contain misspelled URLs, and you can correct those the same way. To double-check, enter the former 404 URLs into the server header checker mentioned earlier; each should now return a 301 redirect to the correct URL. That way, the link juice passes to the intended URL and can help its rankings.
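The same double-check can be scripted with get_headers(), which follows redirects by default and therefore lists the 301 response first, then the final page's response. A minimal sketch; is_301_to() is a hypothetical helper, and it assumes the Location header contains the full target URL exactly as written in your redirect line.

```php
<?php
// True if the FIRST response in a get_headers() result is a 301 whose
// Location header points at $expectedUrl.
function is_301_to(array $headers, string $expectedUrl): bool
{
    $firstStatus = null;
    $location = null;
    foreach ($headers as $line) {
        if (!is_string($line)) {
            continue;
        }
        if ($firstStatus === null && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $firstStatus = (int) $m[1];
        } elseif ($firstStatus !== null && $location === null && stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        } elseif ($firstStatus !== null && preg_match('#^HTTP/#', $line)) {
            break; // reached the next response in the redirect chain
        }
    }
    return $firstStatus === 301 && $location === $expectedUrl;
}
```

A 302 fails this check on purpose: the point of the fix is a permanent (301) redirect.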
