Preparing Your New Site for Structural Changes

When redesigning a website, there are a great many considerations to weigh. On the whole, good webmasters catch the most critical elements, but one area that is consistently overlooked is inbound links. When a site is restructured, the fate of these links is often forgotten. Inbound links deserve the utmost respect, since they direct visitors to your website and affect your PageRank as well. Links from search engines to your site need special care. If your server returns a 404 for a link, the search engine will eventually drop the link from its list, and the replacement content may not get spidered for a long time.

The Problem

When a search engine spiders your site and receives a 404 status code for a link, the search engine makes a note that the link was not found, and after a few such experiences removes the link from its list of links to crawl on your site. By properly redirecting a search engine to content that has been relocated, you retain these valuable links that have been built over time.

  1. Visitor Clicks Old Search Engine Listing  http://www.hafenbrack.com/OldAddress.htm

  2. Visitor Lands on http://www.Hafenbrack.com

  3. Web server attempts to serve page: /OldAddress.htm

  4. “Page Not Found” condition or 404 is encountered by web server

At this point the web server’s first instinct is to serve the default 404 page. This in many cases is the default text “Page Not Found” generated by the web server when a requested file cannot be located. This is where we want to step in and try to salvage the visit for the customer.

We addressed this issue with a client that had hundreds of search engine links and inbound links from other websites. This client was unique because their inbound links were not from link exchange programs but from actual customers who found their product interesting. This meant that each link was critical to our client as the quality of each inbound link was very high. The new structure of the website dictated that ALL pages would have new addresses and none of the old addresses would fit into this scheme. The structure of the old site did not allow for any single elegant mod_rewrite regex hack, so we needed something a little more extensive.

Our solution was to create a manageable, dynamic system that would allow for the old links to work seamlessly for the search engines and customers while allowing our designers, developers, and clients to make the structural enhancements necessary to take the site to the next level.

To do this, we created a new 404 page and told the web server to serve this page as the default. This new 404 page is actually a script that attempts to find the content the visitor is looking for and serve it, rather than simply displaying an error message.

For this to work, several things must first be in place:

  1. A custom 404 page written in some form of scripting language (in this article I use PHP)
  2. An httpd.conf entry (it can be in the form of an .htaccess file) telling the server to serve the new 404 page on page request failures
  3. A database table of old (replaced) pages, with the new address recorded for each old address.

When the new system has determined that it can successfully intervene on the visitor’s behalf, it must redirect the visitor (or spider) to the appropriate content. There are a few ways this can be done. In this article we are going to use a Location header (sent from PHP) together with an HTTP status code. To get started, let’s look at what happens when we create a redirect on a page. Take a look at the following code:

<?php
//redirect.php
header("Location: http://www.example.com/");
?>

This snippet of code will automatically redirect the visitor to http://www.example.com/. It works because PHP, when it sends a Location header and no other redirect status has been set, also sets the response status to 302.

To FULLY understand this, we must take a look at the actual text being passed back and forth in the background. I suggest that anyone who wants to better understand the nuts and bolts of the HTTP underpinnings pick up a free copy of Ethereal (since renamed Wireshark). Ethereal is a packet sniffing (er... network protocol analyzer) application that will let you see every packet that passes over the wire for each request. It can also be very helpful when troubleshooting cookies and other headers being passed back and forth.

Here is a snippet from the actual exchange that took place when I requested http://www.hafenbrack.com/redirect.php

GET /redirect.php HTTP/1.1\r\n
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*\r\n
Accept-Language: en-us\r\n
Accept-Encoding: gzip, deflate\r\n
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n
Host: www.hafenbrack.com\r\n
Connection: Keep-Alive\r\n
Cookie: ClickTrack=1.1.1.1.14477107221282637\r\n
\r\n

You can see that my browser requested the file /redirect.php and passed a few headers to the server. The line we are interested in here is the GET line. It shows the file originally requested.

Now, let’s take a look at how the server responded.

Below is a snippet from the actual exchange that took place when the server fulfilled my request for /redirect.php

HTTP/1.1 302 Found\r\n
Date: Tue, 23 Dec 2003 20:53:46 GMT\r\n
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) mod_jk/1.2.0 mod_perl/1.26 PHP/4.2.2 FrontPage/5.0.2 mod_ssl/2.8.12 OpenSSL/0.9.6b\r\n
Set-Cookie: ClickTrack=1.1.1.1.14477107221282637; path=/; expires=Tue, 23-Dec-03 21:53:46 GMT\r\n
X-Powered-By: PHP/4.2.2\r\n
Location: http://www.example.com/\r\n
Keep-Alive: timeout=15, max=100\r\n
Connection: Keep-Alive\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/html\r\n
\r\n

Do you see what happened here? The server received the request and passed back a response with a status code of 302 Found, which marks the move as temporary. While this would work fine for browsers, it is not really what we want in our case, as we want to tell search engine spiders that the content has moved PERMANENTLY to a new location. To do this, we must employ the 301 response code.
 

In our previous version of redirect.php we employed a bare Location header to redirect our visitor to the new location. We saw that this did not accomplish what we wanted, as it gives no indication that the content has been moved PERMANENTLY. To make the move permanent we must send the 301 status line ahead of the Location header. The following is redirect.php with the 301 header added:

<?php
//redirect.php
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com/");
?>

Below is a snippet from the actual exchange that took place when the server fulfilled my request for /redirect.php after the addition of the 301 header

HTTP/1.1 301 Moved Permanently\r\n
Date: Tue, 23 Dec 2003 20:58:52 GMT\r\n
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) mod_jk/1.2.0 mod_perl/1.26 PHP/4.2.2 FrontPage/5.0.2 mod_ssl/2.8.12 OpenSSL/0.9.6b\r\n
X-Powered-By: PHP/4.2.2\r\n
Location: http://www.example.com/\r\n
Keep-Alive: timeout=15, max=100\r\n
Connection: Keep-Alive\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/html\r\n
\r\n

Now look at that first line in the above snippet from the exchange. The server is now instructing the browser or spider that the page has moved permanently to the new address. This allows the search engine to update its records regarding that particular file and begin spidering the new file.
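If you would rather check a capture like the ones above in code than by eye, the status code and Location header can be pulled out of the raw response head. The following is only a minimal sketch: the function name parse_response_head is hypothetical, the sample response is abbreviated, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the original article:
// extract the status code and Location header from the raw
// head of an HTTP response.
function parse_response_head(string $raw): array
{
    // Header lines are separated by CRLF; tolerate bare LF too.
    $lines = preg_split('/\r?\n/', trim($raw));

    // The status line looks like "HTTP/1.1 301 Moved Permanently".
    $status = null;
    if (preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $lines[0], $m)) {
        $status = (int) $m[1];
    }

    // Header names are case-insensitive, so match without regard to case.
    $location = null;
    foreach (array_slice($lines, 1) as $line) {
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
            break;
        }
    }

    return ['status' => $status, 'location' => $location];
}

// Abbreviated sample response head (assumed for illustration).
$head = "HTTP/1.1 301 Moved Permanently\r\n"
      . "Location: http://www.example.com/\r\n"
      . "Content-Type: text/html\r\n\r\n";

$parsed = parse_response_head($head);
echo $parsed['status'], ' ', $parsed['location'], "\n";
```

A status of 301 together with the expected Location value is what a spider needs to see; a 302 here would mean the status line was never sent.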
 
Our Database Table

The database table is quite simple. We need only a couple of columns. In your version you may wish to add descriptions, notes, search-engine-friendly addresses, etc., but for this illustration we’ll keep it as straightforward as possible.

RedirectContentID (auto number)   OldURI            NewURI
1                                 /OldAddress.htm   Content.php?ContentID=358
2                                 /AboutUs.php      Content.php?ContentID=234

Using the Data

In our table we have three columns: an ID column, a column for the old location of the file, and a column for the new location of the file. Please note that all paths are relative to the server root for portability. When we do our redirects we will prepend the server name to the location of the new content, as the HTTP 1.1 spec calls for an absolute URI in the Location header (though many browsers accept relative ones).
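As a minimal sketch of that prepending step, the helper below joins a server name and a stored path into an absolute URL. The function name build_absolute_url and the optional scheme parameter are assumptions made for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: join a host name and a stored path into the
// absolute URL that the Location header should carry.
function build_absolute_url(string $serverName, string $newUri, string $scheme = 'http'): string
{
    // Strip any leading slash so "/NewPage.htm" and "NewPage.htm"
    // both yield exactly one slash after the host name.
    return $scheme . '://' . $serverName . '/' . ltrim($newUri, '/');
}

echo build_absolute_url('www.example.com', 'Content.php?ContentID=358'), "\n";
echo build_absolute_url('www.example.com', '/NewPage.htm'), "\n";
```

Normalizing the leading slash means it does not matter whether the new paths were stored with or without one.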

Pseudo-Code for 404.php

Look at the name of the page that is being requested via the REQUEST_URI environment variable. When Apache hands a failed request to the ErrorDocument script, this variable still contains the address originally requested rather than the address of the 404 page that is now running, which is exactly what a 404 handler needs.

Attempt to find the name of the requested file in the database table.
If the name is found:

  1. Set a header of 301 to tell the visitor or spider that this page has moved permanently.

  2. Set a location header to relocate the browser or spider to the address of the new page.

If the name is NOT found (a REAL 404 condition)

  1. Display a search field with some search tips

  2. Display a feedback form to allow the customer to ask for assistance in finding their content
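The decision in the steps above can be sketched without a database by standing in a plain array for the table. The names $redirectMap and resolve_redirect are hypothetical, the sample entries are the ones from the table shown earlier, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical stand-in for the database table: old path => new path.
$redirectMap = [
    '/OldAddress.htm' => 'Content.php?ContentID=358',
    '/AboutUs.php'    => 'Content.php?ContentID=234',
];

// Return the new path for a requested URI, or null for a real 404.
function resolve_redirect(string $requestUri, array $map): ?string
{
    // Drop the query string; only the path is used for the lookup.
    $path = explode('?', $requestUri, 2)[0];
    return $map[$path] ?? null;
}

// A known old address (with a stray query string) and an unknown one.
var_dump(resolve_redirect('/OldAddress.htm?ref=se', $redirectMap));
var_dump(resolve_redirect('/NoSuchPage.htm', $redirectMap));
```

A non-null result leads to the 301 and Location headers; a null result is the real 404 case, where the search field and feedback form are shown.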

At this point you may be saying “yes, yes, this is all very nice, where is the CODE??”

This is the code you need to see all this in action. Enjoy!

Database Creation Script

CREATE TABLE relocatedcontent (
  URI_ID int(11) NOT NULL auto_increment,
  OldURI varchar(100) NOT NULL default '',
  NewURI varchar(100) NOT NULL default '',
  PRIMARY KEY  (URI_ID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

#
# Dumping data for table 'relocatedcontent'
#

INSERT INTO relocatedcontent VALUES (1, '/OldPage.htm', 'NewPage.htm');

.htaccess code

# .htaccess

ErrorDocument 404 /404.php

404 PHP Script

<?php
// 404.php

include("db.php");

// Get the name of the file requested into an
// array split on the "?"
$FileParts = explode("?", $_SERVER["REQUEST_URI"]);

// Let's just use everything prior to the question
// mark, as the upcoming query will not match if
// there are arguments in the string.
$FileRequested = $FileParts[0];

// Get the server name to generate an absolute
// URL for the redirect
$ServerName = $_SERVER["SERVER_NAME"];

// Set up the DB object for our query
// (this also opens the database connection)
$q = new DB;

// Escape the requested path before it goes into
// the query, as it comes straight from the visitor
$SafeFile = mysql_real_escape_string($FileRequested);

// Look and see if the requested file is in the
// database (the table name must match the case
// used in the creation script)
$query = "SELECT * FROM relocatedcontent WHERE OldURI = '$SafeFile'";

// Run the query
$q->RunQuery($query);

// If we found a match..
if ($q->GetNextRecord()) {
 $NewURL = $q->Fields("NewURI");
 header("HTTP/1.1 301 Moved Permanently");
 header("Location: http://$ServerName/$NewURL");
 // Stop here so the 404 form is not sent
 // along with the redirect
 exit;
}

// If we made it this far, we have a "real"
// 404 on our hands, show 'em some options..
include("404form.htm");
?>

Database Connection Code

<?php

//db.php

class DB {
 var $classname = "DB";
 var $Host = "localhost";
 var $Database = "SystemDatabase";
 var $User = "SomeUserName";
 var $Password = "SomePassword";
 var $result;
 var $line;
 var $link;

 // Constructor: connect and select the database
 function DB() {
  $this->link = mysql_connect($this->Host,$this->User,$this->Password) or die("Could not connect");
  mysql_select_db($this->Database) or die("Could not select database");
  return $this->link;
 }

 // Run a query and keep the result handle
 function RunQuery($val){
  $this->result = mysql_query($val) or die(mysql_error());
  return $this->result;
 }

 // Fetch the next row, or false when there are no more
 function GetNextRecord(){
  $this->line = mysql_fetch_assoc($this->result);
  return $this->line;
 }

 // Return one column from the current row
 function Fields($val){
  return $this->line[$val];
 }
}
?>
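
The class above relies on the mysql_* functions, which were removed in PHP 7. On a current PHP version, one possible replacement for the lookup is PDO with a prepared statement. This is only a sketch under stated assumptions: the function name find_new_uri is hypothetical, the table and column names follow the creation script above, and the demonstration runs against an in-memory SQLite database (which needs the pdo_sqlite extension) so that no MySQL server is required.

```php
<?php
// Hypothetical PDO-based lookup; the function name is an assumption,
// the table and column names follow the creation script.
function find_new_uri(PDO $pdo, string $oldUri): ?string
{
    // The placeholder keeps the requested path out of the SQL text.
    $stmt = $pdo->prepare('SELECT NewURI FROM relocatedcontent WHERE OldURI = ?');
    $stmt->execute([$oldUri]);

    // fetchColumn() returns false when no row matched.
    $newUri = $stmt->fetchColumn();
    return $newUri === false ? null : (string) $newUri;
}

// Demonstration database. Against MySQL the DSN would instead look like
// 'mysql:host=localhost;dbname=SystemDatabase' plus a user and password.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE relocatedcontent (
    URI_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    OldURI VARCHAR(100) NOT NULL,
    NewURI VARCHAR(100) NOT NULL
)');
$pdo->exec("INSERT INTO relocatedcontent (OldURI, NewURI) VALUES ('/OldPage.htm', 'NewPage.htm')");

var_dump(find_new_uri($pdo, '/OldPage.htm'));
var_dump(find_new_uri($pdo, '/Missing.htm'));
```

Because the path is bound as a parameter rather than pasted into the query string, no manual escaping is needed.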
