Using Xenu Link Sleuth to Spot and Improve External Links in a Domain

Xenu Link Sleuth is one of the most useful free tools in web development and search engine optimization. It crawls a domain and checks its links, much as search engine bots such as Googlebot and Yahoo! Slurp crawl a site. This article explains how to use Xenu Link Sleuth to improve the external, or outgoing, links on your web site.

Xenu Link Sleuth is free to download. This tutorial, however, focuses on version 1.2j. I believe the developer has since released a more recent version; if so, look on the developer's site for a link to the older version used here.

External links are links on a website that point to other domains. Major search engines such as Google, Yahoo and MSN take the quality and relevance of your site's external links into account when ranking it.

Properly checked external links can help your search engine rankings a lot. The objective of this article is to show you how to spot and improve the external links on your website using Xenu Link Sleuth.

The Basics about Xenu Sleuth: Important information

Xenu has its own file type, .XEN. After it crawls your site, you can save the results to your desktop. This is highly recommended and useful because:

  • You can re-open the Xenu file in the future for further analysis of the links in your site.
  • If you save a Xenu file for a domain in month X and let Xenu re-crawl the site six months later, you can compare the two files to see which links have appeared and which have disappeared in the meantime.
  • On big sites, a Xenu crawl can last an hour, so saving the results as .xen spares you from crawling the site again.
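The comparison of two crawls can also be scripted. The sketch below is only an illustration: it does not read .XEN files, and instead assumes each crawl was exported to the tab-separated page map described later in this article, with the column names listed there. The function names and the sample rows are made up for the example.

```python
import csv
import io


def load_links(tsv_text):
    """Return the set of (OriginPage, LinkToPage) pairs in one exported page map."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["OriginPage"], row["LinkToPage"]) for row in reader}


def compare_crawls(old_text, new_text):
    """Return (links that appeared, links that disappeared) between two exports."""
    old_links = load_links(old_text)
    new_links = load_links(new_text)
    return sorted(new_links - old_links), sorted(old_links - new_links)


# Made-up sample data standing in for two exports taken months apart.
HEADER = "OriginPage\tLinkToPage\tLinkToPageStatus\tLinkToPageTitle\tOriginPageDate\n"
old_export = HEADER + "http://example.com/\thttp://example.org/a\tskip external\t\t\n"
new_export = HEADER + "http://example.com/\thttp://example.org/b\tskip external\t\t\n"

added, removed = compare_crawls(old_export, new_export)
print("appeared:", added)
print("gone:", removed)
```

With real data you would read the two exported text files from disk and pass their contents to compare_crawls.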

Not all sites are crawlable with Xenu; some sites have disallowed it in their robots.txt because Xenu can consume a lot of bandwidth and slow down the site. This means that you have to use Xenu with caution.
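Before crawling a site you do not own, it can help to test its robots.txt rules. The sketch below uses Python's standard urllib.robotparser on an inline sample file. The user-agent token "Xenu" and the sample rules are assumptions for illustration only; check the real robots.txt of the site you intend to crawl.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks one crawler by name.
SAMPLE_ROBOTS = """\
User-agent: Xenu
Disallow: /

User-agent: *
Disallow: /private/
"""

xenu_ok = crawler_allowed(SAMPLE_ROBOTS, "Xenu", "http://example.com/page.html")
other_ok = crawler_allowed(SAMPLE_ROBOTS, "OtherBot", "http://example.com/page.html")
print("Xenu allowed:", xenu_ok)
print("Other bots allowed:", other_ok)
```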

Also, if the site uses session IDs and its shopping baskets are crawlable, Xenu can add items to baskets without ever checking out. Re-crawl such a site with caution, as the load can amount to an unintended DoS (denial-of-service) attack.

Before starting a crawl, you must configure Xenu correctly. After installing it, launch Xenu and go to File > Check URL (see below).

Important rules:

  • Type the root URL of your domain, starting with http://. This ensures that Xenu starts crawling at the topmost part of your domain and continues all the way down.
  • Uncheck the external links option. Xenu will then label these links “skip external” instead of checking them, which makes them easy to pick out in the exported report and shortens the crawl.
  • Leave other settings as is (default).

See the screen shot above for what should be filled in at Xenu’s Starting Point.

Okay, let’s go to the preferences. They can be found under Options > Preferences. Make sure Xenu is set to the following values:

Parallel Threads: 30

Ask for password or certificate when needed: (check this one)

Treat Redirection as errors: No (do not check this one)

Maximum Level: 999

In the Report section, check all of the options, from “Broken Links, ordered by links” through “Orphan files.”

It is recommended that during the crawling process you minimize all Internet activity; otherwise, you might experience several timeouts in the results.

Xenu starts crawling the website as soon as you press “OK” in the Starting Point dialog where you entered the root URL.

You will then see a table showing the ongoing results for the crawled URLs. The most important column is “Status,” which lets you distinguish between the internal and external links in your domain.

External links are labeled “skip external,” while internal links are labeled “OK,” “Timeout,” etc.

It is very important not to stop or pause the crawling process, as doing so can corrupt the data being gathered. Progress is displayed in the lower part of the screen.

When the crawling process has been completed, it will display 100%, for example: 3020 of 3020 URLS (100%)

Important: after crawling, Xenu will ask you (using a dialog box) if you want a report. Just click “NO.”

It is also important to save the Xenu results once the crawl reaches 100% completion. To do this, click File, then “Save As,” and choose whatever name you like.

In the screen shot above, the green lines show internal links while the blue ones labeled “Skip External” are the links going out to other domains.

The next thing to do is to export the Xenu data to a spreadsheet. This is easier to do in MS Excel than in OpenOffice Calc.

To export the data, click File > Export Page Map to Tab Separated File and type any file name you want. Do not include an extension in the file name; Xenu adds it automatically. Also make sure that “Save as type” is set to a text file (*.txt).

You can save the exported data to your desktop. To open the file as a spreadsheet, right-click the exported file and choose “Open With” > Microsoft Excel. You should see five columns. They are:

  • OriginPage
  • LinkToPage 
  • LinkToPageStatus 
  • LinkToPageTitle 
  • OriginPageDate

LinkToPage is the target URL of a link, and OriginPage is the page on your domain where that link is found.

LinkToPageStatus tells you whether the URL is an external or an internal link. If it is an external link, it will show “Skip External.” The other two columns, LinkToPageTitle and OriginPageDate, are not very important for this analysis.

For easy analysis, rename the column “OriginPage” to “Location of External Links” and “LinkToPage” to “External link URL.”

You can then filter column C, “LinkToPageStatus,” using MS Excel. To do that, click Data > Filter > AutoFilter, and then in column C select “Skip External.” The sheet should now show only the rows for links going out to external domains.

Finally, select the area of the filtered result using your mouse and then click Edit > Copy. Insert a new worksheet by clicking Insert > Worksheet, and rename it to “final.” In the blank area, paste the result.

After pasting the data, click File > Save As, type the file name you like, and then in the file type, select “Microsoft Excel Workbook (*.xls).”
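If you prefer a script to the Excel steps, the same filtering and renaming can be sketched in Python. This is a minimal illustration rather than part of Xenu itself: it assumes the export has the five column headers listed above and that external links carry the status “skip external” (matched case-insensitively). The function name and sample rows are made up.

```python
import csv
import io


def external_links(tsv_text):
    """Keep only rows whose status is 'skip external', with friendlier column names."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        status = (row.get("LinkToPageStatus") or "").strip().lower()
        if status == "skip external":
            rows.append({
                "Location of External Links": row["OriginPage"],
                "External link URL": row["LinkToPage"],
            })
    return rows


# Made-up sample export: one internal link and one external link.
sample_export = (
    "OriginPage\tLinkToPage\tLinkToPageStatus\tLinkToPageTitle\tOriginPageDate\n"
    "http://example.com/\thttp://example.com/about.html\tok\tAbout\t\n"
    "http://example.com/\thttp://example.org/\tSkip External\t\t\n"
)

filtered = external_links(sample_export)
for entry in filtered:
    print(entry["Location of External Links"], "->", entry["External link URL"])
```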

By following these steps, you should end up with something that looks like the screen shot below:

 

Note that there are some external links reported that are not real external links. Here are some things to be aware of:

  • Ignore external links whose URLs have a .js extension. These are JavaScript files, and Google will not crawl links in JavaScript. Common examples are Google Ads and analytics scripts, which Xenu still picks up.
  • Also ignore external links whose reported URL contains mailto:.
  • Also ignore any links starting with file://.
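On a long list, the three rules above can be applied with a short script instead of by hand. The following Python sketch is one possible way to do it; the function name and the sample URLs are invented for the example.

```python
from urllib.parse import urlsplit


def is_real_external_link(url):
    """Return False for .js files, mailto: addresses, and file:// URLs."""
    parts = urlsplit(url.strip())
    if parts.scheme.lower() in ("mailto", "file"):
        return False
    if parts.path.lower().endswith(".js"):
        return False
    return True


# Made-up URLs of the kinds that can show up in the exported report.
candidates = [
    "http://www.somewebsite.com/",
    "http://ads.example.net/show_ads.js",
    "mailto:someone@example.com",
    "file:///C:/site/index.html",
]

kept = [url for url in candidates if is_real_external_link(url)]
print(kept)
```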

To ignore them, simply delete the rows of the affected external link URLs. Another very important thing is to accept only links written in standard href syntax:

<a href="http://www.somewebsite.com">Anchor text</a>

Focus only on this type, because it is the one that will be indexed by search engines.

To confirm, go to the source code of the pages where, according to the Xenu report, these external links are found, and check that each one uses the correct hyperlink HTML syntax.
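This source check can be partly automated. The sketch below uses Python's standard html.parser to collect the href values of ordinary anchor tags in a page and test whether a reported URL is among them. The class name, function name, and sample page are assumptions made for the example; in practice you would feed it the HTML of the page named in the report.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def is_plain_anchor(page_html, url):
    """Return True if url appears as the href of a standard anchor tag."""
    collector = AnchorCollector()
    collector.feed(page_html)
    return url in collector.hrefs


# Made-up page: one normal hyperlink and one external script.
sample_page = """
<p><a href="http://www.somewebsite.com">Anchor text</a></p>
<script src="http://ads.example.net/show_ads.js"></script>
"""

print(is_plain_anchor(sample_page, "http://www.somewebsite.com"))
print(is_plain_anchor(sample_page, "http://ads.example.net/show_ads.js"))
```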

Finally, once all of these links are filtered, you can now judge their quality and relevance. How?

Judge the quality of external links in terms of:

  • Usefulness of the link to your visitors - If the link appears to be extremely useful to your visitors, let search engines crawl it. These links are typically valued and will help your rankings. Examples of this kind of link are the ones that support your existing content and are highly related to your website. 
  • The integrity of the domain - Do not link to websites that appear to be banned or penalized by search engines. To check this, see if they rank well for their targeted terms and if their site is indexed by Google. If not, it is likely that they are not a trusted domain.
  • The relevance and expertise of the domain in your selected niche - Always link to expert websites. For example, if your web site’s goal involves releasing updated news and publishing editorial comments or opinions, CNN.com or BBC.co.uk are your expert sites.
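Because these criteria are judged per domain rather than per URL, it can help to see which domains your site links to most before reviewing them. The Python sketch below simply counts external links per host name; the function name and sample URLs are made up, and the quality judgment itself still has to be done by hand.

```python
from collections import Counter
from urllib.parse import urlsplit


def links_per_domain(urls):
    """Count how many external link URLs point at each host name."""
    hosts = (urlsplit(url).hostname for url in urls)
    return Counter(host for host in hosts if host)


# Made-up list standing in for the "External link URL" column.
sample_urls = [
    "http://www.cnn.com/world",
    "http://www.cnn.com/tech",
    "http://www.bbc.co.uk/news",
]

domain_counts = links_per_domain(sample_urls)
for host, count in domain_counts.most_common():
    print(host, count)
```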

What if your external links fail the above criteria? Put a rel="nofollow" attribute on the link, on every page where it appears, as reported by Xenu.

Or you can remove that link, if it is dead or really does not support your website content and functionality.
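After adding nofollow attributes, you may want to confirm that no copy of the link was missed. The sketch below, again using Python's standard html.parser, counts anchors pointing at a given URL that lack nofollow in their rel attribute. The class name, function name, and sample page are hypothetical; run it against the HTML of each page listed in the Xenu report.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Count <a> tags that link to target_url without rel="nofollow"."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.missing = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if attr_map.get("href") != self.target_url:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" not in rel_tokens:
            self.missing += 1


def count_missing_nofollow(page_html, target_url):
    audit = NofollowAudit(target_url)
    audit.feed(page_html)
    return audit.missing


# Made-up page: the same link appears twice, once without nofollow.
sample_page = """
<a href="http://example.net/" rel="nofollow">Fixed link</a>
<a href="http://example.net/">Missed link</a>
"""

missing = count_missing_nofollow(sample_page, "http://example.net/")
print("links still missing nofollow:", missing)
```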
