Using Lynx for SEO Analysis

In part one of this two-part series, you learned how to use Lynx to navigate web pages. You also learned what the color coding in the Lynx text browser’s output means, which tells you whether a piece of text is a hyperlink, plain text content, bold, or italic. In this part, you’ll learn how to analyze the information Lynx uncovers about your website.

It is vital that you have Lynx installed on your computer if you want to get the most out of this article. If you still do not have a copy, you can download it from here: 

http://cid-c3bc6a3c5463e218.skydrive.live.com/self.aspx/.Public/Lynx%20Text%20Browser.zip

I recommend reading the first part, because you’ll need the information it contains to properly understand this one. This part gives detailed examples of how to use Lynx to analyze search engine crawlability issues. Just as a physician uses X-rays to look for problems inside a person’s body, you can use Lynx to look for problems inside your website.

Make sure web page links and navigation are crawlable

Due to rapid advances in web development, modern developers often use techniques that introduce crawling problems. When your website is not crawlable, search engines will not properly index its text content. Since your keywords are text content, if they are not found or not prominent in your indexed pages, you will not rank very well, especially for long tail terms. And if you do not rank well, the traffic and profitability of your website suffer.

It is important to know that what the Lynx text browser sees is very different from what you can see with normal web browsers. One of the most popular modern technologies for displaying content is Flash. To find out whether your Flash-based website is crawlable, you need to check three important things in Lynx:

  • Navigation links
  • Textual content
  • Text found in images

Let’s start with an actual Flash-based website, because such sites often face more crawling issues than HTML-based websites. One of my favorite Flash-based websites is http://www.2advanced.com/ .

Below is the screenshot of the home page, displayed entirely using Adobe Flash:

When you visit the website using a normal browser (Firefox, for example), you can see visible navigation links with labels that include Contact, Language, Home, Launch project, More details, Privacy and More News.

You can even see the detailed list of website navigation when you mouse over the phrase “Expand navigational array,” and you can see Flash-based navigational links that include “company,” “portfolio,” “services,” “case studies,” “recognition,” and many others to the right.

Flash-wise, this is a well-developed website. However, when the Lynx browser looks at the site, using the same kind of eyes as the search engine crawlers, it sees a very different picture:

So what happened?

Navigation links. The navigation links discussed above are not visible to the Lynx browser. This means that if Googlebot or other search engine crawlers land on the home page, they will not be able to crawl internal links pointing to other pages of the website. Thus, the crawlability of the website from the home page is poor.

Textual content. The indexable content is not the same as the content seen in normal web browsers. Even following the visible “click here” link in Lynx takes you nowhere.

Text found in images. The images do not have alt attributes, so the text they contain cannot be seen by Lynx.
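The three checks above can also be approximated in a script. The following is a minimal sketch, assuming you already have a page's raw HTML as a string; it uses Python's standard html.parser module rather than Lynx itself, so it only imitates what a text browser sees. The names TextBrowserView and summarize, and the sample markup, are made up for illustration.

```python
from html.parser import HTMLParser


class TextBrowserView(HTMLParser):
    """Collect roughly what a text-only client can read from raw HTML."""

    # The contents of these elements are never rendered as page text.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.links = []               # href values of <a> tags
        self.text = []                # readable text fragments
        self.images_without_alt = []  # src values of <img> tags lacking alt text
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_without_alt.append(attrs.get("src") or "")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def summarize(html):
    """Return the links, text, and alt-less images found in `html`."""
    view = TextBrowserView()
    view.feed(html)
    view.close()
    return {
        "links": view.links,
        "text": view.text,
        "images_without_alt": view.images_without_alt,
    }


# Hypothetical page: a Flash object, script-built navigation, one plain link.
sample = """
<html><body>
<object data="intro.swf"></object>
<script>var nav = ["company", "portfolio"];</script>
<a href="/about.html">About</a>
<img src="logo.png">
<img src="banner.png" alt="Spring sale">
<p>Welcome to our site.</p>
</body></html>
"""
report = summarize(sample)
```

For this sample, the report contains only the one plain HTML link and two text fragments; the navigation labels defined in the script never appear, which is the same pattern the Lynx view of a Flash-only home page shows.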

If you have a website like this, and you are serious about ranking your home page in Google for both your main term and long tail keywords, it will be difficult because the search engine spiders will not be able to find content to index.

Since optimizing Flash-based and other hard-to-index websites is beyond the scope of this article, I will state only the simple rule: every piece of text you convey to your readers in a normal browser should also be indexable by Google.

An example of a well-optimized Flash-based website is http://www.marcecko.com/. The entire site is Flash-based.

However, using the Lynx browser to view the website (screenshot above, comparing the normal browser and Lynx view) reveals that there are no search engine crawling issues, since there are indexable hyperlinks for bots to follow. What about the text content and images?

For example, by using the normal browser and clicking the “Graffiti” Flash-based image link on the home page, you can see the content embedded in Flash as shown in the screenshot below:

Yet even when you use Lynx to visit the Graffiti page, you can clearly see the indexable content (screenshot above). This means that search engine crawlers like Googlebot can properly fetch this content.

This leads to two recommendations. First, if you have a website that uses advanced web technology like Flash, JavaScript and AJAX, do not just assume it will be crawled and indexed well by search engine crawlers. It is an important part of your website design to provide a text version for search engine crawlers so they can properly fetch text content as intended.

Second, if you want to present user-friendly navigation to search engines, you must have crawlable links. You can use Lynx to test whether those pages are reachable by following the links as they are displayed in Lynx.

One of the easiest ways to hide text is by using CSS. In this test page: http://www.php-developer.org/hiddentextexample.php , a normal browser shows the text content in the screenshot below. Lynx, however, shows additional text content (the hidden text is inside the dotted white box):

Search engines discourage hidden text, and a site that uses it to spam risks being penalized or removed from the index. The best course of action is to avoid text hiding techniques. For example, do not use {visibility:hidden} in CSS.
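If you want a quick scripted check alongside Lynx, the sketch below looks for text inside elements whose inline style attribute hides them. It rests on two loud assumptions: the markup is well-formed (every non-void tag is closed), and the hiding rule is written inline. Rules in external stylesheets or style blocks are not detected, so treat an empty result as inconclusive. The names HiddenTextFinder and find_hidden_text are hypothetical.

```python
import re
from html.parser import HTMLParser

# Inline CSS declarations that hide an element from normal browsers.
HIDING_RULE = re.compile(r"visibility\s*:\s*hidden|display\s*:\s*none", re.I)

# Void elements never get a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collect text nested inside elements hidden by an inline style."""

    def __init__(self):
        super().__init__()
        self.hidden_text = []
        # One flag per open element: True if that element is styled hidden.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        self._stack.append(bool(HIDING_RULE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text is hidden if any enclosing element is hidden.
        if any(self._stack) and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


# Hypothetical markup with one visible and two hidden pieces of text.
sample = (
    '<div><p>Visible paragraph.</p>'
    '<div style="visibility: hidden"><p>keyword keyword keyword</p></div>'
    '<span style="display:none">more hidden text</span></div>'
)
hidden = find_hidden_text(sample)
```

Anything this reports is text that Lynx and search engine crawlers can read but your visitors cannot see, so it deserves a manual look.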

If you have a big website, checking every page for hidden text with Lynx is impractical. In this case, and this is also recommended by Google, you only need to check the most important pages in your website, particularly those that you need to rank well in search engines (e.g., the home page and your other key pages).
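Checking a short list of important pages can be scripted. The following is a rough sketch, assuming the lynx binary is installed and on your PATH; it relies on Lynx's -dump option (print the rendered page as plain text) and -listonly option (with -dump, print only the list of links). The function names and the default timeout are illustrative choices, not part of Lynx.

```python
import shutil
import subprocess


def lynx_dump_command(url, links_only=False):
    """Build the lynx command line for a plain-text dump of `url`."""
    command = ["lynx", "-dump"]
    if links_only:
        # With -dump, -listonly prints only the list of links on the page.
        command.append("-listonly")
    command.append(url)
    return command


def dump_pages(urls, timeout=30):
    """Return {url: dumped text} using the locally installed lynx."""
    if shutil.which("lynx") is None:
        raise RuntimeError("lynx was not found on PATH")
    results = {}
    for url in urls:
        completed = subprocess.run(
            lynx_dump_command(url),
            capture_output=True,
            text=True,
            errors="replace",  # tolerate bytes that are not valid text
            timeout=timeout,
        )
        results[url] = completed.stdout
    return results
```

Calling dump_pages with the URLs of your home page and a few key landing pages returns the text Lynx renders for each one, which you can then compare against what you see in a normal browser.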

JavaScript-based navigational links are similar to Flash-based ones, except that they use JavaScript to display hyperlinks. The same applies to image-based links and buttons. For example, the “YES” button on this Blogger page cannot be followed when using Lynx:

http://minethatdata.blogspot.com/

Viewing the page in the Lynx browser:

It shows that the button contains no hyperlink information at all, only plain text, when viewed in Lynx (“Yes” is white). This concerns me; if there are older backlinks pointing to that page, they will not point to the new site, because Blogger does not allow 301 redirection or offer ways to move to new blogs or domains in Google Webmaster Tools (if the content is not hosted in Blogger anymore). Blogger should make this link clearly indexable or pass the link juice to the newly-hosted domain.

The bottom line is, if you are serious about crawlability and passing link juice within or outside your domain, you should double check the integrity of those links using Lynx to make sure they are visible and indexable by any search engine bot.
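As a complement to checking links by hand in Lynx, the sketch below separates plain HTML links from ones that only work when JavaScript runs. It is an illustration under simple assumptions: a link counts as crawlable only if it is an anchor whose href is a real address, and anything relying on a javascript: address or an onclick handler is flagged. The LinkAuditor and audit_links names and the sample markup are invented for this example.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Split a page's links into plain HTML links and script-only ones."""

    def __init__(self):
        super().__init__()
        self.crawlable = []    # href values a text browser can follow
        self.script_only = []  # (tag, handler) pairs that need JavaScript

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = (attrs.get("href") or "").strip()
        onclick = attrs.get("onclick") or ""
        is_script_href = href.lower().startswith("javascript:")
        if tag == "a" and href and href != "#" and not is_script_href:
            self.crawlable.append(href)
        elif onclick or is_script_href:
            self.script_only.append((tag, onclick or href))


def audit_links(html):
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.crawlable, auditor.script_only


# Hypothetical navigation mixing one plain link with script-driven ones.
sample = """
<a href="/portfolio.html">Portfolio</a>
<a href="javascript:openPage('services')">Services</a>
<a href="#" onclick="openPage('contact')">Contact</a>
<button onclick="window.location='http://www.example.com/'">YES</button>
"""
crawlable, script_only = audit_links(sample)
```

Here only /portfolio.html ends up in the crawlable list. The other three entries are the kind of link that shows up as plain text in Lynx, like the “YES” button described above.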
