Google and Yahoo initially announced their collaboration around November 16, with Microsoft joining in shortly thereafter. Up until now, when webmasters wanted to submit their sites to be indexed by the search engines, they had to use one RSS feed for Google, another one for Yahoo, and a third one for Microsoft. Now, with the launch of Sitemaps.org, webmasters and publishers have “a unified way to send their content…let our search engines know about new and existing content,” explained Tim Mayer of Yahoo.
At the time they announced the collaboration, Google and Yahoo also issued an open invitation to other search engines willing to join the project. Microsoft accepted the invitation; as of this writing, neither Ask.com nor any other search engine has come on board, but that can be expected to change.
WebProNews hosts a short video on its web site (just over 10 minutes) in which Google’s Vanessa Fox and Yahoo’s Tim Mayer explain why the search engines are working together on this. They suggest it could benefit sites of all ages and sizes. New sites stand to benefit because the search engines sometimes have a difficult time discovering them. Owners of larger sites that have been established for a while will appreciate that Sitemaps.org helps multiple search engines find and index all of their URLs, since spiders sometimes have a difficult time getting everything from a large web site.
Sitemaps.org is an outgrowth of Sitemaps 0.84, a protocol originally released by Google in June of 2005. The original release gave webmasters an easy way to tell Google about new content. Basically, a Sitemap is an XML file that webmasters place on their web sites, where search engine spiders can read it. In the file, webmasters list all of their URLs along with optional metadata (such as how frequently a page is updated, or the last time a particular page changed). In short, the Sitemap lets webmasters help the spiders crawl their sites more effectively.
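To make the description above concrete, here is a minimal sketch of how such a file might be generated with Python's standard library. The function name `build_sitemap`, the dictionary keys, and the example.com URLs are made up for illustration; the namespace is the one published with the Sitemaps.org protocol, and `lastmod` and `changefreq` are two of its optional tags.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemaps.org protocol schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML as UTF-8 bytes for a list of page dicts.

    Each dict needs a "loc" key (the page URL) and may carry the
    optional "lastmod" and "changefreq" keys. This is an illustrative
    helper, not part of any official tool.
    """
    # Emit the namespace as the default one, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        # Optional metadata is written only when the caller supplied it.
        for tag in ("lastmod", "changefreq"):
            if tag in page:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = page[tag]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    example_pages = [
        {"loc": "http://www.example.com/", "lastmod": "2006-11-16", "changefreq": "daily"},
        {"loc": "http://www.example.com/about"},
    ]
    print(build_sitemap(example_pages).decode("utf-8"))
```

A content management system could run something along these lines whenever a page is published, so the file never falls behind the site.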
The Sitemaps protocol benefits both webmasters and search engines in a number of distinct ways. First, it helps to keep search engines informed of freshly updated content in a timely manner. Second, it improves the comprehensiveness of the crawl, allowing webmasters to include all of the pages they want indexed to make sure that none are missed. Third, it lets the spider ignore pages that haven’t been updated since the last time it visited a particular site, increasing its speed and efficiency and saving bandwidth.
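The third benefit can be sketched from the crawler's side. The snippet below is only an illustration of the idea, not how any of the three search engines actually schedules its crawl: the function `urls_to_recrawl` is hypothetical, and it assumes every `lastmod` value starts with a full YYYY-MM-DD date.

```python
from datetime import date
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_to_recrawl(sitemap_xml, last_visit):
    """Return the URLs a spider would still need to fetch.

    A page is skipped only when its lastmod date is earlier than
    last_visit. Pages without a lastmod hint are always fetched,
    which is a conservative choice made for this sketch.
    """
    root = ET.fromstring(sitemap_xml)
    to_fetch = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            to_fetch.append(loc)
            continue
        # Keep only the date part; assumes at least YYYY-MM-DD is present.
        changed_on = date.fromisoformat(lastmod.strip()[:10])
        if changed_on >= last_visit:
            to_fetch.append(loc)
    return to_fetch
```

Called with a Sitemap and the date of the spider's previous visit, it returns only the pages that changed since then plus those that carry no date at all, which is where the bandwidth saving comes from.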
From the very beginning, Google encouraged other sites to adopt its Sitemaps protocol, and it has received widespread support. As you might expect, it has been particularly helpful to web sites that manage lots of pages with dynamic content. In fact, if your web site is large enough to require a content management system (CMS), it doesn’t make sense not to use Sitemaps. Also, since many sites that use a CMS carry out ecommerce, keeping search engines up to date on fresh content can increase traffic to their sites, which is good for business.
Sitemaps.org gives visitors a complete explanation of all the features of the protocol and what they need to do to submit a Sitemap. The site itself contains nothing to indicate that any of the search engines are specifically involved with it – no logos or branding. In fact, it looks fairly bare and non-commercial, and is likely to stay that way. In the WebProNews video interview, Fox noted the significance of the .org suffix; it’s historically been used to indicate not-for-profit organizations. She said that there would be no advertising on Sitemaps.org, “no AdSense on the side.”
So Sitemaps.org isn’t commercial; it’s worth noting what else Sitemaps.org isn’t. As Yahoo notes in its blog post on the subject, “you don’t need to worry about the three search engines merging and not being able to use your favorite anymore.” It also isn’t a sign that the search engine spiders will start getting “lazy” and only crawl sites with Sitemaps or pay attention only to Sitemaps. Both Fox and Mayer gave assurances in the video interview that their search engines’ spiders would continue to crawl web sites as expected.
Visitors to Sitemaps.org are greeted by a home page that defines the term Sitemap and explains why webmasters and publishers may want to use a Sitemap. It says right out in the open that “Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.” There’s a link at the bottom of the home page that leads to the terms and conditions of use. These are relatively short and pretty vanilla.
There are three links near the top right-hand corner of the page: one for home, one for protocol, and one for frequently asked questions. The Protocol link leads to a long page that describes the XML schema for the Sitemap protocol. This was clearly written for webmasters and others who handle the technical details of a web site; even so, it’s pretty understandable for someone with limited technical knowledge. The designers of this page did what they could to make it as readable as possible, including links that jump to seven different sections of the document.
I was glad to see an explanation that even I could understand for how to validate your Sitemap! The page also explains how to extend the Sitemaps protocol (presumably for including metadata that gives extra information about the page you’re submitting). It’s worth noting that there is a certain amount of flexibility for each search engine within the protocol. At the very beginning, the protocol page notes several tags that must be present, and adds that “All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine’s documentation for details.” It would be very nice if, in the future, Sitemaps.org linked the words “search engine’s documentation” to another page that links to documentation for all search engines involved in Sitemaps.org. Such a setup would help with Sitemaps.org’s apparent goal of being a one-stop shop for Sitemap information.
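As a rough illustration of what checking for the mandatory tags could look like, here is a small sketch. It is not the validation procedure the protocol page describes (that relies on the published XML schema); it only checks for the `urlset`, `url`, and `loc` elements, which the protocol lists as required. The function name is hypothetical, and treating an empty `urlset` as a problem is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def missing_required_tags(sitemap_xml):
    """Return a list of problems; an empty list means the basics are present."""
    problems = []
    root = ET.fromstring(sitemap_xml)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the Sitemap namespace")
        return problems
    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if not entries:
        problems.append("no <url> entries found")
    for index, url in enumerate(entries, start=1):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        # A missing or blank <loc> leaves the entry without a URL.
        if loc is None or not loc.strip():
            problems.append(f"<url> entry {index} has no <loc>")
    return problems
```

Optional tags are deliberately ignored here, since support for them may differ from one search engine to the next.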
For those of us who aren’t quite so techie or respond better (at least initially) to information formatted a little differently, Sitemaps.org provides a helpful FAQ page. Here you can learn the answers to specific questions, such as how large you can make your Sitemap (10 MB with 50,000 URLs maximum), what character encoding to use (UTF-8), and more. At least some of these questions are answered on the actual Sitemaps protocol page, but it’s still a good idea to have a FAQ page that includes this information.
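The limits quoted from the FAQ are easy to check before a file goes live. The following is a minimal sketch under a few assumptions: the helper name `check_limits` is invented, 10 MB is read as 10 × 1024 × 1024 bytes, and only direct `url` children of the root element are counted.

```python
import xml.etree.ElementTree as ET

MAX_BYTES = 10 * 1024 * 1024  # 10 MB, interpreted as 10,485,760 bytes
MAX_URLS = 50_000


def check_limits(sitemap_bytes):
    """Return a list of limit violations for a Sitemap given as raw bytes."""
    problems = []
    if len(sitemap_bytes) > MAX_BYTES:
        problems.append(f"file is {len(sitemap_bytes)} bytes, over the 10 MB limit")
    try:
        sitemap_bytes.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    root = ET.fromstring(sitemap_bytes)
    # Count <url> children whether or not the namespace is declared.
    url_count = sum(1 for el in root if el.tag == "url" or el.tag.endswith("}url"))
    if url_count > MAX_URLS:
        problems.append(f"file lists {url_count} URLs, over the 50,000 limit")
    return problems
```

A site with more pages than one file allows would need to split its URLs across several Sitemaps; this sketch only reports the problem and does not attempt the split.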
Yahoo notes in its blog that “this is the first-ever joint announcement by all three search rivals” and couldn’t resist titling the entry “Yahoo!, Google and Microsoft Join Forces (really!!).” The rest of the search community seems to be as delighted about this new development as the three search engines themselves.
“At industry conferences, webmasters have asked for open standards just like this,” remarked Danny Sullivan. “This is a great development for the whole community and addresses a real need of webmasters in a very convenient fashion. I believe it will lead to greater collaboration in the industry for common standards, including those based around robots.txt, a file that gives web crawlers direction when they visit a website.”
David Berkowitz, director of strategic planning for search engine marketing firm 360i, takes a more understated approach. “There’s a sense that search optimization could get a little easier,” he noted. “This partnership could help pave the way for clearing some of those headaches that marketers and site owners face.”
As of this writing, SEO forums have just started to pick up on the news. SEO_AM, a regular contributor to SEO Chat’s forums, noted in a post linking to news of the announcement that “At least we will have some site map standardization. It is also nice to know that the Big 3 do talk to each other and, occasionally, agree on a common goal.”
Others have been more enthusiastic. The WebProNews video elicited no fewer than 44 comments, with many cheering the news and agreeing that it’s about time something like this happened. Some posters were particularly pleased that Sitemaps.org would not carry a brand. One poster sounded a negative note: “I can only imagine that there will be a lot of problems at first that will need to be ironed out.” The general take-home message was that this will save a lot of work all around – and given that it will probably improve the search experience for everyone, it’s a big win indeed. If this works out well, we may see more of these kinds of initiatives in the future.