Sphere Searches the Blogosphere

Have you ever been frustrated by blog searches that show more blog spam than the kinds of real gems you know are out there among all the dross? The folks behind Sphere, a blog search engine so new it’s still in closed beta, think they have the answer.

Traditional Internet search has come a long way since 1997. Unfortunately, many people will tell you that the same thing is not true when you’re trying to search for relevant blog entries. Sure, there’s great content out there, as you would expect from any medium that encourages enthusiasts to communicate about their favorite topics…but there’s also a lot of garbage, and some very self-centered blogging (which can be entertaining, but might not be what you’re looking for). Add in the spam blogs cluttering up the blogosphere (which I wrote about here: http://www.seochat.com/c/a/Search-Engine-), and it’s very easy to feel like you’re adrift in a sea of flotsam and jetsam, looking for a decent raft.

This isn’t to say that there aren’t search engines dedicated to blogs out there. Technorati, Feedster, and Icerocket are three of the best known websites that offer blog search. A lot of the current blog search companies rely on tags and RSS feeds, and will do a straight sort by date, with most recent entries first. While this works to some degree, it doesn’t work as well as you might think, and if you can remember what web search was like back in 1997, you’ll understand why. Tag-based searches are easy to overwhelm with spam. Also, just because an entry is the most recent doesn’t mean it’s the one that best answers what the searcher is looking for. As with those pre-Google search engines, one of the biggest issues in blog search is relevance, but there are some interesting additional wrinkles that are almost unique to blog search.

In traditional Internet search, a lot of users are just looking for an answer to a question. With blog search, a user might be looking for a new blog to start reading regularly. Being able to say that a particular blog is relevant and authoritative — which may have a slightly different meaning in the blogosphere than it does in traditional Internet search — matters a great deal. The newness of content becomes a special issue as well; it makes the blogosphere very volatile, and more challenging to keep track of, in its way, than news. And do I need to mention that the amount of linking that goes on with blogs can overwhelm traditional search engines? (In fact, that’s a tactic that some SEOs have used to raise the PageRank for their websites, but a discussion of that is beyond the scope of this article).

{mospagebreak title=Sphere Enters Blogosphere}

So what is a searcher to do if they’re looking for a way to find the gems among the literally tens of millions of blogs out there? This is where Sphere (http://www.sphere.com) enters the picture. Founded in April 2005, Sphere has been indexing the blogosphere since January 2003. Still operating as a closed beta, Sphere has impressed the search mavens who have tried it out, including Jeremy Zawodny of Yahoo!.

It’s not too surprising, given the company’s background. Sphere supposedly started in the best shoestring tradition, with less than $200,000. But Sphere’s chief officers and advisers are veterans of other Internet start-ups. One of their advisers, Toni Schneider, founded Oddpost and is now working at Yahoo!. In fact, Oddpost was going to bring it out themselves until they were bought by Yahoo!. Another adviser, Matt Mullenweg, is associated with WordPress.

Tony Conrad is Sphere’s CEO. If his name sounds familiar, it may be because he was a venture capitalist himself who has funded companies such as Oddpost. The other two people co-founding Sphere were also involved in the Waypath blog search engine.

Judging from the interest Sphere is attracting from venture capitalists, Conrad may have a winner on his hands. John Battelle noted in his search blog that KP partners Will Hearst and Kevin Compton have thrown angel money at the young company. Other names you might recognize that are backing the company include Doug Mackenzie (Radar Partners), Phil Black (Blacksmith Capital), David Mahoney, Vince Vanelli and Mike Winton.

There has been some speculation as to how Sphere will spend this money. If the company is going to be really successful, it needs to be able to scale, which means it needs a lot more machines. That’s one way it could spend its angel money. It could also use a lot more distribution, of the sort it could gain from a deal with a major search engine — and Yahoo! seems like the obvious choice, since one of its advisers is already working there. Indeed, Sphere could end up being bought by Yahoo! eventually. But enough about the company’s background; it’s time to explain why so many people are so excited about Sphere.

{mospagebreak title=How it Works}

Sphere’s algorithm examines three factors that are very important to sorting blogs and delivering the best results to searchers. The first of these factors is the link structure. Sphere dives in and tries to decipher the complicated issue of who is linking to what. That might seem somewhat straightforward with a traditional search engine, but it’s not so obvious with blogs. You see, because the blog community is so interactive, you often have “conversation starters” and those who simply follow along, linking to the discussion or commenting on it. By analyzing the link structures, Sphere gains some idea of which blogs are acting as authorities.
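Sphere has not published how it analyzes link structure, so the following is only a minimal sketch of the general idea, assuming the simplest possible signal: counting how many distinct other blogs link to each blog. The function name `inbound_link_counts` and the sample data are hypothetical, made up for illustration.

```python
from collections import Counter


def inbound_link_counts(outlinks):
    """Count how many distinct other blogs link to each blog.

    `outlinks` maps a blog name to the collection of blogs it links to.
    Links from a blog to itself are ignored.
    """
    counts = Counter()
    for source, targets in outlinks.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Hypothetical data: one "conversation starter" and two followers.
outlinks = {
    "starter": set(),
    "follower_a": {"starter"},
    "follower_b": {"starter", "follower_a"},
}

counts = inbound_link_counts(outlinks)
# Blogs with more inbound links sort first.
ranked = sorted(outlinks, key=lambda blog: counts[blog], reverse=True)
```

A plain inbound count like this is easy to game with link spam, which is presumably why link structure is only one of the three factors.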

The second factor is the metadata, or information about the blog itself aside from its content. How frequently does the blog writer post? How long is a typical post? How many links does the average post get? These and other factors are considered part of the metadata, and are tracked by Sphere’s algorithm.
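To make those three questions concrete, here is a rough sketch of how such per-blog statistics could be computed. The `Post` class, its field names, and `blog_metadata` are all hypothetical, and I am assuming that "links" means links pointing at a post; Sphere's own definitions are not public.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    published: date
    text: str
    links: int  # assumption: number of links pointing at this post


def blog_metadata(posts):
    """Return posts per week, average words per post, and average links per post."""
    if not posts:
        return {"posts_per_week": 0.0, "avg_words": 0.0, "avg_links": 0.0}
    dates = [p.published for p in posts]
    span_days = (max(dates) - min(dates)).days + 1  # inclusive of both end dates
    weeks = max(span_days / 7, 1.0)  # treat anything shorter than a week as one week
    n = len(posts)
    return {
        "posts_per_week": n / weeks,
        "avg_words": sum(len(p.text.split()) for p in posts) / n,
        "avg_links": sum(p.links for p in posts) / n,
    }


# Hypothetical blog with two posts spread over two weeks.
sample_posts = [
    Post(date(2006, 1, 1), "one two three", 4),
    Post(date(2006, 1, 14), "one", 0),
]
stats = blog_metadata(sample_posts)
```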

The third factor is what Tony Conrad and the other co-founders clearly consider to be the biggest point that differentiates Sphere from the other blog search engines on the Internet. It’s called content semantic analysis, and Sphere co-founder Steve Nieker maintains that “it’s the hard part, and most important ingredient of the secret sauce.” This is important because spam blogs (of the sort that just say “porn” repeatedly, for instance) will go to the bottom of the search results, while the most relevant blogs will come to the top.

Om Malik noted something in his blog entry about Sphere that seems related to this content semantic analysis. The algorithm apparently includes a pronoun checker. If a blog tends to use “I” a lot, then it’s probably more personal and less focused on the kinds of topics that the searcher is looking for. That seems intuitively correct to me. I’d expect to be most interested in the personal blogs of people I already know, and when I want to find those I wouldn’t use a search engine.
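How the real pronoun checker works has not been described in any detail, but the basic idea is easy to sketch. The version below simply measures what share of a post's words are first-person singular pronouns. The word list, the function names, and the 8% threshold are my own assumptions, not anything Sphere has disclosed.

```python
import re

# Assumed word list; a real checker might use a different one.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def first_person_ratio(text):
    """Return the fraction of words in `text` that are first-person pronouns."""
    # Splitting on anything that isn't a letter turns "I'm" into "i" and "m",
    # so contractions still count toward the pronoun total.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in FIRST_PERSON)
    return hits / len(words)


def looks_personal(text, threshold=0.08):
    """Guess whether a post is a personal diary entry (threshold is arbitrary)."""
    return first_person_ratio(text) >= threshold


diary = "I think my cat likes me"
topical = "Search engines rank pages by links"
```

With these two samples, `looks_personal(diary)` is true and `looks_personal(topical)` is false. A real system would presumably treat the ratio as one signal among many rather than a hard cutoff.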

{mospagebreak title=Sphere in Action}

So what is it like to use Sphere to search for blogs? It boasts a very clean interface, judging from the screen shots I’ve seen. You can sort entries by relevance or time, of course. Results include a title and a paragraph from the blog post. You can click on the link to the post, or you can also click on a red “profile” link. This gives you a pop-up window that shows you the title of the blog, its link, the average number of posts per week, average words per post, and average links per post, as well as its recent links to other blogs. As I mentioned earlier, a lot of people doing a blog search are looking not only for information but also for a new blog to read, and this kind of data can help them decide.

If you’re looking for results by date, you can use a drop-down menu to check the last four months, last week (default), last day, last 12 hours, or last hour. How’s that for up-to-the-minute? You can also choose to see blogs written in languages other than English.
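The date drop-down amounts to a time-window filter, which can be sketched in a few lines. This is only an illustration under stated assumptions: the `WINDOWS` table and `filter_by_window` are hypothetical names, posts are modeled as plain (title, timestamp) pairs, and "four months" is approximated as 120 days.

```python
from datetime import datetime, timedelta

# The menu choices described above, mapped to time spans.
WINDOWS = {
    "last hour": timedelta(hours=1),
    "last 12 hours": timedelta(hours=12),
    "last day": timedelta(days=1),
    "last week": timedelta(weeks=1),
    "last four months": timedelta(days=120),  # assumption: 30-day months
}


def filter_by_window(posts, window="last week", now=None):
    """Keep (title, published) pairs inside the window, newest first."""
    if now is None:
        now = datetime.now()
    cutoff = now - WINDOWS[window]
    recent = [(title, ts) for title, ts in posts if cutoff <= ts <= now]
    return sorted(recent, key=lambda item: item[1], reverse=True)


# Hypothetical posts, measured against a fixed "now" so results are repeatable.
now = datetime(2006, 3, 1, 12, 0)
posts = [
    ("b", now - timedelta(days=3)),
    ("a", now - timedelta(minutes=30)),
    ("c", now - timedelta(days=60)),
]
last_hour = [title for title, _ in filter_by_window(posts, "last hour", now)]
last_week = [title for title, _ in filter_by_window(posts, now=now)]
four_months = [title for title, _ in filter_by_window(posts, "last four months", now)]
```

The default of "last week" mirrors the default the drop-down is said to use.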

One of Sphere’s algorithms takes consistency of blogging on a particular topic over time into consideration. You can see the fruits of that algorithm in the “featured blogs” box that shows up to the right of your search results. Below that box you’ll see another one for “related media,” which displays links to news stories, books, podcasts, and photos related to your search terms.

If you know how to use Boolean operators, you can use them with Sphere. You can also limit your search in several other specific ways. You can tell Sphere to search only the titles of blog posts. You can also tell it to search only a specific blog (great if you remember a particularly excellent or humorous entry in a blog that you want to share with someone…but which entry was it?!). Or you can tell it to search only a particular website, which would work well if you know your target is on a site with multiple blogs.

I can easily see at least one way for Sphere to monetize itself with the “related media” box, though I’d rather see it use more conventional search-related ads than corrupt that box. It’s a cool feature, and a good bet that someone looking for a blog on a particular topic might be interested in news and other items related to it as well. I don’t consider myself a heavy blog reader, but I’m looking forward to seeing Sphere when it comes out of closed beta mode.
