Wikia Gets Closer to Launching

For all of its inaccuracies, Wikipedia proved that you could create a useful resource out of the contributions of hundreds of volunteers. Now Jimmy Wales, the sometimes controversial founder of the online encyclopedia, thinks he can do it again, this time with a search engine. How likely is he to succeed?

Wales certainly has a track record. Having made his fortune as a stock-options trader, he went on to found Bomis, a search portal that creates and hosts web rings around sometimes erotic topics. The portal made its money from advertising, which also helped to fund other ventures. The first of these, in 2000, was Nupedia. This was Wales’ first attempt at creating an online encyclopedia. Its articles were written by experts and licensed as free content.

Nupedia’s contribution process was cumbersome; all articles were subjected to peer review to make sure they were of high quality. By the time Nupedia ceased operating in 2003, it had produced only 24 articles that made it all the way through the review process; 74 more were in progress. It’s pretty clear why the site never took off.

By this time, of course, Wikipedia had been created (in 2001) and was well on its way to success. It now features nearly two million articles in English alone, and a very active community that updates and adds items as necessary based on news events. Printed encyclopedias, or even ones that are published regularly on CDs or DVDs, could never keep up. There are Wikipedias in a wide range of languages, though with significantly fewer articles than the English version (there is even a Klingon version with 83 articles).

Wikipedia is a not-for-profit venture, surviving solely on contributions and run by the Wikimedia Foundation, created in 2003. Wikia, on the other hand, was founded by Wales and Angela Beesley in 2004 as a for-profit wiki hosting service. It is free of charge for readers and editors, and has received venture capital funding from Bessemer Venture Partners and Amazon.com.

These creations, in particular Wikipedia, have not been without their issues. Wikipedia articles are regularly subject to vandalism, and their accuracy may be questionable since the volunteers working on them are often not experts in their fields. Wales himself is on record as saying that Wikipedia is a starting point for a general overview, not an ending point. So how would this work with a search engine?

Search engines index the Internet and deliver relevant results to users in response to keywords. At first, it seemed as if Wales planned to have volunteers handle this themselves, completely replacing the algorithm. But he found a better answer.

LookSmart had purchased a search project named Grub and later neglected it. Wikia bought Grub and made it open source for the first time in four years. Grub is an Internet indexing program; the fact that it is now open source means that programmers can work on the code, modify it, test it, and in general build something better.

That transparency is a key aspect of Wales’ approach to search. “It’s not a good thing that we are getting search results from a handful of very large players and we have no idea how they are generated,” he said when he announced the purchase at the O’Reilly Open Source Convention in Portland, Oregon. “To some extent this is a political thing…Search is part of the fundamental structure of the Internet and should be transparent and open.”

Working on the software isn’t the only way in which volunteers can help the Search Wikia project succeed. The software is designed to be distributed on lots of computers. Much as SETI@Home or Folding@Home works, participants download the software at http://www.grub.org/, and when they are not using their machines, Grub uses their spare CPU cycles to index the web.
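To make the distributed idea concrete, here is a minimal Python sketch of how a coordinator might split a list of URLs among volunteer machines. It is an illustration only and not Grub's actual protocol: the function name `assign_urls` and the client IDs are hypothetical, and hashing on the host name is simply one common way to spread the load.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlparse


def assign_urls(urls, client_ids):
    """Partition URLs among volunteer clients by hashing each URL's host.

    Hypothetical sketch, not Grub's real scheduling logic. Hashing the
    host (rather than the full URL) keeps every page of a site on the
    same client, which makes per-site crawl limits easier to respect.
    """
    if not client_ids:
        raise ValueError("at least one client is required")
    batches = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Use a stable hash so the assignment is the same on every run.
        digest = hashlib.sha256(host.encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % len(client_ids)
        batches[client_ids[slot]].append(url)
    return dict(batches)


if __name__ == "__main__":
    work = assign_urls(
        ["http://example.com/a", "http://example.com/b", "http://example.org/"],
        ["client-1", "client-2", "client-3"],
    )
    for client, batch in work.items():
        print(client, batch)
```

A real system would also have to cope with volunteers who go offline mid-crawl and with results that need verifying, which this sketch ignores.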

Humans will also be directly involved in editing the results. Human editors could handle tasks that computers still do poorly, like distinguishing when “bass” refers to a fish or a guitar, or “apple” refers to a fruit or the company, or even when “palm” refers to a location like Palm Beach, a hand, a type of tree, or a personal digital assistant.

Another open source element of the Wikia search engine will be Lucene. Lucene is “a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform,” according to its web site. The latest version is 2.2, released in mid-June.
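For readers who have never looked inside a full-text search library, the core data structure is an inverted index: a map from each word to the documents that contain it. The Python sketch below illustrates that idea only. It does not use Lucene's API (Lucene is a Java library), and the names `tokenize`, `build_index`, and `search` are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start with the first term's documents, then intersect with the rest.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


if __name__ == "__main__":
    docs = {
        1: "Bass guitar lessons",
        2: "Bass fishing tips",
        3: "Guitar strings",
    }
    index = build_index(docs)
    print(search(index, "bass guitar"))
```

A production library layers relevance scoring, stemming, and on-disk storage on top of this, but the lookup itself stays this simple, which is part of why indexing work can be parceled out.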

The most obvious issue, unfortunately, is the one on which the whole project rests. By making this into an open source project, Wales has invited everyone to look at the source code for the developing search engine’s algorithm. Are we sure this is better than the “black box” approach that Wales denounces so strongly?

Google, Yahoo, and Microsoft keep their search algorithms secret for a reason. So many people use the major search engines to find whatever they are looking for online that getting your web site to the top of the search engine results pages for appropriate keywords can seriously increase your traffic, the amount of business you conduct, and your company’s profit. Optimizing a web site to achieve this result is now a full-time occupation – literally, as SEO Chat readers know all too well.

So important is this climb to the top that many in the business will use deceptive techniques to achieve their goals. Guessing what Google’s algorithm will do in response to specific optimization efforts often plays a major role. So what do you suppose will happen when a search engine actually publishes its algorithm for all to see? How long do you think it will take before the search engine spammers examine the code and start exploiting all the flaws they can find? Wales admitted that “If published algorithms make it too easy for spammers to game the system then we’ve got a real problem and my whole idea won’t work.”

Wales has stated that he will rely on the search engine’s community to police spammers, at least in part. Judging from the number of spammers out there, however, any community will be overwhelmed in short order. According to George Gardner, writing for Tech.Blorg, “The bottom line is that after Wikia Search’s algorithm is known, it will not only be just another search engine, but will reasonably be the worst search engine on the Internet.”

This isn’t the only problem Wales faces, though it might be the biggest. The other issue is one of scale. comScore Media Metrix’s numbers for June listed Wikipedia as the ninth most popular site on the Internet, with 47 million unique US visitors. Google, however, had well over 123 million unique US visitors in the same time period. Granted, Wikia Search will not have to face those numbers overnight, but what will keep it from hitting a bottleneck as it grows?

If it were simply a matter of a computer-controlled algorithm, the solution might be better programming and more hardware. But Wales envisions people directly involved in making the search engine’s results better, perhaps by some kind of voting system. At the Internet’s current rate of growth, it’s questionable whether volunteers would be able to keep up.
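Wales has not said how such a voting system would work, so the following Python sketch is purely an assumption about one possible design: each result keeps its algorithmic score, and user votes nudge it up or down. The function name `rerank`, the `weight` parameter, and the logarithmic damping are all hypothetical choices made for illustration.

```python
import math


def rerank(results, votes, weight=0.5):
    """Reorder search results by base score plus a damped vote adjustment.

    results: list of (url, base_score) pairs from the algorithm.
    votes:   dict mapping url -> (upvotes, downvotes); missing urls
             are treated as having no votes.
    weight:  how strongly votes count relative to the base score
             (an arbitrary value chosen for this sketch).
    """
    def adjusted(item):
        url, base_score = item
        up, down = votes.get(url, (0, 0))
        net = up - down
        # Log damping: the hundredth vote moves a result far less than
        # the first one, which blunts simple ballot stuffing.
        boost = math.copysign(math.log1p(abs(net)), net)
        return base_score + weight * boost

    ranked = sorted(results, key=adjusted, reverse=True)
    return [url for url, _ in ranked]


if __name__ == "__main__":
    results = [("a.example", 1.0), ("b.example", 0.9), ("c.example", 0.8)]
    votes = {"b.example": (10, 0), "a.example": (0, 3)}
    print(rerank(results, votes))
```

Even in this toy version the scaling worry is visible: results that nobody has voted on fall back to the algorithm alone, so the human layer only helps where enough volunteers have actually looked.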

Finally, Wikia Search will no doubt face the same issue that Wikipedia faces: the question of accuracy. Andrew Keen, who seems to be making a career out of railing against user-generated content, pointed out that there is no way to tell whether anonymous human contributors have their own agenda – and what that agenda is if they do. “I don’t trust Wikipedia and I certainly wouldn’t trust an open-source search engine that is shaped by anonymous people,” he said.

It’s worth noting here that this is far from the first attempt at human-assisted search. Some have pointed out that the very first “search engines” predate the Internet by decades and are still around; they’re called librarians. Yahoo lets users participate in its popular Yahoo Answers search; users submit questions, and other users submit answers. Others have also used the open source approach; Nutch, for example, is built on Lucene Java.

To cover just a few of the engines I’ve reviewed here that qualify, Mahalo.com presents users with results that have been painstakingly put together by humans. Social bookmarking sites like Del.icio.us and Searchles benefit from a community of users saving their favorite sites and letting everyone search everyone else’s bookmarks. Some search engines, like Spock, still in closed beta, work a bit like Wikipedia in that anyone can alter entries – but Spock is specialized for searching for people.

So Wales isn’t quite the first, but he is perhaps the first to dream quite this big. He thinks Wikia Search will be able to capture five percent of the search market – and he believes this will actually benefit the bigger search players. “Google will be much better off if search becomes ubiquitous and there are lots of players because that doesn’t threaten their dominance in the advertising market,” he explained. “Google is much better off with lots of players and being the ad broker for everyone, because ad brokerage is a defensible business.”

Whether Google will see it this way remains an open question. Of course, Wikia Search might not catch on at all, rendering the point moot. The possibility of failure doesn’t seem to weigh too heavily on Wales though. “I could fail. I have no idea. But I’m going to have fun trying,” he said. “I just want to do something cool.”
