At this point in development, Powerset isn’t even open to beta testers. It’s accepting sign-ups for its PowerLabs, where Powerset testers will try out the technology on a limited number of web sites, like Wikipedia and the New York Times. According to the various accounts I’ve read, the closed beta testing is supposed to start in September.
You might have an interesting time signing up for it. I tried multiple times without apparent success, getting a 502 proxy error every time. The next day, however, I had a number of email messages in my inbox from Powerset asking me to confirm my email address. I did, and I received a welcome message (about which more later).
But since there’s quite a bit we don’t know about Powerset, let’s start with what we do know. What is “natural language”? It’s what lets humans understand each other, and why computers have such a hard time understanding what we say. It’s the ability to extract actual meaning from sentences. It means that someone or something receiving a query for “politicians who died of disease” would recognize that state governors, prime ministers, and presidents are politicians, and pneumonia, cancer, and diabetes are diseases.
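To see what that entails computationally, here is a toy sketch in Ruby (purely illustrative, and nothing like Powerset's actual technology) of the hypernym lookup such a query demands, using a tiny hand-built concept table:

```ruby
# A tiny hand-built hypernym table: each specific term maps to the
# broader concept it is a kind of. A real system would draw on a huge
# lexical resource (such as WordNet) rather than six entries.
HYPERNYMS = {
  "governor"       => "politician",
  "prime minister" => "politician",
  "president"      => "politician",
  "pneumonia"      => "disease",
  "cancer"         => "disease",
  "diabetes"       => "disease"
}.freeze

# True if the term is the concept itself or a kind of it.
def is_kind_of?(term, concept)
  term == concept || HYPERNYMS[term] == concept
end

# A document matches when, for every concept in the query, it mentions
# at least one term that is a kind of that concept.
def matches_query?(doc_terms, query_concepts)
  query_concepts.all? do |concept|
    doc_terms.any? { |term| is_kind_of?(term, concept) }
  end
end

# "politicians who died of disease"
puts matches_query?(["governor", "pneumonia"], ["politician", "disease"]) # true
puts matches_query?(["governor", "election"], ["politician", "disease"])  # false
```

Keyword matching would miss the first document entirely, since it never contains the words "politician" or "disease" — which is exactly the gap natural language search tries to close.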
The idea of getting computers to understand natural language has been around since 1950, when Alan Turing described his famous test in the paper “Computing Machinery and Intelligence.” We have made a certain amount of progress since then, thanks to improvements in technology. But getting computers to connect concepts has proven so tricky that today’s most successful search engines use a statistical approach instead.
The most prominent search engine to try a natural language approach was Ask Jeeves. It boasted that users could ask questions rather than resort to using keywords. Unfortunately, it didn’t work very well. Ask Jeeves has since become Ask and is trying to reinvent itself to become more competitive. I was delighted to see recently, when I reviewed Ask3D, that its technology has improved tremendously. But it does not seem to be taking a natural language approach these days.
Powerset thinks it has a handle on natural language. Its search engine is actually supposed to learn and get better as more people use it. We won’t know whether it’s real or all hype until September at the earliest. In the meantime, though, it’s instructive to take a look at where this technology is coming from.
Xerox’s Palo Alto Research Center (PARC) has long been known for inventing things that other companies end up commercializing, earning it the title of “lab of missed opportunities.” These include the graphical user interface and the Ethernet networking technology. But in a deal that took a year and a half to negotiate, it licensed its natural language technology to Powerset.
Fernando Pereira, chairman of the department of computer and information science at the University of Pennsylvania, noted that the PARC natural language technology is among the “most comprehensive in existence.” But is it good enough for search? “The question of whether this technology is adequate to any application, whether search or anything else, is an empirical question that has to be tested,” he explained.
The PARC technology has 30 years of research backing it up. PARC researchers have been working with Powerset researchers for more than a year to build the prototype search engine. Indeed, Ron Kaplan, leader of PARC’s natural language research group for several years, joined Powerset as chief technology officer. Kaplan had been approached by Google, but turned them down.
Yes, you read that right. Why would Kaplan turn down an established player like Google to work at Powerset? He doesn’t think Google takes natural language search seriously enough. “Deep analysis is not what they’re doing,” he explained in an interview with VentureBeat. “Their orientation is toward shallow relevance, and they do it well.” But Powerset is different; it “is much deeper, much more exciting. It really is the whole kit and caboodle.”
Powerset has also hired a number of engineers away from Yahoo. One name from Yahoo that stands out is Tim Converse, an expert on web spam; another is Chad Walters, who worked for Yahoo as a search architect. The company also claims its employees have worked for AltaVista, Apple, Ask, BBN, Digital, IDEO, IBM, Microsoft, NASA, Promptu, SRI, Tellme and WhizBang! Labs.
Before I go into what that technology can do, and what Powerset envisions itself becoming, it’s worth noting that the licensing deal is pretty ironclad. According to Powerset COO Steve Newcomb, it includes provisions that prevent any other company – such as Google – from getting access to the technology even if that company acquires Xerox or PARC. That should be enough to give Google pause. But is the technology really enough to make the search giant start shaking in its boots?
In the welcome email I received when I signed up for Powerset’s PowerLabs, I saw a link to a short video about Powerset. For those who like this kind of irony, it’s hosted on YouTube, which is owned by Google. The one-minute video consisted of product manager Mark Johnson explaining how members of PowerLabs will get to “brainstorm ideas, write requirements, and test out the product…You’ll be able to run searches on the Powerset engine and see what our cool capabilities are, and you’ll also be able to give feedback on the results which will help to train Powerset and change the way the results come back in the future…”
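The feedback loop Johnson describes is easy to picture with a toy example. Here is a minimal sketch in Ruby (entirely hypothetical — Powerset has published nothing about its actual implementation) in which each user vote on a result nudges its score and reorders future results:

```ruby
# A toy "learning" ranker: user feedback accumulates per result and
# shifts how future result lists are ordered. Purely illustrative.
class FeedbackRanker
  def initialize
    # Learned score for each result, defaulting to neutral.
    @scores = Hash.new(0.0)
  end

  # Record feedback: +1 for a helpful result, -1 for an unhelpful one.
  def feedback(result_id, vote)
    @scores[result_id] += vote
  end

  # Order results by learned score (highest first), preserving the
  # original order among results with equal scores.
  def rank(result_ids)
    result_ids.each_with_index
              .sort_by { |id, i| [-@scores[id], i] }
              .map(&:first)
  end
end

ranker = FeedbackRanker.new
3.times { ranker.feedback("b", +1) }  # users repeatedly prefer result "b"
ranker.feedback("a", -1)              # one user downvotes result "a"
p ranker.rank(%w[a b c])              # => ["b", "c", "a"]
```

Real learning-to-rank systems are vastly more sophisticated, but the principle is the same: user judgments become a signal that reshapes future rankings, which is presumably what Johnson means by helping to “train Powerset.”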
My chief problem with the video was that it consisted of a talking head. Why did Johnson not see fit to include a demonstration of the technology? In the Powerset blog, there are several entries that focus on how it returns results that are very different from what Google returns. Some entries even talk about why natural language is so difficult for computers to comprehend (kudos to Marti Hearst, a Powerset consultant and a professor at the Berkeley School of Information, for writing such engaging posts). So why not bring some of that out in the video?
I’ll have to assume that it’s little more than a teaser. Powerset has given demos of its technology; at least one observer has commented on the fact that these demos are always powered by someone at the company, and never seem to accept outside suggestions. Still, they have returned decent results. For example, a search on “who won an academy award in 2001?” returns Halle Berry, with a photo, a list of films, awards, and a description.
Powerset and others have made much of the point that Google doesn’t return as good a result for this kind of query. Or does it? I tried the query, without quotes, in Google, and found a link on the first page of the results. It was actually better than the result Powerset returned if I wanted to know all of the Academy Award winners for 2001 – which would make sense given the nature of the question. And here we actually find a disagreement: Julia Roberts supposedly took the Best Actress title for Erin Brockovich. I ended up going to the actual Academy Awards web site to clear up the discrepancy; Julia Roberts received her Oscar in 2001 for her work in 2000. Likewise, Berry received hers in 2002 for Monster’s Ball, released in 2001. Even the best technology can’t read your mind.
One thing I can say for certain: Powerset isn’t afraid of a challenge. It’s running the site on Ruby on Rails. That’s a nimble framework; we’ve devoted a whole category to it on Dev Shed, as a matter of fact. But no one seems to know whether it can handle the kind of traffic that a popular search engine will inevitably attract.
Powerset is a small company given what it is setting out to do. One recent article mentioned that it boasts 66 engineers. About 10 of these use Ruby on a daily basis, according to Powerset project leader Kevin Clark, so the decision to use Ruby made good sense. Also, the entire organization uses Ruby internally. Clark notes that “a substantial part of our infrastructure is being written in Ruby or being accessed through Ruby services. Our scientists use Ruby to interact with our core language technology…Frankly, we as an organization use Ruby a whole heck of a lot.”
As to the scaling issues, Clark is not worried. While Twitter has been held up as an example of a company whose Ruby on Rails technology did not scale well, Clark actually talked with Twitter’s lead developer Blaine Cook to find out where Twitter’s problems came from. He discovered that Twitter ran into architectural problems that had nothing to do with RoR. In fact, according to Cook, Ruby on Rails quickly became part of the solution: “thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is now 10000% faster than it was in January.”
Unfortunately, we won’t really know how well it all works until September at the earliest, when private beta testers get to play with the technology. The rest of the world won’t get to look at it until the end of this year. I’m not going to bet that Powerset is the next Google killer, but I’m glad to see someone taking a different approach to the challenge of search.