Serving as a Bad Example: AOL Privacy Debacle
AOL’s release in late July of data covering three months’ worth of user searches has sparked a wide range of reactions. In some ways, the reactions have revealed far more than the raw data itself. They have pointed out the ways in which the current handling of search engine data does not serve all those who wish to use it – and the delicate balancing act that must be maintained between equally commendable yet mutually exclusive needs.
As always, let’s start with the facts. In late July, AOL posted data to an AOL research site. The data covered searches conducted in March, April, and May of 2006: 20 million uncensored queries from about 658,000 users, or the equivalent of between one and two percent of the searches conducted through AOL during May. The users were chosen randomly and rendered anonymous by the simple expedient of associating an ID number, rather than a name, with each user’s searches.
If you think that’s not enough to keep a searcher’s identity secret, you’re right, as you’ll see in a minute. For now, the point I want you to understand is that AOL did this intentionally, apparently to gain recognition from the research community by putting up a data set that could be regularly cited in research papers. It was not an accident.
Here my sources stop agreeing with each other. Media Post Publications says that the queries were on a publicly viewable web site for about two weeks before bloggers noticed them during the weekend of August 6-7, which led to their removal. Another source says they were up for only one week. Yet a third source claims the material was available from AOL for only three hours on August 4. Whatever the truth might be, the result was the same: data that’s been released onto the Internet cannot be easily recalled.
By the time AOL yanked the data, the damage had already been done. A number of sites had downloaded the file and put a friendly interface on it to make it easily searchable. It didn’t matter that it was huge: 436 MB, or 2 GB unzipped. It was now out in the wild, causing the inevitable ripples.
By Terri Wells