- by now many of you may have read about AOL's latest debacle: the release of nearly 2GB of Web-search data covering 650,000 of its users... the data represents several months of searches from this year, and while usernames have been replaced with numerical IDs, the queries themselves still contain enough pertinent info to pinpoint real names, Social Security numbers, and so on...
- the data is available through a number of outlets (torrents, etc.) and seems to have replicated rapidly... i assume AOL techs are now familiar with the phrase 'Pandora's box'?
- the data is in ASCII, tab-delimited format containing the following fields (from the *README.txt):
AnonID - an anonymous user ID number.
Query - the query issued by the user, case shifted with most punctuation removed.
QueryTime - the time at which the query was submitted for search.
ItemRank - if the user clicked on a search result, the rank of the item on which they clicked is listed.
ClickURL - if the user clicked on a search result, the domain portion of the URL in the clicked result is listed.
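- for anyone who wants to poke at the individual fields, here's a rough awk sketch... it assumes the columns come in the order listed above and that each file starts with a header line (both are assumptions on my part, check them against your copy), and the sample rows are made up so the snippet runs without the real data:

```shell
# made-up sample rows in the five-column, tab-delimited layout described above
sample=$(mktemp)
printf 'AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n' > "$sample"
printf '1001\tbike forums\t2006-03-01 10:15:02\t1\thttp://www.bikeforums.net\n' >> "$sample"
printf '1001\troad bike tires\t2006-03-01 10:17:40\t\t\n' >> "$sample"
printf '1002\tlinux on ibook\t2006-03-02 21:05:11\t3\thttp://www.example.org\n' >> "$sample"

# rows where the user clicked a result: ItemRank (field 4) is non-empty
# (NR > 1 skips the assumed header line)
clicked=$(awk -F'\t' 'NR > 1 && $4 != "" { n++ } END { print n + 0 }' "$sample")
echo "queries with a click: $clicked"

# number of distinct AnonIDs (field 1)
users=$(awk -F'\t' 'NR > 1 { seen[$1] = 1 } END { n = 0; for (id in seen) n++; print n }' "$sample")
echo "distinct AnonIDs: $users"
```

to run it on the real thing, decompress a file first (gzip -dc) and pipe it into the same awk programs.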
- i thought i'd do a quick run-through on how many searches were conducted relating to BF:
zgrep 'bikeforums.net' *gz | wc -l
results in 280 matching lines, i.e. 280 query records that mention it (in less than 100 seconds on my iBook)
- on the other hand, there were 2,914 refs to 'linux'...
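- one caveat if you repeat counts like these: in a plain grep pattern the '.' matches any character, so 'bikeforums.net' would also hit a query typed as 'bikeforums net'... a hedged sketch of a literal-string version follows, where count_term is just a helper name i made up and the gzipped sample file is invented so the loop can be tried without the real data:

```shell
# made-up sample data, gzipped, so the loop can be tried without the real files
dir=$(mktemp -d)
printf '1001\tbikeforums.net\t2006-03-01 10:15:02\t1\thttp://www.bikeforums.net\n' > "$dir/sample.txt"
printf '1002\tbikeforums net\t2006-03-02 08:00:00\t\t\n' >> "$dir/sample.txt"
printf '1003\tlinux on ibook\t2006-03-02 21:05:11\t\t\n' >> "$dir/sample.txt"
gzip "$dir/sample.txt"

# count matching lines for a term across all .gz files in a directory;
# grep -F treats the term as a literal string, so '.' only matches a real dot
# (count_term is a hypothetical helper: $1 = term, $2 = directory)
count_term() {
    gzip -dc "$2"/*.gz | grep -F -c -- "$1"
}

for term in 'bikeforums.net' 'linux'; do
    printf '%s\t%s\n' "$term" "$(count_term "$term" "$dir")"
done
```

like wc -l, grep -c counts matching lines rather than individual occurrences, so a line that mentions the term twice still counts once.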