[IP] Data mining of Amazon wishlists
Begin forwarded message:
From: Tony Wasserman <tonyw@xxxxxxx>
Date: January 8, 2006 2:20:22 PM EST
To: dave@xxxxxxxxxx
Subject: Data mining of Amazon wishlists
For IP, if you like.
Do you have the Quran (Koran) on your Amazon wish list? How about
something by Michael Moore?
Slashdot (www.slashdot.com) has a pointer to an article on how one could
find people with "subversive" Amazon wish lists. The article, by Tom Owad,
can be found at http://www.applefritter.com/bannedbooks, and gives a
straightforward means to find such people from publicly available data
using scripts. Owad presents this article as the first in a planned weekly
series "that will deal with security on the internet and practical steps
you can take to protect your privacy". Readers can register and post
comments on the site and/or contact him directly by email (his last name
at applefritter.com).
I've included the introductory paragraphs of the article below. The
remainder of the article includes a more detailed discussion, including
scripts showing how to locate those individuals who have specific titles
on their wish lists.
Tony Wasserman
Data Mining 101: Finding Subversives with Amazon Wishlists
Submitted by Tom Owad on January 4, 2006 - 7:37pm.
Vast deposits of personal information sit in databases across the  
internet. Terms used in phone conversations have become the grounds  
for federal investigation. Reputable organizations like the Catholic  
Worker, Greenpeace, and the Vegan Community Project, have come under  
scrutiny by FBI "counterterrorism" agents.
"Data mining" of all that information and communication is at the  
heart of the furor over the recent disclosure of government snooping.  
"U.S. President George W. Bush and his aides have said his executive  
order allowing eavesdropping without warrants was limited to  
monitoring international phone and e-mail communications linked to  
people with connections to al-Qaeda. What has not been acknowledged,  
according to the Times, is that NSA technicians combed large amounts  
of phone and Internet traffic seeking patterns pointing to terrorism  
suspects.
"Some officials described the program as a large data mining  
operation, the Times said, and described it as much larger than the  
White House has acknowledged." (Reuters)
Combining a data mining operation with the Patriot Act's power to  
access information makes it all too easy for the federal government  
to violate the Constitution's prohibition against unreasonable  
search. Ars Technica has an article, The new technology at the root  
of the NSA wiretap scandal, that describes the ease with which  
widespread wiretapping can now be implemented. It quotes Philip  
Zimmermann, the creator of the PGP encryption software:
"A year after the CALEA [Communications Assistance for Law  
Enforcement Act] passed [in 1994], the FBI disclosed plans to require  
the phone companies to build into their infrastructure the capacity  
to simultaneously wiretap 1 percent of all phone calls in all major  
U.S. cities. This would represent more than a thousandfold increase  
over previous levels in the number of phones that could be  
wiretapped. In previous years, there were only about a thousand
court-ordered wiretaps in the United States per year, at the federal,
state, and local levels combined. It's hard to see how the government  
could even employ enough judges to sign enough wiretap orders to  
wiretap 1 percent of all our phone calls, much less hire enough  
federal agents to sit and listen to all that traffic in real time.  
The only plausible way of processing that amount of traffic is a  
massive Orwellian application of automated voice recognition  
technology to sift through it all, searching for interesting keywords  
or searching for a particular speaker's voice. If the government  
doesn't find the target in the first 1 percent sample, the wiretaps  
can be shifted over to a different 1 percent until the target is  
found, or until everyone's phone line has been checked for subversive  
traffic. The FBI said they need this capacity to plan for the future.  
This plan sparked such outrage that it was defeated in Congress. But  
the mere fact that the FBI even asked for these broad powers is  
revealing of their agenda."
It used to be you had to get a warrant to monitor a person or a group  
of people. Today, it is increasingly easy to monitor ideas. And then  
track them back to people. Most of us don't have access to the  
databases, software, or computing power of the NSA, FBI, and other  
government agencies. But an individual with access to the internet  
can still develop a fairly sophisticated profile of hundreds of  
thousands of U.S. citizens using free and publicly available  
resources. Here's an example.
There are many websites and databases that could be used for this  
project, but few things tell you as much about a person as the books  
he chooses to read. Isn't that why the Patriot Act specifically  
requires libraries to release information on who's reading what? For  
this reason, I chose to focus on the information contained in the  
popular Amazon wishlists.
Amazon wishlists let anyone bookmark books for later purchase. By  
default these lists are public and available to anybody who searches  
by name. If the wishlist creator specifies a shipping address,  
someone else can even purchase the book on Amazon and have it shipped  
directly as a gift. The wishlist creator's city and state are made  
public on the wishlist, but the street address remains private.  
Amazon's popularity has created a vast database of wishlists. No  
index of all wishlists is available, but it remains possible to view  
all wishlists by people of a particular first name. A recent search  
for people named Mark returned 124,887 publicly viewable wishlists.
For an all-inclusive search by name, you could compile a  
comprehensive list of first names and nicknames from the baby names  
databases available on the internet. Armed with this list, and by  
recording the search results for each first name, it is possible for  
you to retrieve the vast majority of public wishlists on Amazon.
For the purposes of this exercise, only a single name was chosen – a  
common male name that returned over 260,000 wishlists. I'm not going  
to divulge what name was actually used. Let's pretend it was "Edgar,"  
in honor of former FBI director J. Edgar Hoover.
Before writing a script to download all the 260,000 "Edgar"  
wishlists, I confirmed that my actions would not violate Amazon's  
Conditions of Use. I also checked the robots.txt file which contains  
a list of directories Amazon requests not be traversed by scripts.  
User wishlists are not in this list, nor did the actions to be taken  
violate the conditions of use.
[more at www.applefritter.com/bannedbooks]
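
The robots.txt check described in the last paragraph of the excerpt can be
sketched with Python's standard urllib.robotparser module. This is only an
illustration under stated assumptions, not the script from the article: the
robots.txt rules, the host name shop.example, the user-agent string, and the
helper is_path_allowed are all hypothetical, and the sample rules are not
Amazon's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It does NOT reproduce any real site's rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
"""


def is_path_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # shop.example and example-bot are placeholder names.
    for path in ("/wishlist/example", "/checkout/cart"):
        url = "https://shop.example" + path
        print(path, is_path_allowed(SAMPLE_ROBOTS_TXT, "example-bot", url))
```

A path that matches no Disallow rule is reported as allowed, which mirrors
the reasoning in the excerpt: directories absent from the list are not
excluded. Passing this check says nothing about a site's conditions of use,
which have to be read separately.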
-------------------------------------
Archives at: http://www.interesting-people.org/archives/interesting-people/