-------- Original Message --------
Subject: Google and Data Retention - Policies and Possibilities
Date: Tue, 31 Jan 2006 09:08:22 -0800
From: Lauren Weinstein <lauren@xxxxxxxxxx>
To: dave@xxxxxxxxxx
CC: lauren@xxxxxxxxxx
Dave,
That Google can track user searches is hardly an "alert the media"
revelation. It has been effectively obvious for some time, since we
know that Google complies with various law enforcement-related
data-retrieval orders (and quite possibly with others that we don't
know about, such as national security letters) that would be largely
useless without such data -- and Google has never claimed to operate
anonymously in this respect.
A more interesting question in terms of data retention is *how long*
various aspects of the data are retained. That is, does this fine
grain of data "expire" over time, or is retrospective data mining of
the detailed data possible back into the indefinite past?
This issue is rapidly moving into the spotlight, as Congress appears
poised to discuss laws that would *mandate* data retention rules
for search engines and perhaps other Internet services -- and we all
know that when Congress gets involved in technical matters, the
results are often -- shall we say -- less "optimal" than if industry
had addressed these concerns itself voluntarily.
Obviously there are certain enhanced Google services (mostly related
to logged-in users in the search and Gmail spaces, including but not
limited to users availing themselves of Google's search history
features) that require long-term detailed data to function.
But viewed from the outside, there are steps that Google could take
to minimize privacy-related risks while not significantly
interfering with the value of that data for ongoing R&D and
innovation. This is only a thumbnail conceptual description of
course, based on external observations alone.
1) Minimize the length of time that full log records are maintained
for users not using enhanced services. For instance, full
records might be maintained for 30 days (an arbitrary figure for
this example). These would be available to law enforcement
queries and the like for ongoing investigations. However, after
the expiration period, records would be anonymized (stripped of
IP, cookie, and other connection-related data identifying the
user). Logged search query strings (though they also can
contain personal information, as we know) would not be affected
at this stage and would continue to be available for R&D and
other purposes, but now with a significantly lower outside
abuse potential.
2) After some longer period of time (a year? -- again, an arbitrary
period for the sake of this example) the remaining portion of
the records for non-enhanced service users would be purged
(deleted). I of course cannot address the non-trivial issues of
system and related data backups in this regard, since I have no
idea how Google has structured backup activities across their
enterprise, but this aspect in particular might make for an
interesting technical challenge.
3) Users of Google's enhanced search-history-based services, etc.
represent another interesting problem, since detailed data must
be maintained for these users in some form for the services to
function. However, it seems likely that the outside abuse
potential of this detailed data could be greatly reduced
through various cryptographic techniques, while still permitting
the required functionalities. It should be noted that
cryptographic methods may also be applicable in various ways to
alternative solutions for the issues described in sections (1)
and (2) above.
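
The two-stage expiration described in (1) and (2) can be sketched in a
few lines. This is only an illustration under stated assumptions: the
LogRecord layout, the field names, and the helper functions are
hypothetical (nothing here reflects how Google actually stores logs),
and the 30-day and one-year periods are the arbitrary example figures
used above.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# Arbitrary example periods taken from the text, not real policy values.
ANONYMIZE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=365)


@dataclass(frozen=True)
class LogRecord:
    # Hypothetical record layout; the real log schema is unknown.
    timestamp: datetime
    query: str
    ip: Optional[str] = None
    cookie: Optional[str] = None


def apply_retention(record: LogRecord, now: datetime) -> Optional[LogRecord]:
    """Return the record as it should be stored at `now`, or None if purged."""
    age = now - record.timestamp
    if age >= PURGE_AFTER:
        # Stage (2): the remaining portion of the record is deleted.
        return None
    if age >= ANONYMIZE_AFTER:
        # Stage (1): strip connection-related identifiers, keep the query.
        return replace(record, ip=None, cookie=None)
    return record


def sweep(records: Iterable[LogRecord], now: datetime) -> List[LogRecord]:
    """Apply the policy to a whole log, dropping purged records."""
    kept = []
    for record in records:
        result = apply_retention(record, now)
        if result is not None:
            kept.append(result)
    return kept
```

Note that the sketch leaves query strings untouched at the anonymizing
stage, as in (1), and says nothing about backups, which remain the
hard part noted in (2).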
Since I am not privy to Google's internal topology, the above ideas
can quite reasonably be categorized as speculative. However, the point
is that there does exist a range of technological approaches to dealing
with this data that could be harnessed to strike a reasonable balance
between data usefulness and privacy-related concerns -- permitting
R&D and innovation to proceed while minimizing the inherent abuse
potential in sensitive data of this sort.
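
As one concrete instance of the cryptographic techniques mentioned in
(3), identifiers could be replaced with keyed hashes. The text above
does not name a specific method, so the choice of HMAC-SHA256 here is
my assumption, as are the function names. The sketch uses only the
Python standard library.

```python
import hashlib
import hmac
import secrets


def new_key() -> bytes:
    """Generate a random 256-bit secret key (hypothetical helper)."""
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Map an identifier (IP, cookie value) to a stable pseudonym under `key`."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Records stored under the same key stay linkable to one another, which
is what history-based services need, while someone holding the logs
but not the key cannot practically recompute the pseudonym for a
guessed IP address. Rotating or destroying the key would then sever
the link between old records and the original identifiers.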
--Lauren--
Lauren Weinstein
lauren@xxxxxxxxxx or lauren@xxxxxxxx
Tel: +1 (818) 225-2800
http://www.pfir.org/lauren
Co-Founder, PFIR
- People For Internet Responsibility - http://www.pfir.org
Co-Founder, IOIC
- International Open Internet Coalition - http://www.ioic.net
Moderator, PRIVACY Forum - http://www.vortex.com
Member, ACM Committee on Computers and Public Policy
Lauren's Blog: http://lauren.vortex.com
DayThink: http://daythink.vortex.com
- - - - - - -
Begin forwarded message:
From: Adam Fields <ip20398470293845@xxxxxxxxxx>
Date: January 30, 2006 10:05:48 PM EST
To: dave@xxxxxxxxxx
Subject: More detailed queries of what Google stores
I asked two very specific questions in a conversation with John
Battelle, and he's received unequivocal answers from Google:
1) "Given a list of search terms, can Google produce a list of people
who searched for that term, identified by IP address and/or Google
cookie value?"
2) "Given an IP address or Google cookie value, can Google produce a
list of the terms searched by the user of that IP address or cookie
value?"
The answer to both of them is "yes".
http://battellemedia.com/archives/002283.php
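
To see why both answers follow once a log keeps the identifier next to
the query, here is a minimal sketch. The (user_key, term) pair layout
and the function name are hypothetical and say nothing about Google's
actual systems; user_key stands in for an IP address or cookie value.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_indexes(
    log: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build both lookups from (user_key, search_term) pairs."""
    users_by_term: Dict[str, Set[str]] = defaultdict(set)
    terms_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user_key, term in log:
        users_by_term[term].add(user_key)  # question 1: term -> users
        terms_by_user[user_key].add(term)  # question 2: user -> terms
    return users_by_term, terms_by_user
```

Any log that records the pair supports both directions; the two
questions are the same data read from opposite ends.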
--
- Adam