[IP] Data mining that protects privacy?
Begin forwarded message:
From: Randall <rvh40@xxxxxxxxxxxxx>
Date: June 18, 2006 12:36:21 PM EDT
To: Dave Farber <farber@xxxxxxxxxxxxx>, Dewayne Hendricks
<dewayne@xxxxxxxxxxxxx>
Subject: Data mining that protects privacy?
http://htdaw.blogsource.com/post.mhtml?post_id=349601
Sunday, June 18, 2006 at 12:34 PM EDT
Research explores data mining, privacy
By BRIAN BERGSTEINSat Jun 17, 10:50 PM ET
As new disclosures mount about government surveillance programs,
computer science researchers hope to wade into the fray by enabling data
mining that also protects individual privacy.
Largely by employing the head-spinning principles of cryptography, the
researchers say they can ensure that law enforcement, intelligence
agencies and private companies can sift through huge databases without
seeing names and identifying details in the records.
For example, manifests of airplane passengers could be compared with
terrorist watch lists — without airline staff or government agents
seeing the actual names on the other side's list. Only if a match were
made would a computer alert each side to uncloak the record and probe
further.
"If it's possible to anonymize data and produce ... the same results as
clear text, why not?" John Bliss, a privacy lawyer in IBM Corp.'s
"entity analytics" unit, told a recent workshop on the subject at
Harvard University.
The concept of encrypting or hiding identifying details in sensitive
databases is not new. Exploration has gone on for years, and researchers
say some government agencies already deploy such technologies — though
protecting classified information rather than individual privacy is a
main goal.
Even the data-mining project that perhaps drew more scorn than any other
in recent years, the Pentagon's Total Information Awareness research
program, funded at least two efforts to anonymize database scans. Those
anonymizing systems were dropped when Congress shuttered TIA, even while
the data-mining aspects of the project lived on in intelligence
agencies.
Still, anonymizing technologies have been endorsed repeatedly by panels
appointed to examine the implications of data mining. And intriguing
progress appears to have been made at designing information-retrieval
systems with record anonymization, user audit logs — which can confirm
that no one looked at records beyond the approved scope of an
investigation — and other privacy mechanisms "baked in."
The trick is to do more than simply strip names from records. Latanya
Sweeney of Carnegie Mellon University — a leading privacy technologist
who once had a project funded under TIA — has shown that 87 percent of
Americans could be identified by records listing solely their birthdate,
gender and ZIP code.
Sweeney had this challenge in mind as she developed a way for the U.S.
Department of Housing and Urban Development to anonymously track the
homeless.
The system became necessary to meet the conflicting demands of two laws
— one that requires homeless shelters to tally the people they take in,
and another that prohibits victims of domestic violence from being
identified by agencies that help them.
Sweeney's solution deploys a "hash function," which cryptographically
converts information to a random-appearing code of numbers and letters.
The function can't be reversed to reveal the original data.
When homeless shelters had to submit their records to regional HUD
offices for counting how many people used the facilities, each shelter
would send only hashed data.
A key detail here is that each homeless shelter would have its own
computational process, known as an algorithm, for hashing data. That
way, one person's name wouldn't always translate into the same code — a
method that could be abused by a corrupt insider or savvy stalker who
gained access to the records.
However, if the same name generated different codes at different
shelters, it would be impossible to tell whether one person had been to
two centers and was being double-counted. So Sweeney's system adds a
second step: Each shelter's hashed records are sent to all other
facilities covered by the HUD regional office, then hashed again and
sent back to HUD as a new code.
It might be hard to wrap your mind around this, but it's a fact of the
cryptography involved: If one person had been to two different shelters
— and so their anonymized data got hashed twice, once by each of the
shelters applying its own formula — then the codes HUD received in this
second phase would indicate as much. That would aid an accurate count.
Even if HUD decides not to adopt the system, Sweeney hopes it finds use
in other settings, such as letting private companies and law enforcement
anonymously compare whether customer records and watch lists have names
in common.
A University of California, Los Angeles professor, Rafail Ostrovsky,
said the CIA and the National Security Agency are evaluating a program
of his that would let intelligence analysts search huge batches of
intercepted communications for keywords and other criteria, while
discarding messages that don't apply.
Ostrovsky and co-creator William Skeith believe the system would keep
innocent files away from snoops' eyes while also extending their reach:
Because the program would encrypt its search terms and the results, it
could be placed on machines all over the Internet, not just computers in
classified settings.
"Technologically it is possible" to bolster security and privacy,
Ostrovsky said. "You can kind of have your cake and eat it too."
That may be the case, but creating such technologies is just part of the
battle.
One problem is getting potential users to change how they deal with
information.
Rebecca Wright, a Stevens Institute of Technology professor who is part
of a five-year National Science Foundation-funded effort to build
privacy protections into data-mining systems, illustrates that issue
with the following example.
The Computing Research Association annually analyzes the pay earned by
university computer faculty. Some schools provide anonymous lists of
salaries; more protective ones send just their minimum, maximum and
average pay.
Researchers affiliated with Wright's project, known as Portia, offered a
way to calculate the figures with better accuracy and privacy. Instead
of having universities send their salary figures for the computer
association to crunch, Portia's system can perform calculations on data
without ever storing it in unencrypted fashion. With such secrecy, the
researchers argued, every school could safely send full salary lists.
But the software remains unadopted. One large reason, Wright said, was
that universities questioned whether encryption gave them legal standing
to provide full salary lists when they previously could not — even
though the new lists never would leave the university in unencrypted
form.
Even if data-miners were eager to adopt privacy enhancements, Wright and
other researchers worry that the programs' obscure details might be
difficult for the public to trust.
Steven Aftergood, who heads the Federation of American Scientists'
project on government secrecy, suggested that public confidence could be
raised by subjecting government data-mining projects to external privacy
reviews.
But that seems somewhat unrealistic, he said, given that intelligence
agencies have been slow to share surveillance details with Congress even
on a classified basis.
"That part of the problem may be harder to solve than the technical
part," Aftergood said. "And in turn, that may mean that the problem may
not have a solution."
___
On the Net:
Portia: http://crypto.stanford.edu/portia
Sweeney: http://lab.privacy.cs.cmu.edu/people
http://news.yahoo.com/s/ap/data_mining_privacy;_ylt=AsYnPYpeRWoPlHcna.
3LYyJhr7sF;_ylu=X3oDMTBhcmljNmVhBHNlYwNtcm5ld3M-
--
My Original Writing blog - http://itgotworse.blogsource.com
-------------------------------------
You are subscribed as roessler@xxxxxxxxxxxxxxxxxx
To manage your subscription, go to
http://v2.listbox.com/member/?listname=ip
Archives at: http://www.interesting-people.org/archives/interesting-people/