Some Google observers are concerned that holes in a new privacy policy announced by the Web search giant could make it possible to connect search logs to users’ names, potentially defeating the purpose of Google’s plan to make records of user searches anonymous after 18 to 24 months.
Google will alter cookie information and change the last eight bits of the 32-bit IP addresses that identify computers logged onto the company’s search engine, under a policy announced last week. This means there is only a “partial de-identification” of users, says Pam Dixon, founder and executive director of the nonprofit World Privacy Forum.
“If there was a data breach and it all got out, you wouldn’t get the entire IP address. That’s a step,” she says. “But if you were involved in a legal process and wanted to re-identify the data, it can be done. … This is not a cloak of privacy that has been put over user searches.”
According to a statement released by Google Tuesday, someone with access to an IP address in which the last eight bits are obscured could narrow the address’s location down to a group of 256 computers, but would not be able to figure out which of those computers the IP address belongs to.
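The arithmetic behind the 256 figure can be shown with a minimal sketch. Google had not said exactly how it would change the last eight bits, so this simply assumes they are set to zero. It uses Python’s standard ipaddress module; the function names mask_last_octet and candidate_block and the sample address are hypothetical. Clearing the low 8 bits of a 32-bit address leaves 2^8 = 256 possible original addresses that all map to the same masked value.

```python
import ipaddress


def mask_last_octet(ip: str) -> str:
    """Zero the last 8 bits of an IPv4 address (an assumed method, for illustration)."""
    addr = ipaddress.IPv4Address(ip)
    masked = int(addr) & 0xFFFFFF00  # keep the top 24 bits, clear the low 8
    return str(ipaddress.IPv4Address(masked))


def candidate_block(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 block of addresses that share the same masked prefix."""
    return ipaddress.IPv4Network(f"{mask_last_octet(ip)}/24")


if __name__ == "__main__":
    ip = "203.0.113.77"  # hypothetical address from a documentation range
    print(mask_last_octet(ip))                # 203.0.113.0
    print(candidate_block(ip).num_addresses)  # 256
```

Anyone holding only the masked value knows the original lies somewhere in that block of 256, but not which one, which is the limited protection Dixon describes.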
Privacy advocates have focused on Google and other search engines because the phrases people search for provide insight into their personal histories, including diseases they might have. Google says it keeps search logs to analyze usage patterns and diagnose system problems. Privacy advocates worry that keeping archived records of searches in storage for extended periods of time opens the door for law enforcement agencies to demand information that could identify users.
A second concern about Google’s new policy was raised in a blog posting by Forrester Research security analyst Jen Albornoz Mulligan. If an anonymized IP address is always connected to the same user computer, the user could be identified because people tend to search for their own names on Google, she argues. This was the approach AOL used to anonymize IP addresses last year when the company accidentally released a database that contained search histories of more than 650,000 AOL users, Mulligan says.
Google has not yet decided exactly how it will go about changing the last eight bits of IP addresses. “We’re still developing the precise technical methods and approach to this,” the company states on its Web site.
Mulligan notes in her blog that Google officials “make no mention of preventing a similar AOL disclosure snafu by ensuring that individual searches by the same person are anonymized in different ways.”
Chances are, though, that Google will not make the same mistake as AOL, Mulligan says.
“I would certainly guess they are smart enough to not do this,” Mulligan said in a phone interview. “I would have imagined they would learn the lessons from AOL.”
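Mulligan’s distinction can be illustrated with a minimal sketch, assuming a log of (IP address, query) pairs. A keyed hash that always turns the same address into the same token keeps one person’s searches linked, so a single vanity search for one’s own name can expose every other query under that token. A fresh random salt for each search breaks that link. The key, the function names and the sample log are hypothetical; this is not Google’s or AOL’s actual method.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; without a key, the small IPv4 space
# could be brute-forced to reverse the hash.
SECRET_KEY = secrets.token_bytes(32)


def consistent_token(ip: str) -> str:
    """Same IP always yields the same token, so one user's searches stay linked."""
    return hmac.new(SECRET_KEY, ip.encode(), hashlib.sha256).hexdigest()[:16]


def per_search_token(ip: str) -> str:
    """A new random salt per search (then discarded), so tokens cannot be linked."""
    salt = secrets.token_bytes(16)
    return hashlib.sha256(salt + ip.encode()).hexdigest()[:16]


if __name__ == "__main__":
    log = [("198.51.100.7", "jane q. example"), ("198.51.100.7", "knee surgery clinic")]
    linked = [(consistent_token(ip), q) for ip, q in log]
    unlinked = [(per_search_token(ip), q) for ip, q in log]
    print(linked[0][0] == linked[1][0])      # True: both searches share one pseudonym
    print(unlinked[0][0] == unlinked[1][0])  # False, barring a 64-bit collision
```

The trade-off is that unlinkable tokens would likely also limit the usage-pattern analysis Google says it keeps search logs for.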
Google’s policy is designed in part to comply with a European Union directive requiring data retention for between six months and two years. The exact length requirements will be determined individually by each European country.
Dixon argues that Google should keep IP address information intact for no longer than six months in the United States, where regulations regarding data retention are less stringent. Google says it wants one policy to apply to all of its operations worldwide. The company will not keep IP addresses and cookies intact after 24 months “unless we’re legally required to retain log data for longer,” it said in last week’s announcement. The policy also applies retroactively to searches performed before the policy was announced.
“We think we’re striking the right balance between two goals: continuing to improve Google’s services for you, while providing more transparency and certainty about our retention practices,” Google wrote in last week’s statement.
Overall, Google’s new policy is a “very good first step,” says Ari Schwartz, deputy director of the Center for Democracy and Technology.
“It’s a pretty big move because it had been their contention that they could hold this information forever and they were going to use it for research purposes without specifying what those purposes would be,” he says.
AOL obscures user search information after 13 months, he says.
Privacy policies published by Yahoo and Microsoft do not say how long the companies store users’ IP addresses.
In response to a Network World inquiry Tuesday, Yahoo released a statement saying, “Like most other Internet companies, we keep data for as long as it is useful. There are many different uses such as improving our search results, fraud mitigation, law enforcement compliance, research, product development, and directing sponsored search advertisements for relevant queries. … Protecting our users’ privacy and maintaining their trust is paramount to us.”
Schwartz says Google should allow users greater control over which information is stored and which is not. For example, Google’s Gmail privacy notice says e-mail deleted by a user may be stored in Google’s active servers for up to 60 days and may remain indefinitely in offline backup systems that Google has in case of an emergency.
In comparison, Microsoft says it gets rid of deleted e-mails within three days and Yahoo says it does so immediately, according to Schwartz.
“It seems excessively long. If someone says ‘I want it deleted now’ … (60 days) does not really meet their expectations,” he says.
Schwartz says he views last week’s announcement as a first step toward a larger privacy policy covering all of Google’s services.
“They’ve had this goal of collecting all the world’s information. And they have another goal of ‘don’t do evil’ and ‘we care about privacy,’ basically,” Schwartz says. “Sometimes, those two things run head to head against each other.”