Many IT security experts say encrypting data is the only way enterprises can be assured their data is protected, whether on premises or in the cloud.
But an academic paper says that under certain circumstances HTTPS traffic itself can be analyzed to deduce a person’s medical condition and sexual orientation by revealing what Web pages have been visited.
The technique, developed by researchers at the University of California at Berkeley and Intel Labs, isn’t perfect: individual Web pages in the same Web site can be identified with 89 per cent accuracy, but variations can be as large as 18 per cent due to assumptions affecting caching and cookies.
But their point is that “HTTPS is far more vulnerable to traffic analysis than has been previously discussed by researchers.”
According to ComputerWorld U.S., the report is to be presented at a July 16 privacy conference in Amsterdam.
The researchers captured 463,125 page loads from a number of U.S. healthcare, finance, legal services and streaming video sites, including the Mayo Clinic, Planned Parenthood, the Bank of America and Netflix.
Briefly, they used clustering techniques to identify patterns in the traffic, then other analytic techniques to identify pages with some degree of accuracy.
Obviously, if it can be determined that a particular person regularly visits a healthcare site for information on a chronic disease, or a legal site for bankruptcy information, highly personal information might be deduced, assuming, of course, that the viewer isn’t accessing the page on behalf of a relative or friend.
To carry out this type of analysis, an attacker has to be able to visit the same Web sites as the target and observe the target’s traffic in order to match patterns. The researchers also note that governments, Internet service providers and employers are among the groups that could leverage traffic monitoring.
The researchers’ technique isn’t simple (think of using burst pair clustering of packets, Hidden Markov Models and Bag of Gaussians distribution). But it does suggest it is possible to glean some information, although how useful that information would be is an open question.
And it does make a crucial assumption: the victim browses the Web in a single tab and successive page loads can be easily delineated.
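To give a rough sense of what "burst pairs" might look like in practice, here is a minimal Python sketch. It is not the researchers' code. It assumes a burst is a run of consecutive packets travelling in the same direction, and that a burst pair is an outgoing burst followed immediately by an incoming one. The function name, the signed-size convention and the sample trace are all hypothetical.

```python
from itertools import groupby


def burst_pairs(packets):
    """Summarize an encrypted trace as (outgoing_bytes, incoming_bytes) pairs.

    `packets` is a list of signed packet sizes (an assumed convention):
    positive values are sent by the browser, negative values are received.
    """
    # Collapse consecutive packets with the same direction into one burst,
    # recording the direction and the total number of bytes in the burst.
    bursts = [
        (outgoing, sum(abs(size) for size in group))
        for outgoing, group in groupby(packets, key=lambda size: size > 0)
    ]

    # Keep only outgoing bursts that are immediately followed by an incoming one.
    pairs = []
    for (first_out, first_size), (second_out, second_size) in zip(bursts, bursts[1:]):
        if first_out and not second_out:
            pairs.append((first_size, second_size))
    return pairs


# Made-up trace: two requests, each answered by a run of response packets.
trace = [300, 200, -1500, -1500, -400, 250, -900]
print(burst_pairs(trace))  # [(500, 3400), (250, 900)]
```

An eavesdropper never sees the page content, only sizes and directions like these. The idea is that different pages on the same site tend to produce different sets of pairs, which is the kind of pattern clustering can pick up.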
Finally, even if the attack technique proves practical at a reasonable cost, there are defences, the researchers point out, such as padding packet sizes, that can drastically reduce its accuracy. And without accuracy the attack is useless.
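The intuition behind padding can be shown with a small Python sketch. It is an illustration only, not the specific defence the researchers evaluated. It assumes every packet size is rounded up to the next multiple of a fixed block size. The 512-byte block, the function names and the two sample pages are hypothetical.

```python
def pad_size(size, block=512):
    """Round a packet size up to the next multiple of `block` bytes."""
    # Ceiling division using integer arithmetic only.
    return -(-size // block) * block


def pad_trace(sizes, block=512):
    """Apply the same padding rule to every packet size in a trace."""
    return [pad_size(size, block) for size in sizes]


# Two made-up pages whose raw packet sizes differ...
page_a = [310, 1460, 980]
page_b = [420, 1100, 1000]

# ...look identical to an observer once both are padded.
print(pad_trace(page_a))  # [512, 1536, 1024]
print(pad_trace(page_b))  # [512, 1536, 1024]
```

The cost is extra bandwidth: the coarser the block size, the more pages become indistinguishable, but the more padding bytes each connection has to carry.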
But the point is that, like all technologies, encryption may have weaknesses that can be exploited by unscrupulous people. This is worth more investigation.