A data breach of 48 million personal records first revealed to a House of Commons committee was caused by an error made by business data search service LocalBlox, according to cyber security vendor UpGuard.
A misconfigured Amazon Web Services (AWS) S3 bucket was discovered by Chris Vickery, a member of the UpGuard Cyber Risk Team, on Feb. 18. As with many misconfigured S3 buckets over the past 18 months, the data was exposed to the public Internet.
Upon examining the contents, UpGuard discovered 48 million personal records that were assembled by LocalBlox. At least in part, the data was assembled by scraping public-facing websites of social networks including Facebook, LinkedIn, Twitter, and real estate site Zillow.
On Wednesday morning, UpGuard published an article explaining the breach in fuller detail. Facebook was a victim of this breach, along with its users, as the data was scraped for purposes that no one ever agreed to. Facebook recently took steps to restrict the practice of data scraping from its profile pages. On Apr. 4, Facebook announced it would no longer allow account searches to be conducted using a phone number or email address.
LocalBlox collected the Facebook portion of the data by scraping HTML rather than by using Facebook’s API, says Dan O’Sullivan, report author and cyber risk analyst at UpGuard, who spoke with IT World Canada by phone. Data scraping is a very common practice and LocalBlox wasn’t trying to hide its techniques.
“They are advertising on the basis of this. That they scrape these social media accounts to give you, the paying customer, the best insights into user data,” he says.
On Tuesday, Vickery was a guest of the House of Commons Standing Committee on Access to Information, Privacy and Ethics. There, he referred to the breach of 48 million records that included Facebook data. He was responding to a question about how detailed a data breach involving Facebook data could get. He indicated that it was possible that personal messages were involved in the breach.
That’s not the case, O’Sullivan says. Vickery was merely referring to social media posts that could have been scraped.
Also published Wednesday morning was a story by ZD Net reporter Zack Whittaker, who was working with Vickery. According to the story, Vickery disclosed the breach to LocalBlox and it was secured hours later.
In an interview, LocalBlox chief technology officer Ashfaq Rahman tells ZD Net that most of the 48 million records were made up for internal testing. He says no individual other than Vickery is believed to have accessed the S3 bucket.
The websites affected by the data scraping all say that the practice violates their terms of service.
Securing S3 buckets
Even though organizations from Verizon to the Pentagon have been caught with S3-related data breaches, securing Amazon’s storage service should be simple enough. An S3 bucket comes password-protected by default and an administrator must configure it to be externally accessible. Amazon also recently added an orange warning indicator to its dashboard for any S3 bucket that is made public.
“If you wade through our archives of previous breach reports, most of them are S3 exposures,” O’Sullivan says. “I don’t say that to pick on Amazon because the default setting on an S3 bucket is secured and password protected.”
Because they are easy to set up, S3 buckets are broadly accessible and may sometimes be in the hands of administrators who don’t appreciate the finer points of user privacy and security, he says.
AWS customers can also use the free Trusted Advisor feature to check S3 bucket permissions, or use AWS CloudTrail to monitor account activity and actions taken on their infrastructure.
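For administrators who want to script a permissions check of their own, the following is a minimal Python sketch, not a method described by UpGuard or Amazon. It assumes the bucket's access control list is available as a dictionary in the shape returned by the `get_bucket_acl` call in the boto3 AWS SDK, and it flags any grant made to S3's two predefined public groups. The function name `public_grants` and the sample data are hypothetical, and the check does not cover bucket policies, which can also expose data.

```python
# URIs S3 uses for its predefined groups covering everyone on the
# Internet and every authenticated AWS account holder.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return the permissions that an ACL grants to public groups.

    `acl` is assumed to be a dict shaped like the response of
    boto3's s3.get_bucket_acl(Bucket=...).
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        is_group = grantee.get("Type") == "Group"
        if is_group and grantee.get("URI") in PUBLIC_GROUP_URIS:
            found.append(grant.get("Permission"))
    return found


# Hypothetical ACL for illustration only: the owner has full control
# and the AllUsers group has been given read access.
sample_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

exposed = public_grants(sample_acl)
if exposed:
    print("Bucket is publicly accessible with permissions:", exposed)
else:
    print("No public grants found in the ACL.")
```

An empty result means only that the ACL itself makes no public grants. Running the same check against each bucket in an account would be one way to catch an exposure before an outside researcher does.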