The open source Apache Hadoop platform for big data management has had a meteoric rise since its release in 2006, thanks in part to its advanced distributed file system, but an industry consultant warns its security protection is equivalent to that of software released in 1993.
Kevvie Fowler, a risk consulting partner at KPMG Canada, compared the security in Hadoop to that of Windows for Workgroups 3.11 before an audience at the SecTor security conference in Toronto on Wednesday.
“There’s not a lot of security in the operating system for Windows for Workgroups … and that’s similar to Apache Hadoop.”
And given that Hadoop clusters can hold huge amounts of data, he said the risks are significant.
In fact, Fowler said he was baffled that organizations put up with software that is so unprotected. But he suggested it follows a pattern.
“Business, to try to improve itself, in a lot of cases trumps security. It’s not the correct approach but if you look at it what business did is took a technology that had no business being in an enterprise and said, ‘You know what? I’m going to become smarter and more agile and make better decisions about my business. I’m going to take this technology, this nuclear waste, and stick it in my organization because it’s going to help me in the immediate future.’ Not looking at the security ramifications.”
Often companies initiate a small big data project, and when it demonstrates business value it is expanded, Fowler said – and by that time it’s too late if there are security holes.
Security professionals need to alert management when projects are at an early stage, he said.
The Apache Hadoop distribution isn’t the only version of the platform. A number of software companies have taken it and added capabilities — Intel Hadoop, for example, comes with encryption built in.
Meanwhile, Fowler offered eight steps to better secure Apache Hadoop clusters:
–If you don’t need sensitive financial or personal information, don’t put it in Apache Hadoop. Once in, it’s hard to erase data in the clusters. Obfuscate sensitive data that has to go in – and before it goes in;
–Use a configuration management tool to deploy and manage nodes and clusters in a consistent way. If budget is a concern, there are free tools such as Puppet;
–Lock the front door. “It’s almost comical” that Hadoop doesn’t require user authentication by default. Set that up before allowing any users to access data. He advises using Kerberos – it isn’t easy to deploy, but it offers secure authentication (a minimal code sketch appears after this list);
–Secure the underlying operating system by hardening servers and encrypting data at rest. Otherwise, anyone who logs into the system can read Apache Hadoop’s data as an ordinary group of files;
–Use transmission-level security; otherwise data from Hadoop travels through your infrastructure in plain text;
–Have a choke point to stop intruders, such as a VPN that logs and controls user access before users reach the cluster;
–Secure Hadoop-related applications, such as Apache Hive for creating data warehouses and Apache HBase, a NoSQL database. A lot of the SQL injection vulnerabilities found in SQL databases are also present in HiveQL, he said. And a number of other databases connect directly to Hadoop, he added, so attacks can be layered.
“You can spend all the time in the world securing your Hadoop without securing your applications and you’re going to have a huge disaster on your hands.”
Fowler also noted that the latest versions of Hive (HiveServer2) have the ability to revoke access to the warehouse, but any HiveServer 1.x version only secures metadata, not the underlying data.
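To make the HiveQL point concrete, here is a minimal Java sketch of a client that binds user input as a parameter through the HiveServer2 JDBC driver instead of concatenating it into the query string. The host, credentials, table and column names are hypothetical placeholders, not anything from Fowler’s talk:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SafeHiveQuery {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (hive-jdbc and its
        // dependencies must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Connection details below are placeholders for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive.example.com:10000/default", "analyst", "")) {

            // Bind untrusted input as a parameter rather than splicing it
            // into the HiveQL string -- the same injection mistake seen in
            // traditional SQL databases applies to HiveQL.
            String customerId = args.length > 0 ? args[0] : "42";
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT name, balance FROM accounts WHERE customer_id = ?")) {
                ps.setString(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name") + "\t" + rs.getString("balance"));
                    }
                }
            }
        }
    }
}

The same discipline applies to any application layered on top of Hadoop that accepts user input.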
–Ensure your incident response and forensics program incorporates big data technology.
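As a rough illustration of the “lock the front door” step above, the following Java sketch shows a client authenticating to a Kerberized cluster through Hadoop’s UserGroupInformation API before touching HDFS. The principal name and keytab path are hypothetical, and the sketch assumes the cluster has already been configured for Kerberos:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHdfsClient {
    public static void main(String[] args) throws Exception {
        // Assumes the cluster's core-site.xml and hdfs-site.xml are on the
        // classpath and the cluster itself has been switched to Kerberos
        // (hadoop.security.authentication = kerberos).
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Principal and keytab path are placeholders for illustration only.
        UserGroupInformation.loginUserFromKeytab(
                "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

        // With a valid Kerberos ticket the client can now reach HDFS; a
        // secured cluster rejects unauthenticated requests.
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}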