There are a number of technologies infosec pros have at hand for protecting sensitive data, but the most common is encryption. However, scrambling letters and numbers gets in the way of application developers, who often need real data — or a close approximation of it — to test their work.
The problem is that handing real data to developers or database analysts who lack security clearance may violate regulatory and privacy requirements. It also increases the number of people in the enterprise who can access databases of information that could be tampered with or copied.
“It’s very dangerous,” Joseph Feiman, a Gartner research vice-president, said in a recent interview. From the moment real data is given to a developer “you should not worry about hackers from China or Russia,” he said. The security threat is from inside the organization.
In the last eight years data masking has emerged as the technology smart enterprises use to minimize the risk. In fact Gartner calls it a must-have tool. Also called data obfuscation, it protects sensitive information by masking data fields in a consistent way that makes the data almost real — for example, changing specific ages and salaries to general ranges, or modifying social insurance numbers — yet keeps the information usable for testers and data analysts.
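The kind of generalization described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's method: the function names, the bucket widths and the choice to keep the last three digits of a social insurance number are all assumptions made for the example.

```python
def age_range(age: int, width: int = 10) -> str:
    """Replace an exact age with a bucket such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def salary_band(salary: float, width: int = 25_000) -> str:
    """Replace an exact salary with a band such as '50000-74999'."""
    low = int(salary // width) * width
    return f"{low}-{low + width - 1}"


def mask_sin(sin: str) -> str:
    """Hide all but the last three digits, keeping any separators."""
    total_digits = sum(ch.isdigit() for ch in sin)
    seen = 0
    out = []
    for ch in sin:
        if ch.isdigit():
            seen += 1
            # Keep only the final three digits readable (an assumption).
            out.append(ch if seen > total_digits - 3 else "X")
        else:
            out.append(ch)
    return "".join(out)


def mask_record(record: dict) -> dict:
    """Return a masked copy; the original record is left untouched."""
    masked = dict(record)
    masked["age"] = age_range(record["age"])
    masked["salary"] = salary_band(record["salary"])
    masked["sin"] = mask_sin(record["sin"])
    return masked
```

A tester still sees a record with the right fields and plausible shapes, but not the real age, salary or number.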
When it’s done for non-production environments it’s called static masking. More recently the technology has enabled dynamic data masking, which masks data on the fly, so certain staff — for example, call centre workers — can access sensitive data in real time.
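The difference from static masking is that the stored data is never changed; the mask is applied as each request is answered. A minimal Python sketch of the idea follows, assuming hypothetical field names and a hypothetical role that is allowed to see real values. Commercial products typically do this at the database or network layer rather than in application code.

```python
SENSITIVE_FIELDS = {"sin", "card_number"}   # hypothetical field names
UNMASKED_ROLES = {"fraud_investigator"}     # hypothetical privileged role


def mask_value(value: str, visible: int = 4) -> str:
    """Star out everything except the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def read_record(record: dict, role: str) -> dict:
    """Return a view of the record for this role; stored data is unchanged."""
    if role in UNMASKED_ROLES:
        return dict(record)
    return {
        key: mask_value(str(value)) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }
```

A call centre worker reading a record would see only the last four characters of a card number, while the underlying row stays intact for roles permitted to see it.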
Feiman, co-author of a recent report on data masking, said Gartner estimates only 25 per cent of its 12,000 clients around the world use the technology. However, he expects that share to increase. Gartner estimates overall revenue from static data masking sales last year at about US$300 million, a 60 per cent leap over 2013.
There’s no shortage of vendors offering the technology: at least 15 by Gartner’s count, either in standalone products or as part of dev-test suites. IBM, Oracle and Informatica together hold three-quarters of the market, the research firm says. Others include Voltage Security (now part of Hewlett-Packard), Compuware, Axis Technology, Dataguise, GreenSQL, Grid-Tools, Mentis, Net 2000 and Solix Technologies.
Two Canadian companies are also providers: Camouflage Software of St. John’s, Nfld., a pioneer in the field; and Ottawa’s Privacy Analytics.
Camouflage’s enterprise data masking solution can be bought individually or as a suite of six modules that includes data discovery and application templates for Oracle’s E-Business suite, PeopleSoft and Siebel.
The latest release of Enterprise (4.3.4) also has a high-performance option the company says dramatically speeds up relational data masking projects, running at least 15 times faster than previous versions. It also supports Hadoop and Cloudera big data platforms.
In an interview company founder and CEO Kevin Duggan admitted having to face IBM and Informatica is “the biggest challenge that we have,” because those companies can sell to existing customers.
Camouflage has annual revenues of $3 million, but Duggan said its customers include some Fortune 20 companies.
He said his firm is close to signing an agreement for private investment that will allow Camouflage to expand its sales and marketing efforts. The plan is to grow the staff from 20 to 30 this year, and to 45 next year.
The products are largely sold direct, although the company also works through KPMG and PricewaterhouseCoopers.
Privacy Analytics focuses on the healthcare market through its Parat software. Version 6.0, released in November, includes what the company says is an interactive de-identification pipeline designer with drag-and-drop capabilities for describing the flow of data from start to finish in a de-identification process. It also includes integrated relational dataset modeling and classification, which the company says ensures dataset integrity by remembering the relationships between tables and identifier types throughout the process.
IBM includes static and dynamic data masking in InfoSphere Optim Data Privacy, and has also added dynamic data masking capabilities to the InfoSphere Guardium Data Activity Monitor appliance.
In an email, Sonia Daigle, a business unit executive for IBM Security Systems software, said masking in Optim either works automatically or customers can use it in their applications to do the masking themselves.
User-defined functions (UDFs) based on the masking algorithms are also available, so a customer can call them via SQL to perform masking. These UDFs exist for a number of databases, including DB2 for Linux/Unix/Windows.
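To show what calling a masking function from SQL looks like in general, here is a small Python sketch that uses the standard library's SQLite driver as a stand-in. It does not reproduce IBM's UDFs or their names: the `mask_email` function, the `customers` table and the masking rule are all invented for the illustration.

```python
import sqlite3


def mask_email(email):
    """Keep the first character and the domain, hide the rest."""
    if email is None or "@" not in email:
        return email
    local, domain = email.split("@", 1)
    return local[:1] + "***@" + domain


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it by name.
conn.create_function("mask_email", 1, mask_email)

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.org",)],
)

# The masking happens inside the query itself.
rows = conn.execute(
    "SELECT id, mask_email(email) FROM customers ORDER BY id"
).fetchall()
print(rows)
```

The appeal of the approach is that anyone who can write a SELECT statement can apply the mask without touching application code.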
Suppliers offer a wide range of features. Axis, for example, includes tokenization to make masking reversible, if that is required. Camouflage’s dynamic data masking covers relational database management systems and XML messaging systems. It is one of several vendors (including Oracle and Solix) that also offer data subsetting, which masks small subsets of a database. Dataguise supports big data platforms including MapR, Cloudera, Hortonworks, Pivotal Greenplum Database, IBM InfoSphere BigInsights, Amazon Elastic MapReduce (EMR), Pivotal HD and Apache Hadoop distributions.
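Tokenization differs from ordinary masking in that the original value can be recovered by whoever controls the lookup table. A rough Python sketch of that idea follows. The `TokenVault` class and the `tok_` prefix are hypothetical, and a real product would keep the mapping in a secured store rather than in memory.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping real values to random tokens."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps the same way.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._value_by_token:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Reverse the mapping; raises KeyError for an unknown token."""
        return self._value_by_token[token]
```

Because the token is random, it reveals nothing about the value it replaces; reversibility depends entirely on access to the vault.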
In choosing a masking solution, Feiman said to look for features that ensure database integrity (that is, the masked data won’t break the test database, and data is masked consistently across several databases); the ability to discover data across the enterprise; support for several platforms; and a dashboard for managing masking across several databases.
Watch for the ability to mask across big data platforms, he added.
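The consistency requirement Feiman describes can be illustrated with a keyed hash: if the same input always produces the same masked output, records in separate databases still line up after masking. The Python sketch below is one possible approach under stated assumptions, not a description of how any product named here works. The key, the sample tables and the 12-character truncation are all invented, and a shorter output raises the chance of two values colliding.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """Deterministically replace a value using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Two hypothetical databases that share a customer identifier.
crm = [
    {"customer_id": "C-1001", "city": "Ottawa"},
    {"customer_id": "C-1002", "city": "Halifax"},
]
billing = [
    {"customer_id": "C-1001", "amount": 42.50},
    {"customer_id": "C-1002", "amount": 19.99},
]

masked_crm = [{**row, "customer_id": pseudonymize(row["customer_id"])} for row in crm]
masked_billing = [
    {**row, "customer_id": pseudonymize(row["customer_id"])} for row in billing
]

# The masked identifiers still match, so joins across the two sets keep working.
crm_ids = {row["customer_id"] for row in masked_crm}
billing_ids = {row["customer_id"] for row in masked_billing}
print(crm_ids == billing_ids)
```

Without the key, the masked identifier cannot easily be traced back to the original, yet a test query joining the two tables returns the same rows it would on real data.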
Other capabilities users may want include extract/transformation/load, data archiving,