Canadian organizations are increasingly looking at the open source Hadoop platform for processing large data sets across clusters of commodity servers.
For analyzing big data when there is no pressing deadline for results, it offers a number of advantages. But what if you want to use that power to run real-time transactions?
A San Francisco company called Splice Machine Inc. says it has the answer: an online transactional SQL relational database management system (RDBMS) made for Hadoop that replaces Oracle, MySQL or whatever other database most administrators use today.
A public beta version was launched in May, and company founder and CEO Monte Zweben said in an interview that the commercial version will be out “this summer.”
Traditional databases linking to Hadoop may meet most enterprise needs. But, Zweben said, some databases have hit a wall in performance or scalability because they were not designed to run within Hadoop and its HDFS distributed file system.
Splice Machine is built on two open source stacks, he said. “One is a relational database called Apache Derby [an open source SQL DB based on Java]. We took this centralized file access system and gutted its storage layer and inserted Apache HBase, which is the real-time value store that sits on top of the Hadoop stack. And this is what gives us the real-time scalable functionality.”
For those with a technical bent, when a SQL statement is entered through Splice Machine, the Derby layer parses it and creates a query plan that takes advantage of Hadoop’s parallel processing. The plan is sent to each HBase node, which executes it close to where the data is stored. The results are then spliced back together.
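The flow described above follows a general scatter-gather pattern, which can be sketched in a few lines of Python. This is only an illustration of the idea, not Splice Machine's code: the partition data, the function names (`run_plan_on_node`, `splice`, `execute`) and the use of threads to stand in for separate HBase nodes are all assumptions made for the example. It mimics a query such as `SELECT region, SUM(amount) ... GROUP BY region`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held by one node.
PARTITIONS = [
    [{"region": "ON", "amount": 10}, {"region": "BC", "amount": 5}],
    [{"region": "ON", "amount": 7}, {"region": "QC", "amount": 3}],
    [{"region": "BC", "amount": 2}],
]


def run_plan_on_node(rows):
    """Run the plan fragment against one node's local rows only."""
    partial = Counter()
    for row in rows:
        partial[row["region"]] += row["amount"]
    return partial


def splice(partials):
    """Merge the per-node partial sums into one final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


def execute(partitions):
    """Send the same plan to every partition in parallel, then merge."""
    # Threads are a stand-in for remote nodes in this sketch.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(run_plan_on_node, partitions))
    return splice(partials)


result = execute(PARTITIONS)
```

Because each node returns only its own partial sums, the amount of data moved across the network is small compared with shipping every row to a central server, which is the benefit of executing the plan close to where the data is stored.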
Matt Aslett, an England-based industry analyst at 451 Group, said the solution “potentially” could meet the needs of certain organizations, but it’s still early days. The company still has to show that it works by producing customer success stories that demonstrate the benefits of its approach.
“It’s clearly a problem most organizations right now (running Hadoop) don’t have,” he said in an interview. But he added that he has talked to companies making a big investment in HDFS and distributed architecture who are thinking about their next generation database management platform, and an operational database will be a part of that.
“So there’s an argument if HDFS is a primary data layer, it makes sense to not dump data there from existing applications but start to think about building operational applications that ingest data into that environment,” he said.
Splice Machine isn’t alone in looking at putting a transactional database on Hadoop. Last month Hewlett-Packard announced the Trafodion project, which it calls an enterprise-class SQL-on-HBase solution targeting big data transactional or operational workloads.
In some ways HP’s move validates Splice Machine’s work, Aslett said.
Splice Machine is free for those using it for testing. Put it into operation and the company charges about US$5,000 per server.