Users who eschew traditional relational databases in favor of the newly emerging NoSQL databases might be “throwing the baby out with the bath water,” warned a database pioneer before a roomful of NoSQL advocates.
Instead, SQL (Structured Query Language) can be adapted to newer systems with a few technical adjustments, giving it the full flexibility of NoSQL systems, argued Michael Stonebraker, the chief technology officer of distributed database software company VoltDB.
Stonebraker, making his argument at the NoSQL Now conference being held this week in San Jose, California, called this approach NewSQL.
While Stonebraker’s company itself offers NewSQL-based database software, his advocacy for this new architecture does carry more weight than the typical vendor pitch. Stonebraker was the chief architect for both the Ingres and Postgres databases, and has contributed to many others. He also co-founded column-oriented database company Vertica, which Hewlett-Packard purchased in February.
SQL-based relational database systems are indeed as moribund as NoSQL advocates charge, he argued. But this is the fault of the database vendors themselves, not SQL. Calling traditional relational systems “elephants,” he noted, “Elephants are not slow because they support SQL.”
Most of the commercial relational database software packages have been on the market for 30 years or more, Stonebraker charged. They weren’t designed for today’s automated, data-heavy transactional environments. They’ve acquired decades’ worth of questionable new features, often referred to as bloat.
“Oracle doesn’t scale,” he said. “If you don’t need performance it doesn’t matter, but if you do need performance [traditional SQL-based systems] don’t deliver.”
The sluggishness of database systems usually can be attributed to a number of factors, Stonebraker said. Such systems maintain a buffer pool, keep logs for recovery purposes, and manage latching and locking so that data fields aren’t overwritten by another operation. In one test conducted by VoltDB, these behaviors consumed 96 percent of the system’s resources.
Many see the emerging popularity of NoSQL databases, such as MongoDB and Cassandra, as an answer to the limitations of traditional database systems.
In another session held at the NoSQL Now conference, consultant Dan McCreary explained some of the shortcomings of regular relational databases that spurred developers to create NoSQL databases.
Relational databases aren’t very flexible, he said. The basic architecture was designed during the era of punch cards, and reflects a rigid approach to data modeling. If an organization needs to add another column of data, it must alter the schema, which can be tricky. The modeling process used to create relational tables, called entity relationship modeling, also does not always accurately reflect how data exists in the real world.
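To make the schema point concrete, here is a minimal sketch of adding a column to an existing table. It uses Python's built-in sqlite3 module purely for illustration; the `customers` table, its columns, and the sample rows are hypothetical and do not come from McCreary's talk.

```python
import sqlite3


def add_column_demo():
    """Create a table, then alter its schema to hold a new attribute."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers (name) VALUES ('Ada')")

    # Storing a new attribute requires a schema change that applies to
    # every row; rows written earlier get NULL for the new column.
    conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
    conn.execute(
        "INSERT INTO customers (name, email) VALUES ('Grace', 'grace@example.com')"
    )

    rows = conn.execute(
        "SELECT id, name, email FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

On a small in-memory table the change is trivial. The difficulty McCreary described tends to show up on large production tables, where applications, indexes and existing data all depend on the old layout.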
“There are a lot of things that don’t fit into tables well,” he said. “It is too restrictive.”
Another problem with SQL databases is that they do not scale well beyond a single server, McCreary charged. If the data grows beyond the capabilities of a single server, it must be sharded, or split, across multiple servers, which is a complicated process. Executing some operations across multiple servers, such as outer joins, in which data from multiple tables is combined, can also be problematic.
While NoSQL databases offer greater scalability and flexibility, they have their own limitations, Stonebraker said. By not using SQL, NoSQL database systems lose the ability to do highly structured queries with mathematical certitude. Built from relational algebra and relational calculus, SQL offers a mathematical assurance that a well-structured query will capture all the data that it sets out to capture, even if the query itself is highly complex.
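A rough sketch of what sharding involves, assuming a simple hash-partitioning scheme: each key is assigned to one shard by hashing it, and any operation that spans all the data has to visit every shard. The `ShardedStore` class and `shard_for` function are hypothetical names for illustration, with plain dictionaries standing in for separate servers.

```python
import zlib

NUM_SHARDS = 3  # assumed shard count for this sketch


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard for a key. crc32 is stable across runs, unlike hash() on str."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one server."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: int) -> None:
        # A single-key write touches exactly one shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # A single-key read also touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def total(self) -> int:
        # A query over all data must be sent to every shard and the
        # partial results merged (scatter-gather).
        return sum(sum(shard.values()) for shard in self.shards)
```

Single-key reads and writes stay cheap, but anything resembling a join or an aggregate has to coordinate across shards. Changing the shard count also changes where most keys live, which is one reason resharding is considered complicated.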
Other problems: NoSQL cannot provide ACID (atomicity, consistency, isolation, durability) operations, a widely used set of properties that assure that a database-driven online transaction is carried out accurately, even if the system is interrupted. ACID compliance can be implemented at the application layer, though writing the code for such operations “is a fate worse than death,” he said. Lastly, each NoSQL database comes with its own query language, making it difficult to standardize application interfaces.
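As an illustration of the atomicity part of ACID, the following sketch moves money between two accounts inside a single transaction. It again uses Python's sqlite3 module rather than any system mentioned in the talk, and the `accounts` table, `transfer` function and balances are hypothetical. If the transfer fails partway through, the database rolls back the debit that had already been applied.

```python
import sqlite3


def make_bank():
    """Build an in-memory database with two sample accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one unit; return False if rolled back."""
    try:
        # The connection as a context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

The application code never has to undo the partial debit itself. Reproducing that guarantee by hand on top of a store that lacks transactions is the kind of work Stonebraker was warning about.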
In contrast, NewSQL can provide the quality of assurance associated with SQL systems, while offering the scalability of NoSQL systems, Stonebraker argued.
The NewSQL approach involves a number of novel architecture designs, he noted. It eliminates the resource-hogging buffer pool by running the database entirely in main memory. It removes the need for latching by running only as a single thread of the server (though some overhead would still be needed for other locking operations). And expensive recovery operations can be eliminated in favor of using additional servers for replication and failover.
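The single-threaded, in-memory idea can be sketched in a few lines. This is not VoltDB's implementation, only an illustration under simplifying assumptions: one hypothetical `Partition` object keeps its data in an ordinary dictionary and lets exactly one worker thread touch it, so each transaction runs to completion before the next begins. Replication and failover are left out entirely.

```python
import queue
import threading


class Partition:
    """In-memory data owned by exactly one worker thread."""

    def __init__(self):
        self.data = {}  # lives in main memory; there is no buffer pool
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown signal
                break
            procedure, args, reply = item
            # Each procedure runs to completion before the next starts,
            # so self.data needs no latches or row locks.
            try:
                reply.put(("ok", procedure(self.data, *args)))
            except Exception as exc:
                reply.put(("error", exc))

    def submit(self, procedure, *args):
        """Queue a transaction and wait for its result."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((procedure, args, reply))
        status, value = reply.get()
        if status == "error":
            raise value
        return value

    def close(self):
        self._inbox.put(None)
        self._worker.join()


def increment(data, key, amount):
    """Example transaction: a read-modify-write on one key."""
    data[key] = data.get(key, 0) + amount
    return data[key]
```

Many clients can call `submit` at once and the counter still comes out exact, because the read-modify-write never interleaves with another transaction. The hand-off queue does synchronize internally, but the data itself is only ever touched by one thread.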
Stonebraker boasted that VoltDB’s own system, which uses these NewSQL-styled approaches, can execute transactions 45 times faster than a typical relational database system. VoltDB can scale across 39 servers, and handle up to 1.6 million transactions per second across 300 CPU cores, he said. It also requires far fewer servers than a typical Hadoop implementation, doing in 20 nodes the same work that would require 1,000 Hadoop nodes.
While the audience was composed of NoSQL users and developers, many seemed to think Stonebraker’s SQL-friendlier approach had some merit, even if they disagreed on individual points.
Dwight Merriman, a founder of online advertising company DoubleClick and one of the creators of MongoDB, agreed with Stonebraker that SQL itself doesn’t prevent scalability or cause slow performance. But he argued that SQL may not be the language everyone would want to use to parse and query their data in the years to come. “I would like to use something a little closer to the original language” that his applications are written in, he noted. SQL-based stored procedures are particularly difficult to work with for many programmers, he said.
Stonebraker is confronting the correct problem, McCreary said after the talk. Processors are not going to get any faster, but chip cores will continue to proliferate. So the issue of scaling out across multiple processors needs to be addressed, he said.
McCreary also agreed with Stonebraker’s view that NoSQL users don’t have a unified query language, which will slow the adoption of NoSQL as a whole. But McCreary suggested languages other than SQL could be used as a unified query tool for new databases, such as XQuery, a query language for XML documents.
Oracle did not immediately respond to a request for comment.