The meet-up in San Francisco last month had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party. The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive.
Like the Patriots, who rebelled against Britain’s heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.
“Relational databases give you too much. They force you to twist your object data to fit an RDBMS [relational database management system],” said Jon Travis, principal engineer at Java toolmaker SpringSource and one of the 10 presenters at the NoSQL confab.
NoSQL-based alternatives “just give you what you need,” Travis said.
Open source rises up
The movement’s chief champions are Web and Java developers. Many of them learned to get by without Oracle at their cash-strapped startups by building their own data stores, modeled on those built by Google Inc. and Amazon.com Inc., and later released that work as open source.
Now that their open source data stores manage hundreds of terabytes or even petabytes of data for thriving Web 2.0 and cloud computing vendors, switching back is not technically, economically or even ideologically feasible.
“Web 2.0 companies can take chances and they need scalability,” said Johan Oskarsson, the London-based organizer of the NoSQL meeting and, like most of the other attendees, a Web developer (of music streaming site Last.fm). “When you have these two things in combination, it makes [NoSQL] very compelling.”
Many, said Oskarsson, had even dumped the open-source MySQL database, a long-time Web 2.0 favorite, for a NoSQL alternative, because the advantages were too compelling to ignore.
Facebook, for instance, created its Cassandra data store to power a new search feature on its Web site rather than use its existing database, MySQL. According to a presentation by Facebook engineer Avinash Lakshman, Cassandra can write to a data store taking up 50GB on disk in just 0.12 milliseconds, more than 2,500 times faster than MySQL.
What is NoSQL (technically speaking)?
The names of these projects are as diverse as they are whimsical: Hadoop, Voldemort, Dynomite, and others.
But they are generally unified by a few things, including:
Don’t call them databases. Amazon.com’s CTO, Werner Vogels, refers to the company’s influential Dynamo system as a “highly available key-value store.” Google calls its BigTable, the other role model for many NoSQL adherents, a “distributed storage system for managing structured data.”
They can blow through enormous amounts of data. Hypertable, an open-source column-based database modeled upon BigTable, is used by local search engine Zvents Inc. to write 1 billion cells of data per day, according to a presentation by Doug Judd, a Zvents engineer.
Meanwhile BigTable, in conjunction with its sister technology, MapReduce, processes as much as 20 petabytes of data per day.
“Definitely, the volume of data is getting so huge that people are looking at other technologies,” said SpringSource’s Travis, whose ‘VPork’ technology helps NoSQL users benchmark the performance of their database alternative.
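To put those daily figures in more familiar terms, the short Python sketch below converts them into per-second rates. It assumes the load is spread evenly over 24 hours and uses decimal units (1 PB = 10^15 bytes), neither of which the presentations specify, so treat the results as rough averages rather than measured throughput.

```python
# Convert the per-day figures quoted above into average per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(amount_per_day: float) -> float:
    """Average rate per second, assuming an evenly spread daily load."""
    return amount_per_day / SECONDS_PER_DAY


# Hypertable at Zvents: 1 billion cells written per day.
cells_per_second = per_second(1_000_000_000)

# BigTable with MapReduce: as much as 20 petabytes processed per day.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
bytes_per_second = per_second(20 * 10**15)
gigabytes_per_second = bytes_per_second / 10**9

print(f"{cells_per_second:,.0f} cells per second")
print(f"{gigabytes_per_second:,.1f} GB per second")
```

Under those assumptions, the Zvents figure works out to roughly 11,600 cells per second, and the BigTable figure to about 231 GB per second.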
They run on clusters of cheap PC servers. PC clusters can be easily and cheaply expanded without the complexity and cost of “sharding,” which involves splitting a database into pieces that run on separate servers in a large cluster or grid.
Google has said that one of BigTable’s bigger clusters manages as much as 6 petabytes of data across thousands of servers. “Oracle would tell you that with the right degree of hardware and the right configuration of Oracle RAC (Real Application Clusters) and other associated magic software, you can achieve the same scalability. But at what cost?” asks Javier Soltero, CTO of SpringSource.
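How can a key-value store grow by simply adding another cheap server? One widely used technique, described in Amazon's published Dynamo design, is consistent hashing: keys and servers are hashed onto the same ring, and each key belongs to the next server point around the ring. The Python sketch below is a toy illustration of that idea, not code from Dynamo or any project named here; the class name, server names, and the choice of 100 ring points per server are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping keys to server names (illustrative only)."""

    def __init__(self, servers, points_per_server=100):
        self.points_per_server = points_per_server  # virtual points per server
        self._ring = []  # sorted list of (hash, server) tuples
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _hash(text):
        # MD5 is used only to spread values evenly, not for security.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add_server(self, server):
        for i in range(self.points_per_server):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def server_for(self, key):
        # Find the first ring point at or after the key's hash, wrapping around.
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if index == len(self._ring):
            index = 0
        return self._ring[index][1]


ring = HashRing(["server-1", "server-2", "server-3"])
keys = [f"user:{n}" for n in range(1000)]
before = {key: ring.server_for(key) for key in keys}

# Expand the cluster by one machine and see which keys change owner.
ring.add_server("server-4")
after = {key: ring.server_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved to the new server")
```

In this sketch, the only keys that change owner are the ones the new server takes over. The rest stay where they were, which is what makes adding a machine cheap compared with redistributing the whole data set.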
They beat performance bottlenecks. By sidestepping the time-consuming toil of translating Web or Java apps and data into a SQL-friendly format, NoSQL architectures perform much faster, say proponents.
“SQL is an awkward fit for procedural code, and almost all code is procedural,” said Curt Monash, an independent database analyst and blogger. For data upon which users expect to do heavy, repeated manipulations, the cost of mapping data into SQL is “well worth paying … But when your database structure is very, very simple, SQL may not seem that beneficial.”
Raffaele Sena, a senior computer scientist at Adobe Systems Inc., said that when Adobe relaunched its ConnectNow Web collaboration service a year and a half ago, it decided against using a relational database for just the reason raised by Monash.
Adobe uses Java clustering software from Terracotta Inc. to manage data in Java formats, which Sena says is key to boosting ConnectNow’s performance two to three times over the prior version.
“The system would have been more complex and harder to develop using a relational database,” he said.
Another project, MongoDB, calls itself a “document-oriented” database because of its native storage of object-style data.
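The translation cost that proponents complain about is easiest to see side by side. The Python sketch below stores the same object two ways: split across three relational tables (using the standard library's SQLite module as a stand-in for any RDBMS), and as a single document under one key. The record, table names, and the plain dictionary standing in for a document store are all hypothetical; this is not MongoDB's API or any vendor's.

```python
import json
import sqlite3

# Hypothetical object-style record, as a Web app might hold it in memory.
user = {
    "name": "alice",
    "tags": ["jazz", "rock"],
    "sessions": [{"device": "laptop", "minutes": 42}],
}

# Relational route: one object becomes rows in three tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (user_id INTEGER, tag TEXT);
    CREATE TABLE sessions (user_id INTEGER, device TEXT, minutes INTEGER);
""")
cur = db.execute("INSERT INTO users (name) VALUES (?)", (user["name"],))
user_id = cur.lastrowid
db.executemany("INSERT INTO tags VALUES (?, ?)",
               [(user_id, tag) for tag in user["tags"]])
db.executemany("INSERT INTO sessions VALUES (?, ?, ?)",
               [(user_id, s["device"], s["minutes"]) for s in user["sessions"]])

# Reading it back means querying each table and reassembling the object.
rebuilt = {
    "name": db.execute("SELECT name FROM users WHERE id = ?",
                       (user_id,)).fetchone()[0],
    "tags": [row[0] for row in db.execute(
        "SELECT tag FROM tags WHERE user_id = ? ORDER BY rowid", (user_id,))],
    "sessions": [{"device": d, "minutes": m} for d, m in db.execute(
        "SELECT device, minutes FROM sessions WHERE user_id = ? ORDER BY rowid",
        (user_id,))],
}

# Document-style route: the whole object is stored and fetched under one key.
documents = {}  # stand-in for a document or key-value store
documents["user:alice"] = json.dumps(user)
loaded = json.loads(documents["user:alice"])

print(rebuilt == user, loaded == user)
```

Both routes return the same object. The difference is the amount of mapping code in between, which is the trade-off Monash describes: worth paying when the data will be queried and manipulated heavily, less obviously so when the structure is simple.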
No overkill. While conceding that relational databases offer an unparalleled feature set and a rock-solid reputation for data integrity, NoSQL proponents say this can be too much for their needs.
Take Adobe’s ConnectNow, which, even without a database, makes three copies of users’ session data while they are online — data that is mostly deleted after logoff, said Sena.
“We didn’t need a database since the best representation of the data was already in memory,” he said.
Support by bootstrap. Because they are open source, NoSQL alternatives lack vendors offering formal support. That’s no deal breaker to most proponents, who are plugged closely into this Silicon Valley-centric community and are thus comfortable with the bootstrap approach.
But some admitted that working without a formal “throat to choke” when things go wrong was scary, at least for their managers.
“We did have to do some selling,” admitted Adobe’s Sena. “But basically after they saw our first prototype was working, we were able to convince the higher-ups that this was the right way to go.”
Despite the huge promise of NoSQL alternatives, most enterprises needn’t worry that they are missing out just yet, said Monash.
“Most large enterprises have an established way of doing OLTP [online transaction processing], probably via relational database management systems. Why change?” he said. MapReduce and similar BI-oriented projects “may be useful for enterprises. But where it is, it probably should be integrated into an analytic DBMS [database management system].”
Even the NoSQL meeting’s organizer, Oskarsson, admits that his company, Last.fm, has yet to move to a NoSQL alternative for production, instead relying on open-source databases. He agrees that a revolution, for now, remains on hold.
“It’s true that [NoSQL alternatives] aren’t relevant right now to mainstream enterprises,” Oskarsson said, “but that might change one to two years down the line.”