The ever-growing number of non-relational, or NoSQL, databases needs standardization in order to thrive, two Microsoft researchers argue in the new issue of the Association for Computing Machinery’s flagship publication, Communications.
“The nascent NoSQL market is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing NoSQL solutions requires specialized and low-level knowledge that does not easily carry over from one vendor’s product to another,” the two researchers, Erik Meijer and Gavin Bierman, write in a paper published in the April issue of Communications.
The researchers offer a mathematical data model and a standardized query language that could unify the NoSQL and SQL data models. Because their model treats NoSQL key-value stores as the mathematical dual of SQL databases, they propose calling such stores “coSQL.”
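The duality the researchers describe can be made concrete with a deliberately simplified sketch. The Python below is an illustration under stated assumptions, not the authors' categorical formalism: the product and keyword data and the helper names `to_key_value` and `to_relational` are hypothetical. In the relational layout each child row points to its parent through a foreign key. In the key-value layout the parent holds references to its children. Converting between the two amounts to reversing those arrows.

```python
from collections import defaultdict

# Relational (SQL-style) layout: child rows point "up" to their parent
# through a foreign key (product_id). Hypothetical example data.
products = {1: {"title": "Book"}, 2: {"title": "Lamp"}}
keywords = [
    {"product_id": 1, "keyword": "paper"},
    {"product_id": 1, "keyword": "reading"},
    {"product_id": 2, "keyword": "light"},
]

def to_key_value(products, keywords):
    """Reverse the arrows: build a key-value (coSQL-style) store in which
    each parent object holds references to its children."""
    children = defaultdict(list)
    for row in keywords:
        children[row["product_id"]].append(row["keyword"])
    return {
        pid: {"title": p["title"], "keywords": children[pid]}
        for pid, p in products.items()
    }

def to_relational(store):
    """Reverse the arrows again: flatten the nested children back into
    rows that point to their parent by key."""
    parents = {pid: {"title": obj["title"]} for pid, obj in store.items()}
    rows = [
        {"product_id": pid, "keyword": kw}
        for pid, obj in store.items()
        for kw in obj["keywords"]
    ]
    return parents, rows

store = to_key_value(products, keywords)
print(store[1])  # {'title': 'Book', 'keywords': ['paper', 'reading']}
```

For this small, tree-shaped example the round trip through `to_relational` recovers the original rows. The paper's actual argument is stated in category theory and reaches well beyond this informal picture.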
“There is little to disagree with in this paper,” said James Phillips, a co-founder and vice president of products for NoSQL database vendor Couchbase, who had no involvement in the work. “I firmly support the conclusion that a standardized data manipulation language would accelerate market adoption of NoSQL database technologies by eliminating developer-impacting fragmentation.”
Over the past few years, a variety of non-relational databases has emerged, including CouchDB, Cassandra and MongoDB. Administrators have found these new data stores more suitable than relational databases for tasks such as storing large amounts of data across multiple servers, or storing information that does not need to be indexed for complex querying.
Meijer and Bierman compare the current boom in non-relational databases to the proliferation of relational databases in the early 1970s. At that time, developers had to understand the peculiarities of each database, as well as how to interact with the underlying hardware. What unified the industry, the researchers argue, was the widespread adoption of SQL (Structured Query Language).
SQL was based on Edgar F. Codd’s relational model, which provided an algebraic basis for modeling databases. The mathematical model ensured that all SQL databases would return the same results to the same queries, given the same data. And because most database vendors, IBM among them, adopted the model, programmers could learn SQL once rather than a new language for each database.
Meijer and Bierman claim that NoSQL could benefit from the same standardization. “Just as Codd’s discovery of relational algebra as a formal basis for SQL … propelled a billion-dollar industry around SQL, we believe that our categorical data-model formalization and monadic query language will allow the same economic growth to occur for coSQL key-value stores,” they write.
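“Monadic” refers to building queries from a few generic operations, chiefly bind (also called flat-map), that work over any collection type; Python's list comprehensions are one familiar instance. The sketch below is illustrative only and is not the query language from the paper; `bind`, `unit` and the sample data are hypothetical. It writes one query as a comprehension, desugars it into explicit `bind` calls, and then runs the same query shape as a join over a relational-style layout.

```python
# List-monad "bind" (flat-map): apply f to each element and
# concatenate the resulting lists.
def bind(xs, f):
    return [y for x in xs for y in f(x)]

# List-monad "unit": wrap a single value in a one-element list.
def unit(x):
    return [x]

# Key-value (coSQL-style) layout: parents hold their children.
store = {
    1: {"title": "Book", "keywords": ["paper", "reading"]},
    2: {"title": "Lamp", "keywords": ["light"]},
}

# Query: (title, keyword) pairs, written as a comprehension...
pairs_comprehension = [
    (obj["title"], kw) for obj in store.values() for kw in obj["keywords"]
]

# ...and the same query desugared into explicit bind/unit calls.
pairs_bind = bind(
    list(store.values()),
    lambda obj: bind(obj["keywords"], lambda kw: unit((obj["title"], kw))),
)

# Relational (SQL-style) layout: the same query becomes a join
# on the foreign key product_id.
products = {1: {"title": "Book"}, 2: {"title": "Lamp"}}
keywords = [
    {"product_id": 1, "keyword": "paper"},
    {"product_id": 1, "keyword": "reading"},
    {"product_id": 2, "keyword": "light"},
]
pairs_join = bind(
    list(products.items()),
    lambda p: bind(
        [r for r in keywords if r["product_id"] == p[0]],
        lambda r: unit((p[1]["title"], r["keyword"])),
    ),
)

print(pairs_bind)  # [('Book', 'paper'), ('Book', 'reading'), ('Lamp', 'light')]
```

All three formulations return the same rows, which loosely mirrors the authors' claim that a single monadic query language can serve both kinds of store.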
The researchers also cast doubt on the widely held assumption that NoSQL databases are uniquely suited to storing large amounts of data, known as Big Data. “It is possible to scale SQL databases by careful partitioning,” they write.
“Despite common wisdom, SQL and coSQL are not diabolically opposed, but instead deeply connected via beautiful mathematical theory,” they write.