Microsoft is responding to the “Big Data” movement by adding support for Hadoop, the open-source framework for large-scale data processing, to its SQL Server database and Parallel Data Warehouse platform.
The connectors will be available in CTP (community technology preview) form soon, according to a post this week on the official SQL Server Team blog.
Big Data refers to the ever-growing volumes of data being generated by enterprises, particularly from sensors and Web traffic.
“Our customers have been asking us to help store, manage, and analyze both structured and unstructured data — in particular, data stored in Hadoop environments,” Microsoft said in the blog post.
With the new connectors, customers will be able to interchange data between Hadoop environments, SQL Server and Parallel Data Warehouse, Microsoft said.
Hadoop, which is hosted at the Apache Software Foundation, was developed largely at Yahoo and is based partly on the MapReduce programming model developed by Google. A growing commercial ecosystem has emerged around Hadoop, with companies such as Cloudera offering services and specialized distributions of the framework.
Microsoft’s move makes sense, given that its data warehousing rivals, such as EMC Greenplum and Teradata, have already embraced Hadoop, said Forrester Research analyst James Kobielus.
More and more enterprises are running Hadoop clusters, and they want to be able to send data from those systems downstream to their data warehouse systems, he added.
But no one vendor can claim to have a fully built-out Hadoop offering, which would include distributions, connectors to Hadoop-related projects such as the Cassandra data store, modeling tools and other components, he said.
There is “no doubt” that, like other vendors, Microsoft has serious plans for Hadoop, but it hasn’t made a long-term road map public, Kobielus added.
Microsoft is not embracing Hadoop at the expense of homegrown efforts, having recently released a MapReduce-based programming model, Project Daytona, for use on its Azure cloud platform.
Also this week, Microsoft announced that it has released a second Appliance Update for Parallel Data Warehouse. These updates deliver new features for both the hardware and software components of the appliance.
The release includes new connectors for third-party BI (business intelligence) and data-integration tools from SAP, Informatica and MicroStrategy.
In addition, a version of PDW based on Dell hardware is now available, Microsoft said. Pricing starts below US$12,000 per terabyte.