Hadoop has become popular for mining large data sets. Plans call for Hadoop 0.23 to run on clusters of 6,000 machines, each with 16 or more cores, and to process 10,000 concurrent jobs. Users will get more work done, Murthy said in a presentation at the O’Reilly Strata conference in Santa Clara, Calif. Performance, he stressed, is something users “can never have enough of.”
Other capabilities eyed for the upgrade include HDFS (Hadoop Distributed File System) federation and high availability for HDFS. MapReduce, the programming model and software framework in Hadoop, will also be improved. Called “Yarn,” the MapReduce upgrade “is the first to take Hadoop and make it a much more general data processing system,” Murthy said. Yarn is “a high-performance rewrite of MapReduce,” with twice the throughput on large clusters, said Eric Baldeschwieler, Hortonworks CTO. In addition, wire protocol compatibility planned for the 0.23 release will allow servers and clients to be upgraded independently.
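Yarn achieves that generality by separating cluster-wide resource management (a ResourceManager) from per-job coordination (an ApplicationMaster for each application), so frameworks beyond MapReduce can share the same cluster while existing MapReduce programs keep running on top of it. As a point of reference, the sketch below shows what such a program looks like against the org.apache.hadoop.mapreduce API; it is a generic word-count example, not code from the 0.23 release, and the class name and command-line paths are placeholders.

// Generic MapReduce word-count sketch using the org.apache.hadoop.mapreduce API.
// Illustrative only; input and output paths are supplied on the command line.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer (also used as combiner): sum the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, a job like this would be submitted with a command along the lines of hadoop jar wordcount.jar WordCount /input /output (the jar name and paths here are hypothetical); under Yarn the same program is handed to the ResourceManager rather than a JobTracker.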
Also at Strata on Thursday, MarkLogic and Hortonworks announced integration between Hortonworks Data Platform and MarkLogic’s operational database platform. The integration will allow users to combine MapReduce with MarkLogic’s real-time interactive analysis and indexing on a single, unified platform, MarkLogic said. The arrangement is intended to help users handle big data workloads more effectively. MarkLogic will certify its Connector for Hadoop against Hortonworks Data Platform.