Databases are evolving faster than ever, growing more fluid to keep pace with an online world that is being virtualized at every level.
In many ways, the database as we know it is disappearing into a virtualization fabric of its own. In this emerging paradigm, data will not physically reside anywhere in particular. Instead, it will be transparently persisted, in a growing range of physical and logical formats, to an abstract, seamless grid of interconnected memory and disk resources, and delivered with subsecond latency to consuming applications.
Real-time is the most exciting new frontier in business intelligence, and virtualization will facilitate low-latency analytics more powerfully than traditional approaches. Database virtualization will enable real-time business intelligence through a policy-driven, latency-agile, distributed-caching memory grid that permeates an infrastructure at all levels.
As this new approach takes hold, it will provide a convergence architecture for diverse approaches to real-time business intelligence, such as trickle-feed extract, transform, load (ETL); changed-data capture (CDC); event-stream processing; and data federation. Traditionally deployed as stovepipe infrastructures, these approaches will become alternative integration patterns in a virtualized information fabric for real-time business intelligence.
The convergence of real-time business intelligence approaches onto a unified, in-memory, distributed-caching infrastructure may take more than a decade to come to fruition because of the immaturity of the technology, lack of multivendor standards and spotty, fragmented implementation of its enabling technologies among today’s business intelligence and data warehouse vendors. However, all signs point to its inevitability.
Case in point: Microsoft, though not necessarily the most visionary vendor of real-time solutions, has recently ramped up its support for real-time business intelligence in its SQL Server product platform. Even more important, it has begun to discuss plans to make in-memory distributed caching, often known as “information fabric,” the centerpiece middleware approach of its evolving business-intelligence and data-warehouse strategy.
For starters, Microsoft recently released its long-awaited SQL Server 2008 to manufacturing. Among this release’s many enhancements are a new CDC module and proactive caching in its online analytical processing (OLAP) engine. CDC is a best practice for traditional real-time business intelligence because it continuously loads database updates from transaction redo logs rather than querying the source tables, which minimizes the performance impact on source platforms’ transactional workloads. Proactive caching is an important capability in the front-end data mart because it speeds response to user queries against aggregate data.
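To make the CDC idea concrete, here is a minimal sketch of log-based change capture in Python. It is an illustration under stated assumptions, not how SQL Server 2008 implements the feature: the ChangeRecord type, the apply_changes function and the in-memory "log" and "warehouse" are all hypothetical names invented for this example. The point it shows is that the consumer reads only log entries past its last checkpoint, so the source tables are never rescanned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One entry in a hypothetical change log (not a real SQL Server structure)."""
    lsn: int                      # log sequence number, strictly increasing
    op: str                       # "insert", "update", or "delete"
    key: str                      # primary key of the affected row
    value: Optional[dict] = None  # new row image; None for deletes


def apply_changes(log, target, last_lsn):
    """Apply every record newer than last_lsn to target; return the new checkpoint."""
    for rec in log:
        if rec.lsn <= last_lsn:
            continue  # already applied in an earlier pass
        if rec.op == "delete":
            target.pop(rec.key, None)
        else:
            # Inserts and updates both overwrite the row image.
            target[rec.key] = rec.value
        last_lsn = rec.lsn
    return last_lsn


# Hypothetical change log produced by the transactional source.
log = [
    ChangeRecord(1, "insert", "order:1", {"total": 40}),
    ChangeRecord(2, "insert", "order:2", {"total": 15}),
    ChangeRecord(3, "update", "order:1", {"total": 55}),
    ChangeRecord(4, "delete", "order:2"),
]

warehouse = {}
# First pass sees only the first two records.
checkpoint = apply_changes(log[:2], warehouse, 0)
# Second pass sees the full log but skips what was already applied.
checkpoint = apply_changes(log, warehouse, checkpoint)
```

Because each pass resumes from the saved checkpoint, the loader can run as often as latency requirements demand without adding query load to the source system.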
Also, Microsoft recently went public with plans to develop a next-generation, in-memory distributed-caching middleware code-named “Project Velocity.” Though the vendor hasn’t indicated when or how this new technology will find its way into shipping products, it’s almost certain it will be integrated into future versions of SQL Server. With Project Velocity, Microsoft is playing a bit of competitor catch-up, considering that Oracle already has a well-developed in-memory, distributed-caching technology called Coherence, which it acquired more than a year ago from Tangosol. Likewise, pure-plays such as GigaSpaces, GemStone Systems and ScaleOut Software have similar data-virtualization offerings.
Furthermore, Microsoft recently announced plans to acquire data-warehouse-appliance pure-play DATAllegro and to move that grid-enabled solution over to a pure Microsoft data warehouse stack that includes SQL Server, its query optimization tools and data-integration middleware. Though Microsoft cannot discuss any road-map details until after the deal closes, it’s highly likely it will leverage DATAllegro’s sophisticated massively parallel processing, dynamic task-brokering and federated deployment features in future releases of its databases, including the on-demand version of SQL Server. In addition, it doesn’t take much imagination to see a big role for in-memory distributed caching,