Data warehousing and extraction, transformation and loading (ETL) tools may not be the sexiest of topics, but that apparently doesn’t stop enterprises from bragging about whose is bigger.
We’re talking about the terabyte size of an organization’s data warehouse, of course. At the ETL: The Humble Champion of Data Warehousing conference, held late last month in Toronto, execs talked about the changing face of data integration.
In the opening keynote, Stephen Brobst, chief technology officer and self-described ETL guru for Teradata (a subsidiary of Dayton, Ohio-based NCR Corp.), pointed out some of the current and future data warehouse challenges enterprises are up against.
“Appetite for data is outpacing Moore’s Law,” according to Brobst. More data, he claimed, will be created in the next two years than in the past 40,000 years. The need for data integration technology is growing, particularly as organizations seek methods to build a consolidated view of data scattered across disparate internal and external systems. That said, organizations must have a strategy and best practices in place for dealing with all that data.
“Data is not good enough — information is the goal,” Brobst said, adding that while the size of an enterprise’s data warehouse does matter, more important is having a single, holistic view of the customer and eliminating all planned and unplanned downtime.
Time-based transformations of business processes are becoming competitive necessities, Brobst argued. He described the concept of “extreme data warehousing,” in which response times are measured in milliseconds, in real time, and enterprises have the ability to exploit all of the business relationships in their data.
According to Kevin Butcher, senior vice-president, technology and solutions for Toronto-based BMO Financial Group, regulatory and compliance issues brought on by the U.S. Sarbanes-Oxley Act and similar legislation in Canada have created a heightened need for transparency and traceability in data stores.
“We’re now in a world where we’re doing near real-time ETL,” Butcher said. “For a bank, particularly, that’s a big change from the monthly ‘What are the numbers?’” It’s not an easy undertaking, particularly since vendors have different definitions of ETL, Butcher said. BMO tries to remain vendor-neutral with respect to its IT infrastructure, he added.
BMO is gradually making the shift from disparate data marts to a single source of information. “We do some standard cleansing and correcting of data…getting the information to a state where you can consolidate it. We’re in the middle ground at the moment where we have pretty good progress on the metadata side,” Butcher said, adding that the goal is to have a single data warehouse with reduced duplication. “We want to make sure that we don’t get the wrong information and the wrong granularity to the wrong people and purpose.”
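As a rough illustration of the kind of cleansing and consolidation Butcher describes, the hypothetical Python sketch below normalizes a matching key so that customer records from two data marts can be merged into a single view. The marts, field names and matching rule are invented for the example and are not drawn from BMO’s systems.

```python
# Two hypothetical data-mart extracts describing the same customer with
# inconsistent formatting; field names are illustrative only.
MART_A = [{"cust_key": "123 456 789", "name": "Doe, Jane", "branch": "Toronto"}]
MART_B = [{"cust_key": "123-456-789", "name": "JANE DOE", "segment": "retail"}]

def normalize_key(raw_key):
    """Standardize the matching key so records from different marts line up."""
    return "".join(ch for ch in raw_key if ch.isdigit())

def consolidate(*marts):
    """Merge records that share a normalized key into one customer view."""
    merged = {}
    for mart in marts:
        for rec in mart:
            key = normalize_key(rec["cust_key"])
            # Later marts fill in or overwrite attributes for the same customer.
            merged.setdefault(key, {}).update(
                {k: v for k, v in rec.items() if k != "cust_key"}
            )
    return merged

print(consolidate(MART_A, MART_B))
```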
Ken Picard, Dataspace project architect at RBC Capital Markets, said the Toronto-based firm recently completed an IT project using ETL technology. The Royal Bank of Canada had signed an enterprise licence deal with Ascential Corp. for multiple ETL server components. The goal of the Dataspace initiative, which used Ascential’s ETL and service-oriented architecture technology, was to create a near real-time operational data store fed by multiple distinct data sources, Picard said.
Traditionally, ETL tools sit underneath the data warehouse: they pull data out of source systems, transform and cleanse it into the required format, and load it into the warehouse. The current shift is towards the real-time data warehouse, Brobst said.
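A minimal Python sketch of that traditional batch pattern follows. The source rows, table name and in-memory SQLite “warehouse” are stand-ins for illustration only, not any particular vendor’s product.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from an operational system.
SOURCE_ROWS = [
    {"cust_id": "001", "name": " Jane Doe ", "balance": "1,250.00"},
    {"cust_id": "002", "name": "J. Smith", "balance": "980.50"},
]

def extract():
    """Extract: pull raw records from the source system (mocked here)."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Transform: cleanse and reshape records into the warehouse format."""
    return [
        (
            int(r["cust_id"]),
            r["name"].strip().upper(),
            float(r["balance"].replace(",", "")),
        )
        for r in rows
    ]

def load(rows, conn):
    """Load: write the transformed records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(cust_id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for the warehouse
    load(transform(extract()), warehouse)
    print(warehouse.execute("SELECT * FROM dim_customer").fetchall())
```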
These real-time enterprise implementations include data warehouses that provide access to data in real time and data warehouses that acquire data in real time. Either way, organizations should avoid vendor hype, Brobst noted. Service-level requirements should be driven by the business need, not the technology.
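To illustrate the second case, acquiring data in near real time, the hypothetical sketch below trickle-feeds small batches of changes into a stand-in warehouse using a polling loop and a high-water mark. The event feed, field names and polling interval are assumptions made for the example, not any vendor’s API.

```python
import time
from datetime import datetime, timezone

# Hypothetical "source" event feed; in practice this might be a message queue
# or change-data-capture stream from the operational system.
EVENTS = [
    {"ts": 1, "account": "A-100", "amount": 250.00},
    {"ts": 2, "account": "A-200", "amount": -75.25},
]

WAREHOUSE = []       # stand-in for the target table
last_loaded_ts = 0   # high-water mark for incremental acquisition

def poll_new_events(since):
    """Micro-batch extract: fetch only events newer than the high-water mark."""
    return [e for e in EVENTS if e["ts"] > since]

def micro_batch_load(events):
    """Apply a small batch of changes so the warehouse stays near real time."""
    for e in events:
        WAREHOUSE.append({**e, "loaded_at": datetime.now(timezone.utc)})

for _ in range(3):  # in production this loop would run continuously
    batch = poll_new_events(last_loaded_ts)
    if batch:
        micro_batch_load(batch)
        last_loaded_ts = max(e["ts"] for e in batch)
    time.sleep(0.1)  # polling interval, in seconds

print(WAREHOUSE)
```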
The goal is an active data warehouse, where data is accurate up to the minute, the system is always on, and there is support for large data volumes, mixed workloads and concurrent users, Brobst said. Right now, technology is not the limiting factor, he said. Rather, it is the legal and ethical questions surrounding data access.
Who will ultimately be responsible for the accuracy of data or the reliability of the analytics? Are the proper security measures in place? Who will ultimately be held accountable? These are the issues enterprises will face moving forward, Brobst said.