When explaining data quality, industry consultant Dave Wells likes to use the analogy of a battery that takes several hours to recharge but holds its charge for only 15 minutes. “That’s okay if the purpose of that battery is to power my electric shaver,” he said at this week’s Webinar, Data Quality: Getting Started the Right Way. “But if the purpose of that battery is to power my laptop computer, it’s not okay.”
It’s the same with data across the enterprise: its purpose defines its quality, said Wells, former education director with Renton, Wash.-based The Data Warehousing Institute (TDWI). That purpose could be to record business transactions, to measure business performance, to support decision-making, or even to discover facts not previously known, he said.
But defects that affect data quality can stem from several sources, according to Wells; a rough sketch of checks that would catch the first two appears after the list:
1. Obtaining bad data from the right source;
2. Faulty processes by which data is collected, like an operational system that’s based on poor calculations;
3. Miscommunicating the meaning of good data; and
4. Using good data for the wrong application.
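Wells spoke in terms of concepts rather than code, but the first two defect sources lend themselves to automated checks. Below is a minimal sketch in Python, assuming hypothetical field names, tolerances and a record layout that were not part of the Webinar:

```python
# Illustrative only: rule-based checks for two of the defect sources Wells
# listed -- bad data arriving from the right source, and a faulty collection
# process such as a miscalculated derived field. All field names and
# tolerances are assumptions made for this sketch.
from datetime import date

def check_record(rec: dict) -> list[str]:
    """Return human-readable descriptions of the defects found in one record."""
    defects = []

    # Defect source 1: bad data from the right source (missing or implausible values)
    if not rec.get("customer_id"):
        defects.append("missing customer_id")
    if rec.get("birth_date") and rec["birth_date"] > date.today():
        defects.append("birth_date is in the future")

    # Defect source 2: faulty collection process (derived total disagrees with its parts)
    expected_total = rec.get("unit_price", 0) * rec.get("quantity", 0)
    if abs(rec.get("order_total", 0) - expected_total) > 0.01:
        defects.append(f"order_total {rec.get('order_total')} does not equal "
                       f"unit_price * quantity ({expected_total})")

    return defects

if __name__ == "__main__":
    sample = {"customer_id": "C-1001", "birth_date": date(1980, 5, 17),
              "unit_price": 9.99, "quantity": 3, "order_total": 19.98}
    print(check_record(sample))  # flags the miscalculated order_total
```

The other two sources Wells cites, miscommunicated meaning and misapplied data, are process problems that checks like these cannot catch on their own.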
But Wells pointed out that the reality of data quality is such that an organization will never have perfect data. It’s a continuous task, he said. “Approach it not as a project, but as a lifestyle.”
That said, organizations must consider the economics of data quality by weighing the cost of improving data against the cost of living with defective data, said Wells. He described three options, compared in the rough cost sketch after the list:
1. At one end of the scale is repairing defects only when they are discovered by an end user, be it a customer or an employee. While that approach is the least costly to run, it carries a “relatively high cost of living with the defects,” said Wells.
2. In the middle of the scale is correcting defects, or cleansing the data from operational systems, before the end users who rely on the data for decision-making ever see it.
3. At the other end of the scale is prevention, or fixing the root cause so that defects are not repeated. “Prevention can have a relatively high cost in terms of quality management,” said Wells, “but it is the lowest cost in terms of the impact of defects.”
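Wells gave no figures, but the trade-off behind the three options can be made concrete with back-of-the-envelope arithmetic. Every number below is invented purely for illustration:

```python
# Hypothetical comparison of the three options Wells described. All figures
# are made up for illustration; substitute your own estimates.
defect_rate = 0.02            # share of records assumed to be defective
records_per_year = 1_000_000
defects = defect_rate * records_per_year

strategies = {
    # name: (annual quality-management cost, cost per defect that reaches users)
    "repair on discovery": (10_000, 25.00),   # cheap to run, high cost of living with defects
    "cleanse before use":  (75_000, 5.00),    # most defects caught before decision-makers see them
    "prevent at the root": (150_000, 0.50),   # highest management cost, lowest defect impact
}

for name, (mgmt_cost, cost_per_defect) in strategies.items():
    impact = defects * cost_per_defect
    print(f"{name:>22}: management ${mgmt_cost:>9,.0f} + defect impact ${impact:>9,.0f}"
          f" = ${mgmt_cost + impact:>9,.0f} per year")
```

With these made-up numbers prevention comes out cheapest overall, but the point of the exercise is Wells’s: the right option depends on what defects actually cost your organization.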
But given that organizations will never have perfect data, they must consider the point at which corrective measures are to be taken, said Wells.
Moreover, said Wells, it helps if data quality is considered “an enterprise-wide issue, not a single-system issue, therefore it can’t be a local fix. It needs to involve everyone, not just IT.”
While the entire organization must take responsibility for data quality, Wells said it should be treated as a continuous improvement process in which results are measured and monitored, and feedback is acted upon.
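The Webinar stayed at the level of principles, but as a hedged illustration of what measuring, monitoring and feeding back can look like, here is a minimal sketch that scores each data load for completeness and flags regressions. The metric, threshold and data shape are assumptions, not anything Wells prescribed:

```python
# Minimal sketch of "measure, monitor, feed back": score each load for
# completeness and raise a flag when quality slips, so the feedback reaches
# whoever owns the fix. Metric, threshold and fields are assumptions.
REQUIRED_FIELDS = ("customer_id", "order_total", "order_date")

def completeness(batch: list[dict]) -> float:
    """Fraction of required fields that are populated across a batch."""
    if not batch:
        return 1.0
    filled = sum(1 for rec in batch for f in REQUIRED_FIELDS if rec.get(f) not in (None, ""))
    return filled / (len(batch) * len(REQUIRED_FIELDS))

def monitor(history: list[float], latest: float, threshold: float = 0.98) -> str:
    """Compare the latest score against the target and the recent trend."""
    if latest < threshold:
        return f"ALERT: completeness {latest:.1%} is below the {threshold:.0%} target"
    if history and latest < min(history[-3:]):
        return f"WARN: completeness {latest:.1%} is declining"
    return f"OK: completeness {latest:.1%}"

if __name__ == "__main__":
    history = [0.995, 0.993, 0.991]
    todays_batch = [
        {"customer_id": "C-1", "order_total": 10.0, "order_date": "2008-04-01"},
        {"customer_id": "", "order_total": None, "order_date": "2008-04-01"},
    ]
    print(monitor(history, completeness(todays_batch)))
```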
Integrating people, process and technology is also essential to building a good “data factory,” said Wells, because “all the technology in the world by itself can’t fix data quality.”
Also presenting on the Webinar was Patrick Connolly, product marketing manager for Armonk, N.Y.-based IBM Corp., who agreed that data quality must go beyond the IT department. It should be regarded as a coordinated effort that will impact the business, “not just dish up quality data.”
“Data quality can be a tremendous onramp for an organization to achieve the goals of data governance,” said Connolly.
But while it’s important to recognize the value of data quality, Connolly noted, it’s also vital to be able to deploy it across the organization, such as during application integration projects or within a service-oriented architecture where data quality rules are deployed from a centralized repository.
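Connolly was describing IBM’s tooling, not code; the sketch below only illustrates the general idea of a centralized rule repository that every integrating application calls, and its rules, names and interface are hypothetical:

```python
# Hedged sketch of the centralized-repository idea: validation rules are
# defined once and every application -- web form, batch load, SOA service --
# calls the same entry point instead of re-implementing its own checks.
# The rules, names and interface are hypothetical, not IBM's product.
import re
from typing import Callable

# Central rule repository: rule name -> validation function
RULE_REPOSITORY: dict[str, Callable[[str], bool]] = {
    "us_zip_code": lambda v: bool(re.fullmatch(r"\d{5}(-\d{4})?", v)),
    "email":       lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)),
    "not_blank":   lambda v: bool(v and v.strip()),
}

def validate(rule_name: str, value: str) -> bool:
    """Single entry point so that one definition of 'valid' applies everywhere."""
    try:
        return RULE_REPOSITORY[rule_name](value)
    except KeyError:
        raise ValueError(f"unknown rule: {rule_name}")

if __name__ == "__main__":
    print(validate("us_zip_code", "98057"))          # True
    print(validate("email", "dave.wells@example"))   # False: no top-level domain
```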