An introduction to DW/ BI data quality risk assessments

Data warehouse and business intelligence (DW/ BI) projects are showered with risks – from data quality in the warehouse to analytic values in BI reports. If not addressed properly, data quality risks can bring entire projects to a halt, leaving planners scrambling for cover, sponsors looking for remedies, and budgets being wiped out.

Usually, data consumers, such as end users, don’t often know exactly what they want delivered until they start seeing early versions of a BI application (e.g., reports). This circumstance often requires DW/BI teams to build the data warehouse and application reports before they are fully defined and specified. Couple this challenge with the data quality problems inherent when sourcing operational systems and the potential for risks are very real. (Editor note: Learn what are the attributes you should track to develop a comprehensive report inventory).

This article outlines methods for recognizing and minimizing the data quality risks often associated with DW/ BI projects. Addressing additional DW/ BI project risks (including performance, schedules, defects discovered late in the software development life cycle, etc.) is also important, but is beyond the scope of this piece.

Data Quality is the desired state where an organization’s data assets reflect the following attributes:

Clearly defined

Correct values in sources – during extraction, while loading to targets, and in analytic reports

Understandable presentation format in analytic applications

Usefulness in supporting targeted business processes.

Figure 1 illustrates primary control points (i.e., testing points ✓) in an end-to-end data quality auditing and reporting process.

Data quality risks should be addressed early and often

The extraction, transformation, and loading (ETL) process is still the most underestimated, under-budgeted part of most DW/ BI iterations. And the biggest reason why the ETL portion of a project often raises more questions than it resolves has to do with a lack of understanding of the source data quality along with the resulting data quality of ETL processes.

Before diving into the most common data quality risks for data warehouse and business intelligence projects and their risk management, let’s ensure we are on the same page with the following, as we will address them both:

Risk assessment is the conversion of risk assessment data into risk decision-making information. Risks are composed of two factors: (1) risk probability and (2) risk impact. Data quality risk management is a structured approach for the identification, assessment, and prioritization of data quality risks followed by planning of resources to minimize, monitor, and control the probability and impact of undesirable events.

The most common DW/ BI project data quality risks are listed below, each accompanied by the probability / odds that they typically exist, their likely impacts, and recommendations for mitigating those risks. The listing is not intended to be exhaustive.