Commentary & Analysis

By Bob Miller

June 7, 2004 -- Chances are, very few readers of this column are interested in obtaining a PhD in data quality. However, having focused on the practical in my last two columns, I think it's time to describe what's good about good data. There is no standard definition of good or bad data, but in my experience there are five general issues that cover just about anything you'll find in the real world -- and in your own databases.

Mistakes. This one is pretty easy to understand. Whoever typed it in typed it wrong. A name may be misspelled, or the digits in an address reversed. These problems can be very hard to find and fix, so successful companies go to great lengths to get it right up front.

An interesting variation I have seen is the intentional mistake. I once worked on data from separate ordering and billing systems where install dates were consistently before order dates toward the ends of quarters. What was going on, of course, is that some reps had already made quota for the quarter and were 'banking' the order. They had discovered that there was no validation being done on the dates. Never discount human ingenuity!

Representation. Is the customer General Motors, GM, Pontiac, GMAC, Chevrolet or Chevy? Is the customer named Charles, Chuck, Charlie or Chip? You get the idea. The data is not actually wrong, but it can be quite hard to deal with in a computer. This is an area where understanding the business can really help with cleaning up the data. Whether 'CIA' is the Culinary Institute of America or the Central Intelligence Agency may become clear when you know what they buy.

Missing or insufficient data. 350 Fifth Avenue, New York, NY looks like a perfectly good address, but a suite number would really help in the Empire State Building. A less obvious but more insidious example might be the title of a contact.
A Vice President at a bank may be in a very different position than a Vice President at a manufacturing company. The differences may be critical when one-to-one materials are being prepared based on this data.

Aging. Reality changes over time. People change their names or move. One great source of spending data is shopper's club cards. However, addresses are rarely updated in this data. One common task at The Rochester Group is trying to keep this sort of data up to date by other means.

Mechanical conversion errors. Most readers of this article are probably familiar with the errors that are introduced when text is scanned. But any time data is processed in any way there is the chance that it will be altered for the worse. Data may be discarded as useless or codes combined. Customer records may be combined and assigned to an old address. When it really matters, we prefer to work with data directly from the source, even though it may contain errors.

These examples give you a concise way of thinking about what might be wrong with data. It is important to consider how few errors can be fixed purely mechanically. Addresses can be altered to make them deliverable, but not necessarily up to date. The way to good data is to begin with business processes that ensure the important information is captured accurately and kept up to date. And when the inevitable cleanup is required, a process that includes intelligence about the business will produce much higher quality results.
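
To make two of the issues above concrete, here is a minimal Python sketch of the kind of check that was missing in the order-date story, alongside a simple lookup for the representation problem. Everything in it is an assumption for illustration: the field names (`order_date`, `install_date`), the sample records, and the alias table are hypothetical, and deciding which names belong together (is Chevy its own customer, or part of General Motors?) is exactly the business knowledge the code cannot supply.

```python
from datetime import date

# Hypothetical alias table. Which names count as the same customer is a
# business decision; the entries here are only examples.
ALIASES = {
    "gm": "General Motors",
    "general motors": "General Motors",
    "chevy": "Chevrolet",
    "chevrolet": "Chevrolet",
}


def canonical_name(raw):
    """Return the canonical customer name, or the trimmed input if unknown."""
    cleaned = raw.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def install_before_order(records):
    """Return the records whose install date precedes the order date."""
    return [r for r in records if r["install_date"] < r["order_date"]]


# Made-up sample data; the second record has the suspicious date pattern.
orders = [
    {"customer": "GM", "order_date": date(2004, 3, 29), "install_date": date(2004, 4, 2)},
    {"customer": "Chevy ", "order_date": date(2004, 4, 1), "install_date": date(2004, 3, 25)},
]

suspect = install_before_order(orders)
```

A check like this only flags records for a person to review. It cannot tell a typing mistake from an order that was deliberately held back, which is why the cleanup still needs someone who understands the business.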