The New York Times Magazine this week is a special issue on debt (a topic that has a particular resonance for me: we are still paying off an expensive, but spectacular, year in New Zealand!). There is a fascinating article on what credit card companies can learn about you based on your spending. For instance,

A 2002 study of how customers of Canadian Tire were using the company’s credit cards found that 2,220 of 100,000 cardholders who used their credit cards in drinking places missed four payments within the next 12 months. By contrast, only 530 of the cardholders who used their credit cards at the dentist missed four payments within the next 12 months.

A factor of roughly 4 (2,220 versus 530) is a pretty significant difference. That should be enough to change the interest rate offered (and a default rate of about 2% in 2002 is pretty high). The illustrations to the article go on to suggest that chrome accessories for your car are a sign of much more likely default, while premium bird seed suggests likely on-time payment.
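For anyone who wants to check the arithmetic behind that comparison, here is a minimal sketch in Python. It assumes the 530 dentist figure is also per 100,000 cardholders, which the quoted passage implies but does not state outright; the function and variable names are mine, not the study's.

```python
# Counts quoted from the New York Times article on the 2002 Canadian Tire study.
# ASSUMPTION: both counts are per 100,000 cardholders; the quote only says so
# explicitly for the drinking-places group.
CARDHOLDERS = 100_000
MISSED_BAR = 2_220      # used the card in drinking places
MISSED_DENTIST = 530    # used the card at the dentist


def missed_payment_rate(missed: int, cardholders: int = CARDHOLDERS) -> float:
    """Fraction of cardholders who missed four payments in the next 12 months."""
    return missed / cardholders


bar_rate = missed_payment_rate(MISSED_BAR)
dentist_rate = missed_payment_rate(MISSED_DENTIST)

# Ratio of the two rates: how many times more likely the bar group was
# to miss four payments than the dentist group.
relative_risk = bar_rate / dentist_rate

print(f"Bar rate:      {bar_rate:.2%}")
print(f"Dentist rate:  {dentist_rate:.2%}")
print(f"Relative risk: {relative_risk:.1f}x")
```

The ratio comes out a little above 4. Note that this is a pure correlation between two groups; nothing in the calculation says anything about why the groups differ.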

The article was not primarily about these issues: it was about how companies learn about defaulters in order to connect with them so that they will be more likely to pay back (or will pay back more). But the illustrations did get me thinking again about the ethics of data mining. Is it “right” to penalize people for activities that don’t have a direct effect on their ability to pay back but only a statistical correlation with it? Similar issues came up earlier when American Express started to penalize people who shopped at dollar stores.

I brought this up during my ethics talk in my data mining course, and my MBA students were split on this. On one hand, companies discriminate on statistical correlations a lot: teenage boys pay more for insurance than middle-aged women, for instance. On the other hand, it seems unfair to penalize people for simply choosing to purchase one item over another. Isn’t that what capitalism is about? But statistics don’t lie. Or do they? Are statistics from the past equally relevant in today’s unusual economy? Is relying on past statistics making today’s economy even worse? Should a company search for something with a more direct correlation or is this correlation enough?

At the Tepper School at Carnegie Mellon, we generally put more faith in so-called structural models, rather than statistical models. Can we get at the heart of what makes people default on credit card debt? For instance, spending more than you earn seems like one thing that might directly affect the ability to pay back debt. It is hard to come up with a model where paying for drinks at the bar has a similar effect. But structural models tend to be pretty reduced-form. It is hard to include 85,000 different items (like in the study reported by the New York Times) in such models.

I vacillate a lot about this issue. Right now, I am feeling that data mining results like “spending in a bar implies default on credit cards” can lead to interesting insights and directions, but those insights would not be actionable without some more fundamental insight into behavior. The level of “fundamentalness” would depend on the application: if I am simply deciding on a marketing campaign, I might not require too much insight; if I am setting or reducing credit limits, I would require much more.

I guess this is particularly critical to me since I often play the bank at our Friday Beers. Since we might have 20 people at Beers, the tab can reach $300, and I sometimes grab the cash from the table and pay by credit card. Either the credit card companies have to come up with new rules (“If the tip is > 25% [as it often is with us: they do like us at the bar!], then credit is OK; else ding the record”) or I better hit the ATM on Friday afternoons.