A big thing nowadays is the proposition that “Big Data” will transform human resources, or HR – who to hire, who to fire, who to promote, and so on. It’s an interesting and apparently seductive idea: that by capturing more and more data, employers will be able to tell everything about their employees. I was discussing this last night and have been trying to think it through with my “social scientist” cap on. I think that if analytics does come to drive a lot of HR decisions, it will do so much more slowly than we expect. For a few reasons:

Garbage in, garbage out. Employee rating systems are notoriously terrible, and are incredibly biased estimators of employee ability, or value added, or what have you. Employee quality is a slippery latent variable, hard to even coherently define, much less measure. If we can’t trust the end measurement, we can’t trust anything we find about its determinants.

The complexity of the problem invites spurious correlations. There is an absolute ton of data available about people, ranging from the simple – demographics, college GPA – to the complex, like social media activity and text-processing of writing samples. This is great, but it means that as long as there are a sufficient number of employees, you can always find some relationship that is “statistically significant” by hunting through enough variables. Doing this virtually guarantees you will be directing HR decisions based on spurious relationships. For example, if you take 100 variables known to have no relationship to the outcome and a large enough sample, you will almost certainly find about 5 that show up as statistically significant predictors at the 95% confidence level. That’s what the significance level means – you’re wrong 5% of the time.
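The 100-variables claim is easy to check by simulation. Here is a minimal sketch (the sample size, seed, and test procedure are my own illustrative choices): generate 100 predictors and an outcome that are all pure noise, then test each predictor for a significant correlation with the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n_employees, n_vars = 1000, 100

# 100 predictors that are pure noise, plus an outcome that is also pure noise
X = rng.normal(size=(n_employees, n_vars))
y = rng.normal(size=n_employees)

# Test each predictor separately: correlation r, then a t-statistic for r != 0
false_positives = 0
for j in range(n_vars):
    r = np.corrcoef(X[:, j], y)[0, 1]
    t = r * np.sqrt((n_employees - 2) / (1 - r**2))
    if abs(t) > 1.96:  # approximate two-sided 5% threshold for large samples
        false_positives += 1

print(false_positives)  # tends to come out around 5, as the 5% error rate predicts
```

Every “significant” predictor this finds is, by construction, spurious – exactly the trap a variable-hunting HR analysis falls into.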

Non-comparability of employees. This goes back to the first point – for some businesses, there are armies of employees doing similar tasks. For most businesses there aren’t, and it’s difficult to reduce performance to the same axis. A software developer and a salesperson simply can’t be measured on the same scale; the comparison isn’t meaningful. It’s often even difficult to compare two salespeople, especially if individual sales are large things like software packages or consulting projects – the more skilled employees tend to take on more challenging assignments. For companies comprising large numbers of white-collar workers doing distinct tasks, the whole mission may be hopeless.

On the upside, some of these actually seem like problems that can be fixed. Probably the starting point is to examine what actually drives employee ratings – I have a feeling that the findings might be unsettling. At most companies, I would expect to see a major impact of things like race and gender, and a fairly loose relationship between ratings and actual performance as measured by whatever objective measures are at hand – e.g., client happiness, productivity, project cost/time overruns. This seems like one case where you’d actually want to have an outside consultant, because your own company’s managers might not be the most objective. The point in measuring the bias isn’t to chastise everyone – it’s so that HR can normalize ratings behind the scenes and hope to actually improve measurements of employee quality without having to trust individuals’ judgments as much.

It’s not a coincidence that most of these “Big Data HR” stories are coming out of firms that can objectively measure performance, like call centers. The garbage-in-garbage-out problem is extremely difficult, and most of the firms currently using HR analytics without clear objective measures are probably putting out garbage. That needs to be fixed before anyone can hope to use serious analytics in the HR space, and it’s an incredibly difficult problem with no obvious best solution. Only after the social science foundations are in place do firms have to worry about the Polanyian problem…