A skeptic’s take on Trump’s purported big data juggernaut, Cambridge Analytica

For the past week or so, an article titled “The Data That Turned the World Upside Down” has been following me around like a bad head cold.

The article tells a compelling “whodunit” story of data scientists, engineers, and political communicators, all fighting for control of a new, weaponized form of online political propaganda. It leads readers to the conclusion that conservative data vendor Cambridge Analytica (CA) used Big Data and “psychometric targeting” (also called psychographic targeting) to propel Donald Trump to the White House.

I’ve written about Cambridge Analytica before (several times, in fact). I am on record as a loud CA skeptic. I have described them as the Theranos of political data: I think they have a tremendous marketing department, coupled with a team of research scientists who deliver on virtually none of those marketing promises.

And though I tried my best to read the article with an open mind, I am still left with the same fundamental skepticism about this new brand of political data alchemy.*

To be clear, I am not questioning the underlying science of psychometric targeting. Psychometric targeting simply categorizes individuals according to the standard “big five” personality traits, then treats these categories as market segments for the delivery of targeted advertising. Targeted advertising based on psychometrics is conceptually quite simple but practically very complicated. And there is no evidence that Cambridge Analytica has solved the practical challenges of applying psychometrics to voter behavior.

Here is a list of what you would need in order to apply psychometrics to voter behavior:

A comprehensive file of psychographic data on American citizens. Alexander Nix, the CEO of Cambridge Analytica, told the authors that his company has “profiled the personality of every adult in the United States of America—220 million people.” But, in a statement after the original publication of the article, the company also claims that it does not use data from Facebook and hardly used psychographics at all. So it is unclear where these comprehensive files are supposed to have come from, or how robust they are.

A comprehensive national voter file, matched to this psychographic data. As Daniel Kreiss shows in his new book, Prototype Politics, the Republican voter file was still very much a work-in-progress during the 2016 election. Matching this data to CA’s purported psychographic file would be a hairy technical endeavor, involving heavy collaboration from other Republican vendors who instead have downplayed CA’s role and raised questions about its transparency.

A massive creative team to craft targeted messages for each of these audience segments. This is one of the (many) insights from Eitan Hersh’s 2015 book, Hacking the Electorate. The more segments a campaign creates within a voter universe, the more distinct messages that campaign has to develop, test, and refine. Even if Cambridge Analytica correctly assigned every American to one of its 32 psychographic categories AND linked those profiles to a national voter file, the data would only become useful if the Trump communications operation was crafting distinct messages for each of the categories. But we know for a fact that the Trump campaign had a bare-bones communications staff. If CA had been able to hand the Trump communications team a detailed psychographic assessment of every targeted voter, the practical response would have been a bit like Henry Ford’s old comment: “The customer can have any color he wants so long as it’s black.”
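To make the creative-workload problem concrete, here is a back-of-the-envelope sketch. It rests on assumptions of my own, not anything Cambridge Analytica has disclosed: that the 32 categories come from scoring each of the big five traits as simply "low" or "high" (2^5 = 32), and that a campaign would want a few tested ad variants per segment per issue. The function names and the numbers for variants and issues are hypothetical.

```python
from itertools import product

# The standard "big five" personality traits.
BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")


def segments(levels=("low", "high")):
    """Enumerate every segment formed by assigning one level to each trait.

    Assumption: segments are simple combinations of per-trait levels.
    """
    return [dict(zip(BIG_FIVE, combo))
            for combo in product(levels, repeat=len(BIG_FIVE))]


def creative_workload(n_segments, variants_per_segment, issues):
    """Distinct ads to write if every segment gets its own variants per issue."""
    return n_segments * variants_per_segment * issues


all_segments = segments()
print(len(all_segments))  # 32 segments from a low/high split on five traits

# Hypothetical campaign: 3 tested variants per segment, 4 issues.
print(creative_workload(len(all_segments), variants_per_segment=3, issues=4))  # 384
```

Even with these modest made-up inputs, the campaign would need hundreds of distinct, tested messages. A finer split (say, low/medium/high per trait) pushes the segment count to 243, which is the point Hersh makes: segmentation is cheap, and the creative work it demands is not.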

It’s also worth noting that, in post-campaign debrief sessions, psychographic targeting has completely vanished from Cambridge Analytica’s presentations. At a symposium last month hosted by Civic Hall and the Knight Foundation, Molly Schweickert (CA’s head of digital) instead described their data operation as “going into the field on a weekly basis to collect hard ID responses [and] scoring individuals on candidate preference, issues they cared about, and likelihood to turnout.” Scoring voters based on likelihood of candidate support and likelihood of turnout is nothing new. As Sasha Issenberg documented in The Victory Lab, this was the cutting-edge innovation of the 2008 Obama campaign. Rather than bragging about a new leap forward in voter targeting, Schweickert is effectively boasting that the Republicans have caught up to the Democrats.

The simple explanation here is that Cambridge Analytica has been engaging in the time-honored Silicon Valley tradition of developing a minimum viable product (vaporware, essentially), marketing the hell out of it to drum up customers, and then delivering a much more mundane-but-workable product. The difference here is that CA’s marketing has gotten caught up in our collective search for the secret formula that put Donald Trump in the White House.

But here’s the tough reality: “Moneyball” doesn’t always win. Donald Trump’s campaign didn’t possess a secret data innovation. His unlikely victory was due to a messy confluence of factors. The world has indeed been turned upside down by this election, but data scientists were not the cackling villains hidden just offstage. Trump ran a deeply flawed campaign! Hillary Clinton also ran a flawed campaign! There was also an organized anti-Clinton disinformation campaign, semi-coordinated by Vladimir Putin and WikiLeaks! And the FBI ambushed Clinton two weeks before the election, while denying that they were conducting an investigation into links between the Trump campaign and the Russian propaganda effort!

The stories of Cambridge Analytica’s omniscience are fiction. The 2016 election was stranger than fiction.

*The article was originally published in Zurich-based Das Magazin. Like so much of the Trump communications empire, it surely sounds better in the original German.