Answering Your Questions About The Democratic Data Breach

Data collection and analysis is a central aspect of a modern political campaign. It's just that usually, campaigns don't really talk about it.

But on Saturday night, the very first question in the Democratic presidential debate was all about the voter files that the Democratic National Committee maintains for its candidates to use.

That's because earlier in the week, several of Vermont Sen. Bernie Sanders' staffers had accessed, viewed and saved sensitive files belonging to former Secretary of State Hillary Clinton's campaign.

Lawsuits were filed and threats were made, but the situation has now, for the most part, been resolved. The whole thing has led to a lot of questions — about what, exactly, the Sanders campaign did, and how important these voter files are to begin with.

Here's our attempt at answering every question you have about the DNC data breach.

OK, so what happened?

Bernie Sanders' staffers gained access to private Clinton campaign data. This information had been stored on a central server that every Democratic campaign uses. The documents identified likely supporters in key early primary states like New Hampshire and South Carolina.

The Sanders staffers were able to access the files because of a temporary glitch in the software that every Democratic campaign uses to not only access the party's central file of information about voters, but also store the proprietary data they gather over the course of the campaign.

The fallout was harsh and swift: One Sanders staffer was fired and two others were suspended. And more consequentially for the campaign as a whole, the DNC reacted to the news by locking the Sanders campaign out of the entire database for nearly 48 hours. That left the Sanders campaign essentially unable to function, a situation it responded to by suing the DNC over breach of contract.

The Sanders campaign has access to its data again, and the DNC is conducting an investigation that both campaigns and the company that runs the program are cooperating with.

What, exactly, are these voter files?

On the day the data breach became public, Clinton's campaign manager, Robby Mook, said "anyone who accessed it really had access to the fundamental keys of our campaign."

Looking at the voter file as a whole, that's really not an exaggeration. These files are the central collection of everything campaigns have learned about the voters they're hoping will support them and then turn out to vote on Election Day.

On the Democratic side, every campaign begins with the same basic set of information that states collect about people when they register to vote.

"Typically you're going to find age, gender. You'll have address, sometimes [phone] number. In Southern states you'll even have the race of the voter," explained Ethan Roeder, who worked as data director for President Obama's 2008 and 2012 campaigns.

The Democratic National Committee stores all this information on a central database run by a company called NGP VAN (more on that later).

Using that database, campaigns take this information and begin building on it. They spend months of effort — and millions of dollars — calling voters, knocking on their doors and scooping up information about them in databases of consumer data.

Campaigns want to know which issues interest and motivate specific voters. Campaigns are also seeking the answers to two basic — and for campaigns, essential — questions, according to Roeder: "What candidate is this voter likely to support, and second, is this voter likely to vote?"

What do campaigns do with this data?

Political campaigns succeed or fail based on how they decide to use two critical, but limited, resources: time and money.

The more resources campaigns put into building out these voter data files, the better they can help top strategists identify which voters are definitely going to vote for them, which are open to persuasion and which aren't worth spending any time or money trying to reach.

"That is the nerve center of the campaign," explained Matt Oczkowski, who ran data operations for Wisconsin Gov. Scott Walker, who briefly ran for the Republican presidential nomination this year. "Every decision a campaign makes, in my opinion, should be somewhat data-driven and should be based off that file."

Traditionally, campaigns have made these decisions on a very local level. They might send mail to specific voters about the specific issues those people are interested in. Or they might make sure certain voters who said they'd definitely vote for their candidate get a ride to the polls.

But as technology has improved, this sort of analysis has become more sophisticated.

"By the time we were in 2012, we were using statistical analytical techniques to determine which cities in Ohio the president would visit," said Roeder.

Sounds pretty strategic and sensitive. Why would two rival campaigns use the same database?

Because for about a decade now, that's how the Democratic Party has wanted it.

It all goes back to 2005, when, in the wake of what the Democratic Party viewed as a devastating election cycle, former Vermont Gov. Howard Dean took over the Democratic National Committee.

At the time, Republicans had a far better system for collecting, analyzing and sharing the voter data. Democrats had no central system. Every state was on its own, and campaigns didn't really share information with each other.

In his book Taking Our Country Back, Daniel Kreiss, an assistant professor at the University of North Carolina, wrote that the party's system was a mess when Dean took charge.

"Without developed practices and attendant technologies for gathering and storing data, as well as transferring and sharing it across campaigns and election cycles, candidates lacked the capacity to know much about their potential supporters and the electorate more generally," Kreiss wrote. "Data often just disappeared between election cycles."

Under Dean, the party made a decision to change that. The DNC would take state-level voter files, clean up the various errors within them, and put them into one main database. Each Democratic campaign would then use the system, adding in its own information that would stay separate during the campaign itself, but eventually migrate into the big central file.

The party picked one company to build and operate this database: NGP VAN.

How is this central data system working out?

The overarching goal behind this was to get every Democrat on the same page.

"Data just flows much more seamlessly throughout the Democratic Party's ecosystem ... when you have this universal buy-in to single campaign tools," Kreiss said.

That centralized data is particularly useful once primary season is over, according to Roeder, data director for both of Obama's campaigns.

"Once we get to the general [election], it's in the best interest of the Democratic Party and of the eventual nominee to have all of that information in one place, so they can make use of it in their effort to ultimately prevail in the general," he said.

And there's general agreement within the political world that in 2008 and 2012, Democrats leapfrogged Republicans when it came to data superiority.

But granting one company a monopoly on such a central function of campaigning has made a lot of people within the political world very upset. Particularly troubling, according to one data provider, is the dynamic of campaigns having to upload the data they've gathered into one central system.

"If you don't control the security of your own data, you don't control your destiny in a campaign," said John Aristotle Phillips, the CEO of Aristotle, a major political data provider.

"Things function very differently on the Republican side," he said. "It's a free-for-all. The candidates by and large will pick and choose who they want as a vendor ... there's very little control that the Republican National Committee or the state parties have over what candidates choose."

It's fair to say that a software bug allowing rival campaigns to see each other's data is a major problem for NGP VAN. The DNC has begun conducting an independent audit of where the company slipped up.

"We are adding to our safeguards around these issues," the company's CEO, Stu Trevelyan, said in a blog post.

How much of Clinton's data were Sanders' staffers able to take? And how useful was it?

Late in the day that the breach became public, the Clinton campaign distributed spreadsheets showing logs of what four different Sanders staffers looked at and, in some cases, saved. That included lists of voters in states like New Hampshire and South Carolina that the Clinton campaign had targeted as likely supporters.

Generally speaking, that's a proposition that would make most campaign operatives very nervous.

"That is extremely valuable information to know," said Oczkowski, who ran Scott Walker's data operations. "Because Campaign B can say, 'OK, I know exactly who these folks are targeting to turn out. How do I get in front of them to persuade them before Campaign A gets there?'"

That's why the DNC said it moved so quickly to freeze the Sanders campaign out of the system.

"This action was not taken to punish the Sanders campaign," DNC CEO Amy Dacey wrote in a Web post. "It was necessary to ensure that the Sanders campaign took appropriate steps to resolve the issue and wasn't unfairly using another campaign's data."

While onetime Obama staffer Ethan Roeder agreed that, generally speaking, voter file information can be critical, he didn't think what the Sanders campaign accessed was "the crown jewels, by any means."

"What seems to have happened was a very specific type of information was viewed for a very limited amount of time in a limited way," said Roeder, who looked at the logs the Clinton campaign distributed. "It looks to me like the way they were able to view this information, and the types of big chunks that they cut out of it, isn't hugely valuable."

For its part, the Sanders campaign insisted, and continues to insist, that the motivation in accessing the data was simply to prove to the DNC that its system was acting up and exposing campaigns' data.

"We knew there was a security breach in the data, and we were just trying to understand it and what was happening," Josh Uretsky, the staffer at the center of the entire storm, told CNN last week.

Both Uretsky and others on the Sanders campaign, including Sanders himself, said they had witnessed prior data problems in other programs that multiple Democratic campaigns share, and were trying to flag a frequent problem.

DNC staffers say that in October, the Sanders campaign did point out potential flaws in a separate data program and that the problem was fixed by the company that ran it. (Details on this earlier breach in a separate system remain unclear.) However, the DNC staffers dispute the Sanders campaign's comparisons of the two situations, arguing that no data was breached in the fall.

But mindful of prior issues, Uretsky told MSNBC, "We were trying to create a clear record of a problem before reporting it." (Reached by NPR, Uretsky declined to comment.)

Several people with experience within this field told NPR they were skeptical of that argument, pointing out how several of the Sanders staffers targeted key primary states with their searches.

What happens next?

The DNC is investigating what, exactly, happened.

"This is necessary to confirm, as the Sanders campaign has assured us, that the data that was inappropriately accessed is no longer in possession of the Sanders campaign. The Sanders campaign has agreed to fully cooperate with the continuing DNC investigation of this breach," wrote Dacey.

Of note, though: the DNC has not said when this investigation will be complete, nor whether it will be made public.

As for the campaigns themselves, while Sanders apologized to Clinton on Saturday night (and she accepted the apology), it's clear there's still soreness on all sides.

Tellingly, the Sanders campaign has not withdrawn its lawsuit.