This week on The Vergecast, Verge editor-in-chief Nilay Patel talks with Jeremy Singer-Vine, the data editor for the BuzzFeed News investigative unit, about his story that was published recently regarding the fake comments on the Federal Communications Commission’s online net neutrality debate.

If you haven’t read the piece, you should. The investigation details where all of the fake comments in the FCC’s net neutrality process came from, including dead people leaving comments and shady political operatives involved in the scam.

It’s not really a story about net neutrality. Instead, it’s about how systems designed for public participation in the government are so easily scammed and what the challenges are for preventing such scams from happening.

Nilay and Jeremy discuss why it happened, how it happened, and what happens next if we want to use the internet to encourage open access participation in government without corruption.

Below is a lightly edited excerpt of the conversation.

Nilay Patel: So a couple of weeks ago, you wrote a story with Kevin Collier at BuzzFeed called “The Impersonators.” This story uncovered that millions of comments in the net neutrality proceeding were fake, and you found the firms that kind of did the fakery.

Jeremy Singer-Vine: Right. So when the FCC opened up commenting on its net neutrality proposal to repeal the Obama-era net neutrality provisions, there were ultimately, over the course of many months, 22 million comments. And there was a lot of great reporting, including by The Verge and other outlets, showing there’s something very funky about a lot of these comments. Some were clearly fake, as in they didn’t come from anyone. Some seemed to be clearly impersonating other people. People said that there were comments under their names that they definitely didn’t leave.

It had been generally known for a while that there were millions of problematic comments. We tried to get to the bottom of as much of it as we could, and we ended up focusing on one particular group of nearly 2 million comments that, through our reporting, we discovered were clear instances of impersonation and had ultimately been funded by the broadband industry.

So just to put some perspective on this, it’s been happening for a long time. The FCC just recently won its court appeal saying you could change the rules, but then Trump gets elected. He appoints Ajit Pai to be the chairman of the FCC. Pai is sort of well-known as a character to the audience of the show, and they race through this proceeding to switch the rules. And in the course of that very fast process, a bunch of things happen. Notably, the FCC comment servers crashed. Pai claimed that they’d been attacked, but he never really released any evidence of this attack, and then everyone noticed that all of these fake comments were bubbling up.

What got me was Pai said over and over again, “It is not the quantity of comments that we get; it’s the quality of them,” which, to me, felt like telecom companies like Verizon have lawyers who are writing these comments. Those are higher-quality. We’re going to pay attention to those, not the regular people. But it kind of seems like they were astroturfed by paid political consultancies. And then they got to ignore the whole thing, and they just moved on with it. That was kind of my read of it. And I read your story, and it’s like, “Oh, it’s actually way more sophisticated than that underneath. There are actually companies, and it’s their business to flood these sort of open comment systems.” Walk me through who these companies are a little bit.

Sure. Pai’s comments and the general framing of quality over quantity are interesting because that is sort of the rule of law: federal agencies are supposed to accept all the comments they can on any newly proposed rule. And they’re not supposed to treat it as a vote, although lots of agencies do report what percentage of comments took one position or another. But they’re really supposed to say, “Here are the perspectives we received. We’re going to take these into account in our rule-making, but we don’t have to treat them as a sort of binding vote.”

That’s, I think, where Pai’s perspective comes from. It is grounded in the rules that agencies are supposed to follow. But political operators know that even if the public version of how these things work is quality over quantity, people are paying attention to the quantity. So political consultancies have cropped up over the years that help organizations, regardless of political persuasion, amass comments for public comment periods.

So in the net neutrality proceeding’s 22 million comments, a huge proportion, nearly half, were submitted through the FCC’s bulk uploading system. These were comments gathered on behalf of organizations, some pro-net neutrality, some anti. The idea is that the FCC’s system (which, as you noted, crashed) is not the most user-friendly, so you can go out and collect comments on behalf of an opinion or an organization, what have you, and then submit them all at once to the FCC. Through [the Freedom of Information Act], we were able to get records of who had done those bulk submissions. And very quickly, as we looked through that data, this particular group of 2 million comments jumped out at us because they had a huge overlap with a data breach known as the Modern Business Solutions data breach, which happened a little bit earlier.

Those comments had been submitted by a political consultancy known as Media Bridge, and they do a range of things, including, as they’ve advertised very vocally on their website, flooding agencies with comments on whatever topic the client asks for, essentially.

You’ve got this quote in the story, “Spend $1 million with Media Bridge and most likely you’ll have a million people-plus advocating for your position.” Which seems like the cheapest political advocacy ever devised.

I mean, certainly, the idea of mass commenting is not something that Media Bridge invented. It’s something that people of all political persuasions have been doing for a long time, and there is a legitimate use for it, which is lots of people feel strongly about something, but they don’t feel like they are good writers, or they don’t have the time to sit down for an evening and type out detailed thoughts.

So it has been accepted generally as politically legitimate to just sign on behalf of someone else’s statement that a political organization or advocacy organization may say, “Do you agree with this statement that we’ve prewritten? Insert your information, and we’ll sign it on your behalf and send it to the FCC or some other agency.”

But that’s not what’s happening here.

Correct.

You wrote that this is one of the largest instances of just misappropriation of identity that’s ever occurred in politics.

Right, and these comments were made to seem as if they came from regular people doing this sort of typical mass comment thing. On their surface, it looks no different than mass comments made by any other organization. But as we dug deeper, it turned out that for more than 94 percent of the comments that were submitted through Media Bridge, the personal information on them matched exactly the personal information — we’re talking the name, the physical address, the email address — with the data that was in that breach database, the Modern Business Solutions Data Breach.

And as we dug deeper, we found a pretty clear explanation for the remaining 6 percent. So as we did our reporting and we talked to people whose names were on these comments, it became clear that they had not submitted them and that the most likely explanation is that the data was just taken straight from this breach, attached to comments that were generated in a sort of Mad Lib style so that they all looked a little bit different — different enough that they would seem unique and submitted to the FCC.

As you’re going through the reporting process, how did you discover which breach they had used? Unless you have just encyclopedic knowledge, which would be amazing.

There is a great service available to anyone online called Have I Been Pwned. It’s run by Troy Hunt, a security researcher. He gathers these breached databases as they float around the internet, figures out which email addresses were exposed in each individual incident, and then provides a service that lets people look up, “Has this email address been breached?” It gives you a sense of how secure your personal information might be, but it also enables researchers like us to figure out whether a large set of email addresses overlaps with any given database breach. For example, we took a random sample of 10,000 email addresses from these comments and checked them against his collection. He has collected, I think, more than 200 breaches at this point. And we didn’t come in with any preconceived notion about which breach would be relevant or even which of the sets of comments submitted to the FCC would be relevant. But as we did our analysis, this set of comments and this particular breach shot right up to the top. There’s nothing like it.
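The overlap check described above can be sketched as a set intersection over a random sample of comment emails. Everything below is invented for illustration (the addresses, the breach set, and the roughly 75 percent overlap rate); the real analysis used FOIA records and Have I Been Pwned’s breach data.

```python
import random

# Illustrative data only: emails attached to bulk-submitted comments,
# and emails known to appear in one particular breach.
comment_emails = {f"user{i}@example.com" for i in range(100_000)}
breach_emails = {f"user{i}@example.com" for i in range(5_000, 80_000)}

def overlap_rate(emails, breach, sample_size=10_000, seed=0):
    """Estimate what fraction of a random sample of comment emails
    also appears in a given breach dataset."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(emails), min(sample_size, len(emails)))
    return sum(e in breach for e in sample) / len(sample)

rate = overlap_rate(comment_emails, breach_emails)
```

An overlap rate far beyond what chance would explain, compared across many breaches, is what would make one breach "shoot right up to the top."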

So Media Bridge harvests names and email addresses from a data breach.

So Media Bridge is ultimately the organization that submitted the comments. They worked with another company called LCX, a digital advertising company that our reporting found had been caught up in a couple of other impersonation allegations. They have a troubling history. They are run by someone who has repeatedly lied about his personal history and résumé. We don’t know exactly the relationship between LCX and Media Bridge and who did exactly what, but given what else we know about LCX and what else we know about Media Bridge, it seems that LCX provided the list of names and addresses to Media Bridge, which then submitted those to the FCC.

And they did that by taking this sort of Mad Lib generator and creating emails that were just different enough to evade detection?

Good question. Evading detection doesn’t seem to have been the main impetus. Most federal agencies use some sort of de-duplication when they’re trying to read through comments, especially when you’re dealing with millions and millions of them. I don’t know exactly what can get through that and what can’t, but it did seem that the goal of the text randomization in these comments was to make it harder to say, “Oh, these are all the same comment,” and treat them all as one, and instead to require that someone read through them all.
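A minimal sketch of why even trivial rewording defeats exact-match de-duplication (the comment texts here are invented, and real agency triage pipelines are presumably more sophisticated):

```python
from collections import Counter

comments = [
    "I urge you to repeal these rules.",
    "I urge you to repeal these rules.",   # verbatim duplicate
    "I ask you to rescind these rules.",   # lightly reworded variant
]

def dedupe_exact(texts):
    """Collapse exact duplicates after trivial normalization, the way a
    naive comment-triage step might."""
    return Counter(t.strip().lower() for t in texts)

counts = dedupe_exact(comments)
# The two identical comments collapse into one entry with count 2;
# the reworded variant survives as a separate "unique" comment.
```

Each randomized variant hashes differently, so nothing collapses, and a reviewer is left with millions of apparently distinct comments.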

You have the text of a few of them in there, and they’re almost impossible to read.

Yeah. So it’s not the world’s most sophisticated text generation. It really is sort of like a Mad Lib-style generator. If you go to the article online, you can play around with the generator. We believe we’ve successfully reverse-engineered the algorithm or the process for generating them to get a sense of how it works. But it’s basically swapping in and out synonymous phrases, and sometimes when those phrases line up right, it reads like a reasonable letter. Other times, there are clear grammatical issues or sort of non sequiturs that don’t totally seem to make sense.
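The Mad Lib-style generation described above amounts to fixed sentence slots filled from small banks of synonymous phrases. The template and phrase banks below are invented for illustration, not the ones BuzzFeed reverse-engineered:

```python
import random

# Hypothetical template and phrase banks; each slot has interchangeable
# synonymous phrasings.
TEMPLATE = "{opener}, I {verb} the FCC to {action} the net neutrality rules."
PHRASES = {
    "opener": ["Dear Chairman Pai", "To the commissioners", "FCC"],
    "verb": ["urge", "ask", "implore"],
    "action": ["repeal", "roll back", "undo"],
}

def generate_comment(rng):
    """Fill each slot with a randomly chosen phrase, so every comment
    reads slightly differently while making the same point."""
    return TEMPLATE.format(
        **{slot: rng.choice(options) for slot, options in PHRASES.items()}
    )

rng = random.Random(7)
samples = [generate_comment(rng) for _ in range(10)]
```

When the swapped phrases happen to line up, the output reads like a plausible letter; when they don’t, you get the grammatical glitches and non sequiturs visible in the published examples.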

The FCC doesn’t have any obligation to verify that these are real people. I mean, your opening vignette, one of the names is a woman who died and her granddaughter is very unhappy about it.

Correct.

So the FCC doesn’t have to make sure that these are real people at all?

No. And in fact, when people who say they’ve been impersonated have gone to the FCC, the FCC says not only “It wasn’t our obligation to prevent that,” but also “we’re not going to take it down.” Their policy is, “This is part of the permanent public record. If you disagree with something that’s been submitted in your name, you’re welcome to submit a follow-up comment that corrects the record,” or what have you. But the FCC not only does not verify; it does not even try to verify. There’s no step in the process that would flag, for example, a large submission that seemed to impersonate a lot of people. There’s nothing in their process that would detect that.