On a single day in late May, hundreds of thousands of public comments poured into the Federal Communications Commission regarding its plans to roll back net neutrality protections. A week and a half later, on June 3, hundreds of thousands more followed. The spikes weren't the voices of pro-net neutrality Americans, worried about what will happen if the FCC allows internet service providers to block and throttle content whenever they choose. In fact, they weren't really voices at all.

According to multiple researchers, more than one million of the record 22 million comments the FCC received were from bots that used natural language generation to artificially amplify the call to repeal net neutrality protections. That number may only represent a fraction of the actual bot submissions. The New York Attorney General's office is currently investigating their source.

But while reports so far have focused on bad actors flooding the FCC with phony content, some of those same techniques also allowed legitimate groups, like the Electronic Frontier Foundation, to tell their members to click a button and send an auto-generated—albeit earnest—comment to the FCC, creating a groundswell of activism among actual humans. The result: A net neutrality comment period that garnered more input from the public than all previous comment periods across all government agencies—combined.

“It makes it easier for people to speak out, but much more difficult for them to be heard,” says Zach Schloss, an account manager at FiscalNote, a government relationship management company that’s been analyzing the FCC’s comments.

Now, as the commission attempts to sift through this unprecedented abundance of comments, discerning the legitimate from the bots could prove an insurmountable task.

Bots on Both Sides

The net neutrality comment debacle illustrates a central challenge of managing open platforms in an age of automation. Bots are overtaking the very system that's supposed to give consumers a say in the rules that govern them, but weeding the bots out may jeopardize legitimate comments.

It’s a conflict platforms like Facebook and Twitter also face, as they work to eradicate fake or spammy activity on their platforms. Except unlike those companies, the FCC and other government agencies are bound by law to give the public a chance to participate in the rulemaking process. They're also required to consider “the relevant matter presented” in those public comments. When bots dominate the system, they drown out those relevant comments. And as language generation tools grow more sophisticated, they become harder to weed out. For a government legally required to hear out its constituents, this confusion is a brewing crisis.


“The current state of the art in natural language generation is fairly robust and genuine-sounding,” says Vlad Eidelman, FiscalNote’s vice president of research. The company analyzes the entire history of public comments to help business clients predict new changes to government regulation. “You could generate a lot of comments that would seem legitimate, feel legitimate, and come from legitimate email addresses, but would not be representative of the public voice.”

FiscalNote analyzed all 22 million net neutrality comments and saw a number of suspicious patterns emerge among them. For starters, there was the historic volume. There was also the fact that so many comments came in on just two days: May 23 and June 3.

Those abnormalities alone weren't enough to conclude that the comments were fake. To determine that, FiscalNote’s researchers used natural language processing techniques to cluster the comments into groups. They divided them by sentiment—whether they were for or against net neutrality. They separated out comments that were identical or nearly identical, judging them to be form letters, which advocacy groups often prompt their members to submit. They also analyzed comments that touched on the same themes without duplicating the text exactly, to find similarities in their structure and word usage.
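The deduplication steps described above can be sketched in a few lines. This is a minimal illustration, not FiscalNote's actual pipeline: it groups exact form letters by hashing normalized text, and flags near-duplicates with a character-level similarity ratio from Python's standard library. The sample comments, function names, and the 0.9 threshold are all hypothetical choices for the sake of the example.

```python
import difflib
import hashlib
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivial edits don't defeat matching."""
    return re.sub(r"\s+", " ", text.lower().strip())

def group_form_letters(comments):
    """Group comments whose normalized text is identical (likely form letters)."""
    groups = {}
    for c in comments:
        key = hashlib.sha1(normalize(c).encode()).hexdigest()
        groups.setdefault(key, []).append(c)
    return groups

def near_duplicates(a, b, threshold=0.9):
    """Flag comments that share most of their text without being identical."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold

# Hypothetical sample comments.
comments = [
    "I support net neutrality. Do not repeal Title II.",
    "I support net neutrality.  do not repeal Title II.",
    "I support net neutrality. Please do not repeal Title II!",
    "Repeal the heavy-handed Title II regulations.",
]

groups = group_form_letters(comments)
print(len(groups))                               # first two collapse into one group
print(near_duplicates(comments[0], comments[2])) # near-duplicate despite edits
print(near_duplicates(comments[0], comments[3])) # opposing comment, not similar
```

A production system analyzing 22 million comments would need something more scalable than pairwise comparison, such as locality-sensitive hashing or TF-IDF clustering, but the underlying idea is the same: collapse identical text first, then measure how close the remainder sits to known templates.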