Let’s answer some questions

Are all the filings identical?

Pretty much. There are 30 unique filings that contain the boilerplate within a larger text (most of them were, not surprisingly, complaining about the boilerplate being spammed). The rest are substantively the same. If you want to get technical, 128,830 filings contain the exact boilerplate and 120 contain the boilerplate with added newlines.
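If you want to reproduce a count like this, one approach is to collapse whitespace before comparing, so that "boilerplate plus extra newlines" lands in its own bucket instead of looking unique. The sketch below assumes you already have each filing's comment text as a string and the boilerplate as a separate string. `classify` and `tally` are hypothetical helpers written for illustration, and `template` is a placeholder, not the real boilerplate text.

```python
import re
from collections import Counter
from typing import Iterable


def normalize(text: str) -> str:
    """Collapse every run of whitespace (spaces, newlines, tabs) to one space."""
    return re.sub(r"\s+", " ", text).strip()


def classify(comment: str, boilerplate: str) -> str:
    """Label one comment relative to the boilerplate text."""
    if comment == boilerplate:
        return "exact"
    norm_comment, norm_boiler = normalize(comment), normalize(boilerplate)
    if norm_comment == norm_boiler:
        return "whitespace_variant"  # same words, extra newlines or spaces
    if norm_boiler in norm_comment:
        return "embedded"  # boilerplate quoted inside a longer comment
    return "other"


def tally(comments: Iterable[str], boilerplate: str) -> Counter:
    """Count how many comments fall into each category."""
    return Counter(classify(c, boilerplate) for c in comments)


# Placeholder text standing in for the real boilerplate.
template = "Placeholder boilerplate text. Second sentence."
sample = [
    template,
    template.replace(". ", ".\n\n"),
    "I object to this spam: " + template,
    "Something unrelated.",
]
print(tally(sample, template))
```

Checking `embedded` last means a comment that is only the boilerplate with odd spacing is never miscounted as a longer, original comment.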

Was the bot dumb enough to file thousands of comments in alphabetical order by first name?

Yes.

A substantial chunk of the filings were submitted in alphabetical order by first name on May 9th, within milliseconds of each other. They even sorted uncapitalized names separately from capitalized ones!

This consists of roughly 53,000 filings (40%).
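One way to find such a stretch is to order the filings by timestamp and look for the longest run in which each first name is greater than or equal to the one before it. The sketch below assumes `names` is already a list of first names in timestamp order; `longest_sorted_run` is a hypothetical helper. It uses Python's default, case-sensitive string comparison, which orders by code point, so every capitalized name sorts before every lowercase one ("Zoe" < "adam"). A plain case-sensitive sort like that is one plausible explanation for the capitalized/uncapitalized split, though the data alone can't confirm what tool the filer used.

```python
from typing import List, Tuple


def longest_sorted_run(names: List[str]) -> Tuple[int, int]:
    """Return (start_index, length) of the longest non-decreasing run of names.

    Comparison is Python's default (case-sensitive, by code point), so
    "Zoe" < "adam" and a run can continue from capitalized into lowercase names.
    """
    if not names:
        return (0, 0)
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(names)):
        if names[i] < names[i - 1]:
            start = i  # order broke; a new run begins here
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return best_start, best_len


names = ["Bob", "Alice", "Bob", "Carol", "Zoe", "adam", "dave", "Eve"]
print(longest_sorted_run(names))  # (1, 6): Alice ... Zoe, adam, dave
```

A case-insensitive key (`str.lower`) would interleave "adam" with "Alice", so comparing both orderings is a quick way to tell which kind of sort produced a run.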

Were there multiple “waves” of filings?

Yes. I can eyeball about 8, though I’m not sure how distinct they are. If you look at the CSV file on GitHub, you can see that the “alphabetical by first name” filings stretch from 2017-05-09 5:32 PM to about 4:00 AM the next day. A very unscientific histogram of filing times gives a sense of the waves.

Sorry about the lack of axes. My software is horrible.
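For anyone who wants a histogram with axes, bucketing filings by clock hour is enough to make waves visible. The sketch below assumes the timestamps are ISO 8601 strings that all share one format (all with or all without a time zone, since Python can't order naive and aware datetimes together); `hourly_histogram` and `print_histogram` are hypothetical helpers, not part of any FCC tooling.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def hourly_histogram(timestamps: Iterable[str]) -> Counter:
    """Count filings per clock hour.

    Accepts strings like '2017-05-09T17:32:05' or '2017-05-09T17:32:05.123Z';
    a trailing 'Z' is rewritten to '+00:00' because datetime.fromisoformat
    only accepts 'Z' directly from Python 3.11 on.
    """
    counts: Counter = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        # Truncate to the start of the hour so each hour is one bucket.
        counts[dt.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


def print_histogram(counts: Counter, per_mark: int = 100) -> None:
    """Print one text bar per hour; each '#' stands for `per_mark` filings."""
    for hour in sorted(counts):
        bar = "#" * (counts[hour] // per_mark)
        print(f"{hour:%Y-%m-%d %H:00}  {counts[hour]:>6}  {bar}")


stamps = ["2017-05-09T17:32:05", "2017-05-09T17:59:59", "2017-05-09T18:00:00"]
print_histogram(hourly_histogram(stamps), per_mark=1)
```

Hourly buckets are a blunt choice; a burst that lasts milliseconds needs much finer bins, so `replace(...)` could truncate to the minute or second instead.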

Are these real people?

Most are, almost certainly. However, it’s also pretty certain that most didn’t send comments (ZDNet corroborated this by contacting a few of them). The question becomes: where did the data come from? The short answer is “I don’t know”, but here are a few thoughts.

Commercial or public marketing lists are possible: all the addresses seem to be real. A quick Google search of some names and addresses seems to agree. For example, someone listed with a university address seems to attend that university.

Black-market lists of identities: checking a small sample of emails from the filings on https://haveibeenpwned.com/ shows that about 90% appear in at least one breach or paste it tracks. Maybe this is just a high base rate, though.
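Because the sample is small, it helps to put an interval around that 90% before comparing it with a base rate. The sketch below draws a random sample, runs a lookup on each address, and reports the share with a 95% Wilson score interval. `is_pwned` is a hypothetical stand-in for a breach lookup such as Have I Been Pwned; the real service's API keys, rate limits, and terms are not modeled here.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """95% (z = 1.96) Wilson score interval for a binomial proportion hits/n."""
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def sample_share(
    emails: Sequence[str],
    is_pwned: Callable[[str], bool],  # hypothetical lookup function
    k: int,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Check a random sample of k emails; return (share, low, high)."""
    rng = random.Random(seed)
    sample = rng.sample(list(emails), k)
    hits = sum(1 for e in sample if is_pwned(e))
    low, high = wilson_interval(hits, k)
    return hits / k, low, high


# With 18 hits out of 20, the interval is roughly 0.70 to 0.97.
print(wilson_interval(18, 20))
```

Running the same check on a comparison sample of addresses unrelated to the filings would give the base rate the text wonders about; if the two intervals overlap heavily, the 90% says little on its own.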

Preprocessed, bought list: almost certainly. All of the addresses are nice and clean, all the email addresses are uppercased, and the filer names all seem to be properly formatted (with a few notable cases that could be data entry errors). In addition, all street addresses seem to be standardized according to USPS convention (Road -> Rd, etc.).
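Checks like these are easy to script over the whole dataset rather than a handful of rows. The sketch below measures the share of uppercased emails and flags street addresses that still contain a long-form suffix. `USPS_SUFFIXES` is a small, illustrative subset of the USPS street-suffix abbreviations, not the full table, and both helpers are hypothetical. Passing these checks is consistent with a preprocessed list, but it doesn't prove one.

```python
import re
from typing import Dict, Iterable, List

# Illustrative subset of USPS standard street-suffix abbreviations.
USPS_SUFFIXES: Dict[str, str] = {
    "ROAD": "RD",
    "STREET": "ST",
    "AVENUE": "AVE",
    "DRIVE": "DR",
    "BOULEVARD": "BLVD",
    "LANE": "LN",
}


def uppercase_email_share(emails: Iterable[str]) -> float:
    """Fraction of emails that are already entirely uppercase."""
    emails = list(emails)
    return sum(e == e.upper() for e in emails) / len(emails)


def unabbreviated_suffixes(address: str) -> List[str]:
    """Long-form suffixes (from the subset above) found as whole words."""
    words = re.findall(r"[A-Za-z]+", address.upper())
    return [w for w in words if w in USPS_SUFFIXES]


print(uppercase_email_share(["JANE@EXAMPLE.COM", "john@example.com"]))  # 0.5
print(unabbreviated_suffixes("12 Oak Road"))  # ['ROAD']
print(unabbreviated_suffixes("12 Oak Rd"))  # []
```

Whole-word matching is deliberately crude; a street named "Lane Ave" would be flagged, so treat hits as candidates to eyeball rather than a verdict.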

How do I make my own FCC spamming bot?

It seems to be pretty easy, seeing as there’s an API call for that. A simple Python script or curl command could send off your own thousands of filings. [updated: 5/20] There are no publicly stated authentication or rate-limiting provisions whatsoever besides the API key requirement. (Tweets from the FCC’s CIO indicate there may be internally determined limits that are not disclosed to developers.)

Go forth and create your own bot! It’s that easy.

What should we make of this?

A net neutrality opponent, or someone working for them, got their hands on a list of people. They took advantage of the FCC’s no-friction comment filing API to blast hundreds of thousands of comments over the last few days. Almost all of them were automated and used false identities. They were sloppy about it, but it worked anyway.

I’d be interested in tracking down who did the bot spamming and where they sourced their identities, but unfortunately I have no idea where to start.

Now what?

Go file your own comment or get informed about the debate. Play with the data if you want.

Bonus content!

Locations of the ZIP codes of all supposed filers.
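A map like this starts with a count of filings per ZIP code. The sketch below reduces ZIP and ZIP+4 strings to their 5-digit prefix and tallies them; malformed values are dropped. Placing the counts on a map would additionally need a ZIP-to-coordinate lookup table, which isn't included here, and `zip5` and `filings_per_zip` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Five digits, optionally followed by a 4-digit extension with or without a hyphen.
ZIP_RE = re.compile(r"^(\d{5})(?:-?\d{4})?$")


def zip5(raw: str) -> Optional[str]:
    """Return the 5-digit ZIP prefix, or None if the value is malformed."""
    m = ZIP_RE.match(raw.strip())
    return m.group(1) if m else None


def filings_per_zip(raw_zips: Iterable[str]) -> Counter:
    """Count filings per 5-digit ZIP, skipping values that don't parse."""
    return Counter(z for z in map(zip5, raw_zips) if z is not None)


print(filings_per_zip(["20554", "20554-0001", "205540001", "abcde", "1234"]))
```

Normalizing to five digits matters because the same address can appear with and without its +4 extension, which would otherwise split one location into several.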