After collecting over 22 million electronic comments, the Federal Communications Commission must now read and interpret them. Over the last few months, data scientists have observed that many comments were submitted from fake email domains. These entries are likely bot-generated rather than submitted by individual people. Should the FCC count these robot comments? If so, how?

The short answer is that when bot commenters come to Washington, government agencies will need to rely on artificial intelligence to interpret artificial intelligence.

The Administrative Procedure Act mandates that the FCC consider relevant material in the public record. But with 22 million comments, that is more easily said than done. The Office of the Federal Register states that an agency is “not permitted to base its final rule on the number of comments in support of the rule over those in opposition to it,” explaining that the process is “not like a ballot initiative or an up-or-down vote in a legislature.” Therefore, the FCC needs a strategy to deal with the comments in a meaningful way, basing its final rule on “comments, scientific data, expert opinions, and facts accumulated” in the administrative process.

Several analysts have attempted to estimate the number of fake comments in formal and informal reports, thereby delimiting the scope of the problem. The Emprata group released a report finding that, of 21.7 million comments, fewer than 2 million contained unique messages and more than 7 million came from fake email domains. The National Legal and Policy Center analyzed 5.8 million comments in early August, estimating that 95 percent or more of the mailing addresses appeared to be fake. Various other data scientists noted the likelihood of bot activity last spring, after studying time stamps on bulk submissions in the docket.
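The timestamp analysis those researchers describe can be approximated with very little code. The sketch below, which assumes hypothetical comment records with a `timestamp` field (the FCC docket's actual export format may differ), groups submissions by second and flags any second whose volume exceeds a threshold, a crude signal of bulk, scripted uploads:

```python
from collections import Counter

def flag_bursts(comments, threshold=1000):
    """Group comments by submission second and flag bursts that
    exceed the threshold -- a crude signal of bulk, bot-driven uploads."""
    per_second = Counter(c["timestamp"] for c in comments)
    return {ts: n for ts, n in per_second.items() if n >= threshold}

# Hypothetical sample: three comments share a single timestamp.
sample = [
    {"timestamp": "2017-05-10T12:00:00", "email": "a@example.com"},
    {"timestamp": "2017-05-10T12:00:00", "email": "b@example.com"},
    {"timestamp": "2017-05-10T12:00:00", "email": "c@example.com"},
    {"timestamp": "2017-05-10T12:00:01", "email": "d@example.com"},
]
print(flag_bursts(sample, threshold=3))  # → {'2017-05-10T12:00:00': 3}
```

A human typing comments cannot produce thousands of submissions in the same second, so bursts like this are strong, if circumstantial, evidence of automation.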

The FCC’s first challenge in creating order out of chaos is to separate comments into three large groups: unique substantive comments submitted by humans; forms filled out and sent by humans; and comments generated and submitted by bots. The Commission can handle the substantive, human comments, which likely number in the low hundreds at most, as it has in every other rulemaking.

The second challenge, though, is to determine whether the bot comments collectively contain information that the FCC should consider. Some basic statistical techniques can help estimate the number of comments in each category. The FCC can, for example, count unique text strings and filter out repetitive form letters. But rigorously identifying the number of comments in each group requires deeper data science tools.

Humans cannot read and categorize 22 million comments, so the agency should deploy machine learning and artificial intelligence tools. Identifying which comments fall into which groups, however, does not begin to address the question of how to incorporate bot-driven comments. Again, various machine learning tools, including textual analysis, are likely to be the only reasonable way to extract information from noise in those comments.
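One simple form of the textual analysis mentioned above is near-duplicate clustering: grouping comments whose word sets overlap heavily, so that a bot campaign that shuffles a few words per message collapses into one template. The sketch below uses Jaccard similarity and a greedy single-pass grouping; it is an illustrative stand-in for the more sophisticated machine learning tools an agency would actually deploy, and the threshold value is an assumption:

```python
def jaccard(a, b):
    """Similarity of two comments, measured as overlap of their word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def group_templates(comments, threshold=0.8):
    """Greedy single-pass clustering: attach each comment to the first
    group whose representative is similar enough, else start a new
    group. O(n * groups) -- a sketch, not a production pipeline."""
    groups = []  # list of (representative, members) pairs
    for c in comments:
        for rep, members in groups:
            if jaccard(rep, c) >= threshold:
                members.append(c)
                break
        else:
            groups.append((c, [c]))
    return groups

# Hypothetical sample: two template variants plus one distinct comment.
sample = [
    "I support net neutrality rules",
    "I support net neutrality rules strongly",
    "Repeal the regulations now",
]
print(len(group_templates(sample, threshold=0.6)))  # → 2
```

Collapsing millions of near-duplicate submissions into a handful of templates is what would let the agency see the distinct arguments actually present in the record.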

But the FCC may not have the necessary expertise in-house. One possible, though probably impractical, approach would be for the FCC to host a data science competition to evaluate the comments, which are all publicly available. An intriguing (and potentially exasperating) implication of using algorithmic tools to evaluate comments is that the algorithms themselves are likely to become contested by those who do not like their outcomes.

Still, if this volume of comments to proposed rulemakings is likely to become the norm, agencies must learn how to deal with such comments rigorously. That means incorporating data science, machine learning, and artificial intelligence into the regulatory process itself. Such a change may open new questions and controversies, but it is the only practical way to consider “the relevant matter presented,” as the APA directs agencies to do.

Sarah Oh is a Research Fellow at the Technology Policy Institute.