They fed their model Breitbart and Bluedot Daily articles to learn which combinations of words signal conservative or liberal bias. The model grew into a 150 MB natural language processing beast that they launched as NewsBot, a Facebook Messenger bot to which you can send any article for a diagnosis of its political leanings, a summary, and the option to ask for more sources. (“Basically,” Phadte says, “like a little devil’s advocate machine always giving you more information.”) Over the summer, they started analyzing who on Twitter was pushing out left- or right-leaning articles to see if they could tell a Democrat from a Republican. But their model was stumped by one group of Twitter users whose tweeting patterns didn’t match either party.

They were the bots.

Twitter has claimed that bots make up less than five percent of its platform, but estimates from researchers go as high as 50 percent. As the fall semester kicked off, RoBhat hand-picked 100 Twitter accounts with automated behavior to serve as “ground truth” data to train their model. They picked accounts with several red flags: ones that had joined the site, say, a month earlier but had already tweeted 10,000 times, or ones that were followed by thousands of other suspected bots. (“I no longer feel bad about how few Twitter followers I have,” quips Bhat, follower count: 1,250.) They then added those accounts’ followers to the “ground truth” set as well, since their machine needed a large sample to analyze: 6,000 accounts in all. To teach the model what an actual, breathing human on Twitter acts like, they pulled in 6,000 of Twitter’s “verified” users.

The model went to work, analyzing more than a hundred bits of data that Twitter makes readily available through its API, including profile bios, the date an account joined Twitter, location, tweet frequency, and the number of recent tweets versus older ones. That last comparison helps pinpoint accounts that once belonged to real people but were taken over by a bot and went rogue. The guys say that at this point their classifier can correctly identify a bot 93.5 percent of the time.
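To make those signals a little more concrete, here is a minimal Python sketch of how two of them might be computed: tweets per day since joining (the "joined a month ago, already tweeted 10,000 times" red flag) and a ratio of recent to older tweets (the takeover signal). This is not RoBhat Labs' code, and it is not a classifier. The `Account` record, its field names, the `red_flag` rule, and its 60-day and 100-tweets-per-day thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Account:
    # Hypothetical, simplified account record; NOT the real Twitter API schema.
    handle: str
    joined: datetime      # when the account was created (timezone-aware)
    total_tweets: int     # lifetime tweet count
    recent_tweets: int    # tweets in some recent window (assumed, e.g. last 30 days)
    older_tweets: int     # tweets posted before that window


def extract_features(acct: Account, now: datetime) -> dict:
    """Turn raw account metadata into a few numeric features."""
    # Account age in whole days; clamp to 1 so a brand-new account can't divide by zero.
    age_days = max((now - acct.joined).days, 1)
    return {
        "age_days": age_days,
        "tweets_per_day": acct.total_tweets / age_days,
        # A burst of recent activity on a long-quiet account can hint at a takeover;
        # +1 keeps the ratio defined when there are no older tweets.
        "recent_to_older": acct.recent_tweets / (acct.older_tweets + 1),
    }


def red_flag(features: dict, max_age_days: int = 60,
             min_tweets_per_day: float = 100.0) -> bool:
    """Hypothetical rule: a young account with implausibly high output."""
    return (features["age_days"] <= max_age_days
            and features["tweets_per_day"] >= min_tweets_per_day)


now = datetime(2017, 11, 1, tzinfo=timezone.utc)
suspect = Account("@example_bot", datetime(2017, 10, 1, tzinfo=timezone.utc),
                  10_000, 9_000, 1_000)
human = Account("@example_human", datetime(2012, 5, 1, tzinfo=timezone.utc),
                3_000, 40, 2_960)

for acct in (suspect, human):
    f = extract_features(acct, now)
    print(acct.handle, round(f["tweets_per_day"], 1), red_flag(f))
# prints:
# @example_bot 322.6 True
# @example_human 1.5 False
```

A real system like the one described would feed features like these, along with many others, into a trained model rather than a single hand-written threshold. The rule here only illustrates the kind of pattern the hand-picked "ground truth" accounts exhibited.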

Through its public blog, Twitter swears up and down that it’s doing everything it can to combat the bots, though it can’t tell you exactly what that is, since doing so would just tip off the bad actors. Yet bots certainly persist. After hearing Twitter reps speak at a closed-door hearing in September, Senator Mark Warner called the company’s response “frankly inadequate on almost every level.” Worse, Warner said, the company didn’t even seem to grasp the gravity of the bot problem.

Twitter allows bots for reasons that, many would argue, are good. Twitter lets third parties access its platform and automate their tweets, allowing, say, news sites to tweet every story they publish and companies to automatically respond to customers’ queries. But bad actors can exploit that access. “If you’re a company posting commercial content on Twitter, those resources are extremely useful,” says Graham Brookie, of the Atlantic Council’s Digital Forensic Research Lab, one of the entities looking into the bot problem. “That said, if you’re a Russian troll farm in Saint Petersburg and posting disinformation on an industrial scale, those are also very useful.” That same public API access also gives investigators like RoBhat Labs the vast amount of user data they need to try to identify bots.

“That’s why they get all these academics like me saying, ‘There’s bots on Twitter!’ because we can get the data easily,” says Fil Menczer of Indiana University, which developed another bot detector and studies the spread of misinformation on social media. “They are the most open platform, and they are criticized because of it.” The investigators point to ways to cut down on the bots, such as labeling tweets sent from a third-party app rather than by a human. One researcher at Cambridge suggested that Twitter require all bots to go through an approval process like the one Wikipedia uses. Menczer advocates making suspicious accounts check an “I am not a bot” verification box with each tweet; in fact, a Twitter spokesperson says the company is starting to experiment with implementing Google reCAPTCHAs.