Over the past few weeks, I have developed—in cooperation with several other people—a collection of interesting data about the way people react to me on Twitter. I did not do this because I am self-involved but because for the past year and a half, the reaction to me on Twitter has been ringing alarm bells about disinformation, and the more recent reaction to others has heightened concerns. I don’t pretend to know what this data really suggests. I don’t know how much of the pattern I describe below is automated. I am not making any allegations against anyone. I’m putting the following out in public as a preliminary indication that something weird is going on that warrants examination—examination I am not qualified to do.

(This story, for reasons that will become clear, needs to be told in my voice—that of Benjamin Wittes—but much of the analysis below was conducted by Jacob Schulz; hence the joint byline combined, somewhat awkwardly, with an article written in the first person singular. It also draws significantly on work done by Christopher Bouzy, creator of the Bot Sentinel tool.)

My interest in the Twitter reaction to me began shortly after Brett Kavanaugh was confirmed to the Supreme Court in 2018.

I had supported Kavanaugh’s nomination early on, said nice things about him and defended him against allegations I thought were spurious. But though I publicly changed my position after Christine Blasey Ford came forward and testified, and I wrote a lengthy article opposing the nomination, a strange thing happened: Almost whenever I tweeted, and almost no matter on what subject, large numbers of people would remind me that I had supported Kavanaugh. Very large numbers of people.

And strangely, they seemed all too frequently to be using the exact same lingo. I was Kavanaugh’s “buddy,” I was reminded. I had “vouched for Kavanaugh,” people told me—over and over again. Around the same time, I noticed as well that I was Jim Comey’s “BFF.” Always BFF.

Then came the Bill Barr nomination. My support was far less energetic than my initial support for Kavanaugh; it was always tepid. I regarded his nomination as a calculated risk, and I wrote about the pros and cons of his nomination with relatively open eyes—though not open enough, as it turned out. While warning about the dangers he posed, I—like a lot of commentators—regarded him as an institutionalist, and I was complimentary about his intellect and past service. I certainly preferred him to the acting attorney general, Matthew Whitaker. And I argued that we should wait to condemn Barr’s handling of the Mueller report until we saw what he did with it. I ate a hot steaming plate of crow a few weeks later.

Yet again, my Twitter mentions were full of reminders—in remarkably consistent verbiage. This actually continues to this day. I “vouched for Bill Barr,” I was told recently in response to a tweet about a possible legal humor podcast. Another tweeter told me my expression of dismay at Ken Starr’s recent behavior was what he would expect from someone who “vouched for William Barr.”

I relatively quickly began wondering why the language was always so similar. Why was it always “buddy” with Kavanaugh? And why the specific phrase “vouched for” with respect to both Barr and Kavanaugh?

It isn’t exactly disinformation. After all, it was true—if wildly simplistic—that I had vouched for Kavanaugh and Barr. And Comey and I are, indeed, friends. So a barrage of reminders was not exactly an effort to spread false information about me; rather, a large number of accounts were simply responding to whatever I said with a simple and crude effort to caricature me based on endless repetition of the same simplistic themes. Where was this coming from?

A while back, I asked a friend who specializes in disinformation to look into the question. She examined the matter and reported back to me later that too much time had passed to make any determinations. Many of the tweets were no longer around, and many of the accounts were gone too. Indeed, looking back using Twitter’s search functions, I have been unable to find anything like the volume of material that used to be there. So I dropped it.

Then Lisa Page showed up on Twitter.

The reactions to Page’s tweets were astonishingly vile: misogynistic, hateful, demeaning. As with the earlier responses to me, a huge number of responses to Page were saying the exact same thing. The words “homewrecker” and “slut” appeared with odd regularity. The specific phrase “Ok, homewrecker” showed up with a frequency that could not be due to chance. What was different in the responses to Page was the volume, which was immense. And this time, the accounts in question were still live. So I started examining them without waiting many months.

In the intervening months, I had also become aware of Bot Sentinel, a platform that classifies Twitter accounts based on their tendency to engage in misleading and coordinated behavior—in other words, their penchant to act like a troll on Twitter. Bot Sentinel operates both as a browser extension that embeds in one’s Twitter feed to flag “untrustworthy” accounts and as a stand-alone website in which one can plug in an individual account to see how the service classifies it.

The service does not provide a binary bot-or-not determination, but instead offers a probabilistic “trollbot score.” Using a machine-learning algorithm to analyze an account’s political Twitter activity, Bot Sentinel produces a score that reflects the probability that an account is a “trollbot”—a user who participates in “coordinated harassment campaign[s],” retweets known misinformation or otherwise regularly engages in “repetitive bot-like activity.” Bot Sentinel describes the calculus as fairly straightforward: “The more you exhibit irregular tweet activity, the higher your trollbot score will be.”

The platform uses a tiered system to classify users based on their trollbot scores. Users are identified as normal (0-24 percent), moderate (25-49 percent), problematic (50-74 percent) or alarming (75-100 percent). Bot Sentinel reserves the “alarming” designation for the most offending accounts. On average, fewer than 15 percent of the accounts the platform analyzes in a given 24-hour period meet the 75 percent threshold, according to Bot Sentinel. Though some particularly zealous human trolls can achieve a trollbot rating of 75 percent, “those accounts that exceed 75 percent are largely inauthentic,” says Bouzy.
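
For readers who find the tiers easier to follow in code, here is a minimal Python sketch of how a score would map to a label under the ranges quoted above. It is an illustration only, not Bot Sentinel’s code: the function name `trollbot_tier` is hypothetical, and the handling of fractional scores (a 24.5 falls in the lower band) is an assumption, since the published ranges are given only in whole percentages.

```python
def trollbot_tier(score: float) -> str:
    """Map a trollbot score (0-100 percent) to the tier names quoted in the text.

    Hypothetical helper for illustration; the cutoffs 25, 50 and 75 come from
    the published ranges, but the treatment of fractional scores is assumed.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 25:
        return "normal"       # 0-24 percent
    if score < 50:
        return "moderate"     # 25-49 percent
    if score < 75:
        return "problematic"  # 50-74 percent
    return "alarming"         # 75-100 percent


if __name__ == "__main__":
    for s in (10, 25, 74, 75, 100):
        print(s, trollbot_tier(s))
```

Note that the label says nothing about whether an account is automated; it only restates which band a score falls into.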

The service is, of course, not perfect and has faced criticism from some researchers. It does label some accounts run by normal human users as trollbots. These aren’t false positives per se; Bot Sentinel does not claim that trollbots are necessarily automated users. But this is a major drawback for researchers chiefly interested in reliably identifying and tracking automated accounts, rather than looking for swarms of harassing or misleading posts.

Disinformation specialist Renee DiResta explained the disadvantages of services like Bot Sentinel in an email:

People are interested in understanding who or what they’re talking to on Twitter, particularly around political hot topics. There’s not much visibility into how accounts are behaving just by looking at their tweets, and anything that helps people feel more informed is filling that need. The problem is that sometimes certain “botlike” behaviors like tons of retweeting, incredibly high posting volume, hashtagging spamming - things that we do see from automated accounts - are also used by real people in “followback trains.” So while these tools are filling that need it’s also important to understand their limitations.

“Trollbot” designations understandably infuriate human users who get tagged with them, and the most vocal of these displeased users tend to be those who feel that Bot Sentinel contributes to the “dehumanizing [of] anybody who is conservative.” The propensity of Bot Sentinel to slap the trollbot tag on normal humans aboard the Trump train creates problems for researchers. In general, bot-identification services do have a bad track record of misidentifying conservative tweeters as bots, a fact that makes disinformation researchers understandably cautious about using them as tools.

And Bot Sentinel doesn’t operate with a level of transparency that could entirely assuage these concerns. It operates, rather, as something of a black box; it offers an FAQ page but not a view inside its algorithm. Other services are more forthcoming about their inner workings. The Indiana University researchers who run Botometer, for example, publish academic papers that offer a candid look at their tool’s methodology.

That said, Bot Sentinel’s performance—at least in my view—inspires more confidence than its publicly available competitors. Botometer, unlike Bot Sentinel, offers an assessment of the likelihood an account is automated, but it sometimes struggles to identify clearly automated accounts. It assigns accounts a Complete Automation Probability (CAP) score that the IU researchers claim “is the probability, according to our models, that this account is completely automated, i.e., a bot.” What does it rate, for example, the Lawfare Podcast (@lawfarepodcast) account, a Twitter feed that automatically tweets out links to our podcasts as they are uploaded? It gets a 2 percent CAP score. The Big Ben Clock account, which tweets out the time on the hour by stringing together the syllable “BONG” (11:00 is “BONG BONG BONG BONG BONG BONG BONG BONG BONG BONG BONG”) receives a 0 percent CAP score.