The 2020 Democratic presidential candidates use Twitter like an earlier generation of politicians may have used a soapbox: to announce policy plans, solicit donations, marshal their supporters and criticize the current administration.

Each of these candidates is speaking to his or her own virtual village square. But how many people spend time in more than one village? How much overlap is there between, say, Elizabeth Warren’s audience and Bernie Sanders’s? And which candidates are most often associated with one another, based on their Twitter followers?

Twitter isn’t real life, of course; it’s an often-ridiculous short-burst social network that is decidedly not representative of the electorate at large. But it’s still a slice of life. The people following candidates on Twitter are those who want to receive a steady stream of information about at least part of the 2020 campaign. Understanding how that tribe operates can tell us something about an influential slice of the electorate.

So off our web-scraper went, dredging up every follower of the 20 Democratic presidential candidates who FiveThirtyEight considered “major” in early May, when we ran our script. The result was a data set with almost 20 million entries, which you can download on GitHub.

This data reveals the obvious, such as raw follower counts. It also reveals more subtle trends, such as which candidates’ followers are loyal, which cast a broad net, which seem to have a similar appeal and which apparently have nothing in common.

For starters, here are the candidates ranked by the share of their followers who don’t follow any other 2020 Democratic candidate.

Candidates whose followers are loyal only to them
Share of each 2020 candidate’s followers who don’t follow any other candidates

Account           Followers    Exclusive followers
@marwilliamson    2,610,335    74.8%
@BernieSanders    9,254,423    63.2%
@Hickenlooper       144,816    56.3%
@CoryBooker       4,246,252    52.5%
@JoeBiden         3,558,333    43.8%
@AndrewYang         267,897    43.4%
@TulsiGabbard       349,443    34.7%
@BetoORourke      1,424,745    26.5%
@amyklobuchar       692,985    24.0%
@PeteButtigieg    1,033,834    23.7%
@SenGillibrand    1,410,303    23.3%
@KamalaHarris     2,640,072    22.3%
@JulianCastro       212,582    21.4%
@sethmoulton        138,450    20.1%
@JayInslee           51,504    19.0%
@ewarren          2,486,101    16.4%
@TimRyan             20,080    15.6%
@JohnDelaney         20,266    12.9%
@MichaelBennet       21,053    11.7%
@ericswalwell        84,415     9.2%

Among candidates who were considered “major” by FiveThirtyEight as of May 8. Follower lists for each candidate’s primary account (according to a C-SPAN Twitter list) were scraped May 8 to 15, except for @Hickenlooper, which was scraped on June 6 to correct a coding error.

Almost three-quarters of the people who follow Marianne Williamson — a “spiritual and inspirational author, lecturer, non-profit activist,” per her Twitter bio — don’t follow any other Democratic candidate, putting them in a loyalty class all their own. Similarly, of the over 9 million people who follow Bernie Sanders, almost two-thirds follow no other candidate.

The 2.5 million people who follow Elizabeth Warren, on the other hand, are more gregarious — 84 percent of them follow at least one of her Democratic rivals. Ditto the 2.6 million people who follow Kamala Harris, 78 percent of whom also follow another candidate.
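For readers working with the downloadable data, the loyalty figure is simple to compute. The sketch below assumes the follower lists have already been loaded into a dictionary that maps each candidate to a set of follower IDs; that layout, the function name `exclusive_share` and the toy data are hypothetical and are not the format of the published files. A follower counts as exclusive when they appear in exactly one candidate’s list.

```python
from collections import Counter

def exclusive_share(followers: dict[str, set[str]]) -> dict[str, float]:
    """Share of each candidate's followers who follow none of the other tracked candidates."""
    # How many tracked candidates each user follows (sets guarantee one count per candidate).
    follow_counts = Counter(user for users in followers.values() for user in users)
    shares = {}
    for candidate, users in followers.items():
        exclusive = sum(1 for user in users if follow_counts[user] == 1)
        shares[candidate] = exclusive / len(users) if users else 0.0
    return shares

# Hypothetical toy data: made-up candidates and follower IDs, not the real data set.
toy = {
    "A": {"u1", "u2", "u3", "u4"},
    "B": {"u3", "u4", "u5"},
    "C": {"u4", "u5", "u6"},
}
for name, share in sorted(exclusive_share(toy).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%} exclusive, {1 - share:.1%} follow a rival")
```

Subtracting each share from 1 gives the share who follow at least one rival, which is how Warren’s 16.4 percent in the table becomes the 84 percent above.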

Digging a little deeper into the follower interaction information, we can find out, for example, which other candidates Warren’s followers are paying attention to. The Venn diagrams below try to answer that question, showing the overlap in followers between every candidate who had more than 500,000 followers in early May.

This chart reveals relatively large intersections between followers of Sanders and Warren, who share progressive policy platforms; between followers of Pete Buttigieg and Beto O’Rourke, who are both young, male and white; and between Harris and other major female candidates such as Warren, Kirsten Gillibrand and Amy Klobuchar.

Follower overlap patterns seem to share some similarities with Democrats’ vote preferences, too. Morning Consult has been tracking voters’ second-choice candidates, and according to the latest poll, respondents who planned to vote for Warren said their top backup choices were Harris and Sanders. Similarly, on Twitter, 60 percent of Warren’s followers also follow Sanders, and 37 percent each follow Harris and Biden — her largest overlap groups.
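The overlap numbers quoted for Warren are shares of her followers who also follow a given rival. A minimal sketch, again assuming a hypothetical dictionary of follower sets with made-up candidates and IDs (the helper names are illustrative, not the authors’ code):

```python
def overlap_share(followers: dict[str, set[str]], a: str, b: str) -> float:
    """Share of candidate a's followers who also follow candidate b (not symmetric)."""
    fa = followers[a]
    return len(fa & followers[b]) / len(fa) if fa else 0.0

def top_overlaps(followers: dict[str, set[str]], a: str, n: int = 3) -> list[tuple[str, float]]:
    """The n rivals sharing the largest fraction of a's followers, largest first."""
    shares = [(b, overlap_share(followers, a, b)) for b in followers if b != a]
    shares.sort(key=lambda pair: pair[1], reverse=True)
    return shares[:n]

# Hypothetical toy data: made-up candidates and follower IDs.
toy = {
    "A": {"u1", "u2", "u3", "u4", "u5"},
    "B": {"u1", "u2", "u3", "u9"},
    "C": {"u2", "u3", "u8"},
    "D": {"u3", "u7"},
}
print(top_overlaps(toy, "A"))        # [('B', 0.6), ('C', 0.4), ('D', 0.2)]
print(overlap_share(toy, "B", "A"))  # 0.75
```

The measure depends on direction: in the toy data, 3 of A’s 5 followers (60 percent) follow B, but 3 of B’s 4 followers (75 percent) follow A, simply because the two audiences differ in size.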

With some simple calculations, we can look past the sheer size of each Twitter overlap and get a sense for which pairs of candidates share some quality (ideology, Twitter skills, who knows) that makes them appeal to the same people. Theoretically, people could be following multiple presidential candidates at random, but that’s not how Twitter really works — if one account speaks to my interests, I’m likely to be interested in similar accounts.

To figure out which candidates are getting paired up more often than we’d expect based on chance alone, we rely on a number that data miners call “lift,” which is the ratio of how many followers a pair actually has in common to how many shared followers we’d expect them to have based solely on each candidate’s individual Twitter popularity. For example, say we have 100 total Twitter users, and 50 of them follow Sanders while 10 of them follow Warren. If the reasons that a person followed Sanders had nothing to do with the reasons they followed Warren, we’d expect half of Warren’s 10 followers to also follow Sanders, for an overlap of five users (100 × 50/100 × 10/100). If it turns out that 10 users follow both Warren and Sanders, then we have a lift value of two (twice as many as expected), which means we can speculate that the two candidates share some quality that appeals to the same people. If only one user follows both, then we have a lift of 0.2 (one-fifth as many as expected), and we would suspect that there’s something about each candidate that drives away some people who follow the other candidate.
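The lift arithmetic fits in a few lines. This sketch reproduces the hypothetical Sanders and Warren numbers from the example; the function name `lift` and its arguments are illustrative and not taken from the authors’ code.

```python
def lift(total_users: int, n_a: int, n_b: int, n_both: int) -> float:
    """Observed overlap divided by the overlap expected if following a and b were unrelated."""
    # Under independence, P(both) = P(a) * P(b), so the expected overlap is
    # total * (n_a / total) * (n_b / total) = n_a * n_b / total.
    expected = n_a * n_b / total_users
    return n_both / expected

# The article's hypothetical: 100 users, 50 follow Sanders, 10 follow Warren.
print(50 * 10 / 100)          # 5.0 followers expected in common
print(lift(100, 50, 10, 10))  # 2.0
print(lift(100, 50, 10, 1))   # 0.2
```

Because lift equals total × shared ÷ (a × b), it scales with whatever count is used as the total. With the real data, one plausible (assumed) choice is the number of distinct users who follow at least one candidate. Changing that base multiplies every pair’s lift by the same factor, so the ranking of pairs is unaffected even though their position relative to 1 moves.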

In the chart below, candidate pairs are organized by lift value. The dotted line marks a lift of 1, so pairs above it have more followers in common than you’d expect by chance, while those below it have fewer.
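Ranking every pair works the same way, computed from follower sets rather than from counts. The sketch below assumes the same hypothetical dictionary layout and, as an assumption not stated in the article, treats the set of users who follow at least one candidate as the total; the candidates and IDs are made up.

```python
from itertools import combinations

def pair_lifts(followers: dict[str, set[str]]) -> list[tuple[str, str, float]]:
    """Lift for every pair of candidates, highest first."""
    # Assumed universe: every user who follows at least one tracked candidate.
    total = len(set().union(*followers.values()))
    results = []
    for a, b in combinations(sorted(followers), 2):
        expected = len(followers[a]) * len(followers[b]) / total if total else 0.0
        if expected == 0:
            continue  # a candidate with no followers has no meaningful lift
        observed = len(followers[a] & followers[b])
        results.append((a, b, observed / expected))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Hypothetical toy data: made-up candidates and follower IDs.
toy = {
    "A": {"u1", "u2", "u3", "u4"},
    "B": {"u1", "u2", "u3", "u5"},
    "C": {"u5", "u6", "u7", "u8"},
}
for a, b, value in pair_lifts(toy):
    side = "above" if value > 1 else "below" if value < 1 else "on"
    print(f"{a} & {b}: lift {value:.2f} ({side} the line at 1)")
```

Here A and B land above the line (lift 1.5), while B and C (0.5) and A and C (0.0) fall below it.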

Some of the pairs that float above the line on this chart also stood out in the Venn diagrams, such as Harris and Warren, Harris and Gillibrand, and O’Rourke and Buttigieg. O’Rourke and Julian Castro also have a relatively large overlap, perhaps because they’re both from Texas. The small dots at the very top capture overlaps that are tens of times larger than we’d expect to see if the candidates’ appeals to followers were unrelated. That’s probably because users who follow a lesser-known candidate such as Michael Bennet or John Delaney are likely to be highly engaged in the race and to follow many of the other candidates as well. For example, the average Delaney follower also follows more than six other Democratic candidates.
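The Delaney statistic is an average, over one candidate’s followers, of how many other candidates each follower also follows. A sketch with hypothetical follower sets, in which candidate A stands in for a small-audience candidate; the function name is illustrative.

```python
from collections import Counter

def avg_other_candidates(followers: dict[str, set[str]], candidate: str) -> float:
    """Average number of other tracked candidates followed by this candidate's followers."""
    follow_counts = Counter(user for users in followers.values() for user in users)
    users = followers[candidate]
    if not users:
        return 0.0
    # Subtract 1 so that following `candidate` itself is not counted.
    return sum(follow_counts[user] - 1 for user in users) / len(users)

# Hypothetical toy data: made-up candidates and follower IDs.
toy = {
    "A": {"u1", "u2"},
    "B": {"u1", "u2", "u3"},
    "C": {"u1", "u3", "u4"},
    "D": {"u2"},
}
for name in toy:
    print(name, round(avg_other_candidates(toy, name), 2))
```

This also points to one arithmetic reason small candidates reach the top of the lift chart. Since lift equals total × shared ÷ (a × b), if nearly all of a small candidate’s followers also follow candidate b, the shared count approaches the small candidate’s own count and the lift approaches total ÷ b, which is large whenever b’s audience is a small fraction of the total.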

The chart also reveals the candidate pairs who are not followed together. Williamson appears in most of these pairs, but the combination of Sanders and Booker also sinks to the bottom; their follower overlap is about half the size of what we’d expect given their individual popularity.

Twitter is just one front on which the fight for the Democratic nomination is being waged, but it does provide some insight into how candidates are using social media and who is listening. Democrats are, after all, looking for a candidate who can beat President Trump, who redefined how we view Silicon Valley’s little blue bird.

We want to hear how you’re using this data! If you find anything interesting, please let us know. Send your projects to @guswez or @ollie.

Dhrumil Mehta and Julia Wolfe contributed research.