Collecting The Data

When my Twitter account became verified, I noticed that @verified started to follow me. Looking at the accounts it follows, it’s easy to guess that it follows every verified account on Twitter. That means there are about 206 000 verified accounts on Twitter at the moment, and roughly 1 000 more are verified every day. Some accounts may block @verified and therefore not appear among the accounts it follows, but I assume that number is small enough to ignore. Using @verified as a starting point, it’s possible to collect the network of verified accounts on Twitter.

I used a modified version of the Python command line tool twecoll by JP de Vooght to first collect a list of every account followed by @verified. The tool then went through all of these 205k accounts and looked up whom they follow. I built two data sets: one limited to accounts that follow fewer than 10 000 accounts, and a second limited to accounts that follow fewer than 1 000 accounts. There are two reasons for this. First, the more accounts people follow, the less important each connection becomes. Second, my computer has its technical limits (i5 4670k at 4.2 GHz, 16 GB RAM, Samsung EVO 840 250 GB, GTX 760). It does work with the larger data set, but it isn’t fun to work with because everything takes longer.
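The threshold filter described above can be sketched roughly like this. It is a minimal illustration, not the modified twecoll code: the function `filter_edges`, the two input mappings, and the sample handles are all hypothetical. It assumes that every collected account stays in the graph as a node and that only the outgoing connections of accounts at or above the limit are dropped.

```python
from typing import Dict, List, Set, Tuple


def filter_edges(
    followings: Dict[str, Set[str]],
    following_counts: Dict[str, int],
    limit: int,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Keep outgoing edges only for accounts that follow fewer than `limit` accounts."""
    nodes = set(followings)  # every collected account stays in the graph as a node
    edges = []
    for source, targets in followings.items():
        if following_counts[source] >= limit:
            continue  # skip the outgoing edges of accounts that follow too many others
        for target in sorted(targets):
            if target in nodes:  # keep only connections inside the collected set
                edges.append((source, target))
    return nodes, edges


# Hypothetical sample: three collected accounts and their total following counts.
sample_followings = {
    "alice": {"bob", "carol", "outsider"},
    "bob": {"alice"},
    "carol": {"alice", "bob"},
}
sample_counts = {"alice": 3, "bob": 12000, "carol": 2}

nodes, edges = filter_edges(sample_followings, sample_counts, limit=10000)
print(len(nodes), "nodes,", len(edges), "edges")
```

Calling the same function with `limit=1000` would produce the smaller of the two data sets under the same assumptions.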

The data collection ran on a Raspberry Pi 2 for 7 days, from 22 to 28 August 2016, pausing only for a few hours because of errors I had to fix manually. Because of the long run time, there are some inconsistencies in the data where people followed or unfollowed someone during that time frame. On this scale it doesn’t make a difference. There are also some accounts in the data set that aren’t verified anymore. I took a closer look at 36 of them: all the accounts, out of all verified accounts, that lost their verified status in a single day. Half of them had deleted their account or been suspended; the other half had gone private and lost their verified status because of that.

The big data set (fewer than 10 000 followings) has 205 718 Twitter accounts and 45 302 877 connections between them. The smaller data set (fewer than 1 000 followings) also has 205 718 accounts, with 19 176 260 connections.
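To get a feeling for these sizes, the figures above can be turned into an average number of connections per account and a graph density. This is only a back-of-the-envelope sketch; the helper `graph_summary` is a hypothetical name, and the density assumes a directed graph without self-follows.

```python
def graph_summary(num_accounts: int, num_connections: int) -> dict:
    """Return average outgoing connections per account and directed graph density."""
    possible = num_accounts * (num_accounts - 1)  # ordered pairs, no self-follows
    return {
        "avg_connections": num_connections / num_accounts,
        "density": num_connections / possible,
    }


# Account and connection counts taken from the two data sets described above.
big = graph_summary(205_718, 45_302_877)
small = graph_summary(205_718, 19_176_260)

print(f"big:   {big['avg_connections']:.1f} per account, density {big['density']:.5f}")
print(f"small: {small['avg_connections']:.1f} per account, density {small['density']:.5f}")
```

That works out to roughly 220 connections per account in the big data set and roughly 93 in the smaller one.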

I use Gephi to visualize the data. I tweeted the process of getting the data into a useful state. OpenOrd (25, 25, 25, 10, 15; cut 0.8; 500 iterations) gave me the most useful layout. Colors are calculated by the modularity algorithm. I change the sizes of the nodes from time to time; unless noted otherwise, node size represents the number of followers.

Some General Stats

I loaded the stats of the 205k verified accounts into Excel and ignored the connections. These numbers cover every account, no matter how many accounts it follows.

When I submitted my account for verification, some contacts told me that I didn’t have enough followers. Indeed, verified accounts have on average 117 845 followers. But there is quite a long tail. The median is 9 370 followers. There are more than 100k accounts with fewer than 10 000 followers, and the rest don’t have that many more. The average gets skewed by mega accounts like @katyperry with 92.2m followers. There are 188 verified accounts with more than 10 million followers and 4 330 verified accounts with more than 1 million followers. There is one verified account with only two followers.
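The gap between average and median is easy to reproduce with a toy example. The follower counts below are made up for illustration and are not taken from the data set; only the effect matters: a single mega account pulls the mean far above the median.

```python
from statistics import mean, median

# Hypothetical follower counts: mostly small accounts plus one mega account.
follower_counts = [2, 800, 4_500, 9_000, 12_000, 30_000, 92_200_000]

avg = mean(follower_counts)    # pulled up by the single mega account
mid = median(follower_counts)  # the middle value, unaffected by the outlier

print(f"mean:   {avg:,.0f}")
print(f"median: {mid:,.0f}")
```

Here the median is 9 000 followers, while the mean lands above 13 million.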

But how many accounts do verified accounts follow? On average they follow 2 031 accounts. But again there are some mega followings: one account follows 3.6m accounts. The median is a quite manageable 475 followings. Personally, I feel that nobody follows more than 5 000 accounts by hand. Following everyone is an often-used tactic to gain followers. So many people did it that Twitter introduced a limit: you can only follow a certain percentage more accounts than follow you (base limit 5 000, daily limit 1 000). This resulted in a new follow-and-unfollow tactic: accounts follow as many people as possible and unfollow everyone who doesn’t follow them back within x days. I digress. There are 3 551 accounts that follow nobody and 33 328 accounts that follow fewer than 100 others. One account returned a negative followings count of -28. I assume that’s a bug in the Twitter database.
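I did these counts in Excel, but the same kind of tally could be sketched in a few lines of Python. The function `summarize_followings` and the sample values are hypothetical. The `under_100` bucket here includes accounts that follow nobody, which is an assumption about how such a figure would be counted.

```python
from typing import Dict, List


def summarize_followings(counts: List[int]) -> Dict[str, int]:
    """Tally followings counts into a few buckets and flag impossible values."""
    return {
        "invalid": sum(1 for c in counts if c < 0),  # negative counts cannot be real
        "follow_nobody": sum(1 for c in counts if c == 0),
        "under_100": sum(1 for c in counts if 0 <= c < 100),  # assumption: includes zero
        "over_5000": sum(1 for c in counts if c > 5000),
    }


# Hypothetical followings counts, including one negative value like the one above.
sample = [-28, 0, 0, 45, 475, 2031, 6000, 3_600_000]
summary = summarize_followings(sample)
print(summary)
```

Flagging negative values separately keeps an obvious data error from quietly ending up in one of the other buckets.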