In 1995, some 40 million people all over the world were connected to the Internet. By 2000 that had grown to around 400 million, and by 2016 it reached 3.5 billion. That means almost half the global population is connected to a single technology.

That’s an extraordinary statistic and one that raises an interesting possibility. With so many people connected in this way, it should become possible to use this technology as a kind of demographic sensor that measures human behavior on an almost unimaginable scale.

Today, Klaus Ackermann at the University of Chicago and a couple of pals say they have done just this by studying how devices connected to, and disconnected from, the Internet between 2006 and 2012. They have done this on a global scale at a time resolution of 15 minutes to produce a truly mind-boggling number of observations: one trillion of them.

So what does this enormous data set reveal about humanity?

Ackermann and co built their data set by combining information from two sources. The first is a set of scans between 2006 and 2012 in which every IP address was periodically probed to see whether it connected to a device or not. The second is a commercial database of IP geolocations which reveals the location of each device. Together this information produces a vast database covering Internet use in 122 countries every 15 minutes between 2006 and 2012.
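
The combination of the two sources can be pictured as a simple join followed by a count. The Python sketch below is only an illustration: the scan records, the geolocation table, and the variable names are all hypothetical, and a real geolocation database maps address ranges rather than individual addresses.

```python
from collections import Counter

# Hypothetical scan records: (15-minute timestamp, IP address, responded?)
scans = [
    ("2012-01-01T00:00", "192.0.2.1", True),
    ("2012-01-01T00:00", "192.0.2.2", False),
    ("2012-01-01T00:00", "198.51.100.7", True),
    ("2012-01-01T00:15", "192.0.2.1", False),
    ("2012-01-01T00:15", "198.51.100.7", True),
]

# Hypothetical geolocation table: IP address -> country code.
geo = {
    "192.0.2.1": "DE",
    "192.0.2.2": "DE",
    "198.51.100.7": "KR",
}

# Count responding addresses per (time slot, country).
online = Counter()
for timestamp, ip, responded in scans:
    country = geo.get(ip)  # None if the address has no known location
    if responded and country is not None:
        online[(timestamp, country)] += 1

for (timestamp, country), count in sorted(online.items()):
    print(timestamp, country, count)
```

Each key in `online` is one observation of the kind the article describes: how many addresses in a given place answered during a given 15-minute window.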

The researchers start out by studying how Internet connectivity grows and eventually becomes saturated in societies all over the world. It turns out that Internet growth follows the same pattern everywhere.

Growth starts slowly, ramps up at dizzying rates, and eventually levels off as almost everyone gains access. This creates an S-shaped curve, as the researchers expected. Saturation occurs when there is about one IP address for every three-person household in a country.
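
An S-shaped adoption curve of this kind is often modeled with a logistic function. The Python sketch below shows the shape only. The midpoint and growth rate are made-up values, not figures from the paper, and the saturation level simply encodes "one IP address per three-person household" as one third of an address per person.

```python
import math

def logistic(t, saturation, midpoint, rate):
    """Logistic adoption curve: slow start, rapid growth, then a plateau."""
    return saturation / (1.0 + math.exp(-rate * (t - midpoint)))

# Assumption: saturation at one IP address per three people.
SATURATION = 1.0 / 3.0

# Hypothetical midpoint (year 8) and growth rate (0.6 per year).
curve = [logistic(year, SATURATION, midpoint=8.0, rate=0.6) for year in range(17)]

for year, value in enumerate(curve):
    print(f"year {year:2d}: {value:.3f} IP addresses per capita")
```

With these illustrative parameters, the curve reaches half of its saturation level at the midpoint and comes within about one percent of saturation by year 16.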

More surprising is that it takes about 16 years on average for Internet use to saturate in any given country. That’s significantly faster than other technologies that have revolutionized societies, such as steam power, which took about 100 years, and electrification, which took about 60 years.

Curiously, only four countries had reached full saturation by 2012. These were Germany, Denmark, Estonia, and South Korea. Others, such as Turkey, have growth rates so slow that saturation will take decades.

Ackermann and co also look at the link between IP connectivity and economic productivity. They say that GDP per capita is positively correlated with IP connectivity per capita. In other words, countries with greater Internet penetration tend to have higher economic output per person.

And the correlation is not trivial, either. They estimate that a 10 percent increase in IP per capita corresponds to a 0.8 percent increase in GDP per capita.
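
To put that figure in perspective, it can be read as an elasticity of roughly 0.08. The Python sketch below assumes a constant-elasticity (log-log) relationship between the two quantities, which is a modeling assumption made here for illustration and not something the article confirms. The function name `gdp_change` is hypothetical.

```python
import math

# Figures quoted in the article.
ip_increase = 0.10    # 10 percent more IP addresses per capita
gdp_increase = 0.008  # 0.8 percent more GDP per capita

# Assumption: GDP per capita is proportional to (IP per capita) ** elasticity.
elasticity = math.log(1 + gdp_increase) / math.log(1 + ip_increase)

def gdp_change(ip_change, elasticity):
    """Fractional GDP change implied by a fractional IP change."""
    return (1 + ip_change) ** elasticity - 1

print(f"implied elasticity: {elasticity:.3f}")
print(f"GDP change for +20% IP: {gdp_change(0.20, elasticity):.2%}")
```

Under this assumption the effect compounds gently: doubling the IP increase does not quite double the associated GDP change.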

But they also point out that growth depends on the industry involved. “Broadly speaking, we find that service sectors amenable to digital competition through outsourcing (publishing, news, film production, administrative support, education) have suffered with increasing local IP concentration,” say Ackermann and co. “Whilst location-constrained sectors have prospered from higher Internet concentrations (wholesale, retail, real estate, repairs, hairdressing, mining, transportation, accommodation).”

The new database also allowed the team to study global sleep patterns. They did this by assuming that the switch from a device being online to offline corresponds with a person going to sleep (and vice versa). “The association need not be exact, instead a systematically leading or lagging relationship carries the required information,” say Ackermann and co. They then crunch the data for people in more than 600 cities around the world (having calibrated it against data gathered by the American Time Use Survey).
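
The idea of reading sleep from connectivity can be illustrated with a toy calculation. The Python sketch below is not the researchers' method, which relies on calibration against survey data. It simply finds the longest quiet stretch in one day of 15-minute readings, using a synthetic city profile and an arbitrary threshold, both hypothetical.

```python
SLOTS_PER_DAY = 96  # 24 hours at 15-minute resolution

def longest_quiet_run(online_fraction, threshold):
    """Return (start_slot, length) of the longest run of slots below threshold.

    The day is treated as circular, so a run may cross midnight.
    """
    n = len(online_fraction)
    quiet = [value < threshold for value in online_fraction]
    if all(quiet):
        return 0, n
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk around the day twice so a run that wraps past midnight is seen whole.
    for i in range(2 * n):
        if quiet[i % n]:
            if run_len == 0:
                run_start = i % n
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

# Synthetic profile: most devices online from 07:00 (slot 28) to 23:00 (slot 92).
day = [0.8 if 28 <= slot < 92 else 0.3 for slot in range(SLOTS_PER_DAY)]

start, length = longest_quiet_run(day, threshold=0.5)
sleep_hours = length * 15 / 60
print(f"quiet period starts at slot {start} and lasts {sleep_hours} hours")
```

For this synthetic profile the quiet period begins at 23:00 and lasts eight hours. Real data would be far noisier, which is why a systematic lead or lag, corrected by calibration, matters more than an exact match.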

The result is the first global estimate of overnight sleep duration in 645 cities over seven years, and it makes for interesting reading. “In general, major cities tend to have longer sleeping times compared to surrounding satellite cities,” say the team.

But they say there is evidence that sleep patterns are changing, perhaps due to technology use. “Whilst North America has remained largely static over the study window, Europe sleep duration has declined, and East Asian sleep duration has grown,” they say. By this reckoning, global sleep patterns are converging. Exactly why is a fascinating open question.

That’s interesting work with significant potential. Of course, it’s not the first time that researchers have crunched big data sets to reveal insights about human behavior. These big data sets generally fall into three categories. The first comes from mobile phones but can be studied only by agreement with the phone companies, which choose whether or not to release it.

Other big data sets come from online services such as Google search, Twitter, and Facebook. However, these data sets have significant limitations, not least of which is that they are not representative of the general population.

And then there are satellite data sets, showing nighttime luminosity on the Earth’s surface, for example. These are certainly global but limited in geographical and temporal resolution.

But Ackermann and co’s data set is yet another approach on a truly global scale. “We view node-to-node online/offline scan data of the kind used in the present work as complementary to these other passive data sources,” they say. “It provides a first glimpse of the potential of global Internet activity to change profoundly the way research in this realm is conducted.”

We’ll look forward to seeing what other insights they can reveal.

Ref: arxiv.org/abs/1701.05632: The Internet as Quantitative Social Science Platform: Insights from a Trillion Observations