The spread of Covid-19 can be predicted, in part, through patterns of Facebook friendships, according to a study from researchers at New York University.

The researchers looked at the progression of the disease from two early hotspots, in suburban New York, US, and Lodi province in Italy, and found that the spread was strongest in those areas that had significant links to the regions through the social network.

The results hold, the authors say, even after controlling for obvious connecting factors such as wealth, population density and geographical proximity. They could help explain why the virus spread the way it did, as well as suggest ways to prevent future community contagion from getting out of hand once lockdowns are lifted.

When it comes to Westchester County, the suburban New York cluster, for instance, “coastal regions and urban centres appear to have both high levels of connectedness to Westchester and larger numbers of Covid-19 cases per resident,” but there are specific hotspots in popular holiday locations for “many well-heeled residents” of the county, such as coastal Florida near Miami and ski resorts in southern and central Colorado.

For Lodi, meanwhile, a particular link to Rimini, a popular seaside resort on the Adriatic, shows up in both the Facebook friendship data and the spread of Covid-19 cases.

But not all connections hold, suggesting that Italy’s early lockdown helped matters. There are strong links in the Facebook data between Lodi and several provinces in southern Italy that “send workers and students to the industrial Lombardy region”, the authors write, but “while some of these areas have seen a number of Covid-19 cases, they are not disproportionally larger, perhaps reflecting the efforts of Italian authorities to restrict the movement of individuals”.

The research was based on a new tool the authors built in conjunction with Facebook, called the Social Connectedness Index. The dataset allows researchers to measure the connectedness of two geographic areas, as represented by Facebook friendships, without needing to access the raw social graph itself, which could infringe user privacy or allow for a re-run of the Cambridge Analytica scandal. The Index is now available for other academics and non-profits to use for similar research.

“This finding suggests to us that the geographic structure of social network as measured by Facebook may indeed provide a useful proxy for the type of social interactions that epidemiologists have long known to contribute to the spread of communicable diseases,” the authors write.

The research highlights the power such massive datasets offer to epidemiologists. Such insights are less useful at the current stage of the outbreak, but in its early and later stages, when the number of new cases is low or declining, knowing which geographically distant locales might be at risk when a new cluster emerges could prove more valuable.

Facebook has made a number of such datasets available as part of its Data For Good programme, but has been wary of handing user data directly to governments and public health authorities, who might want access to more granular data. Mark Zuckerberg said in mid-March that, were governments to ask for such data, “at a high level, the answer is probably no, but it’s sort of a hypothetical, because no one is asking for this”.

Instead, governments have turned to other data sources. In the UK, mobile phone companies have been asked for aggregate data derived from phone towers, while the US has worked with the mobile advertising industry to obtain location data.