The proximity networks are based on Bluetooth scans providing a measure of pairwise proximity between N = 464 highly-connected participants – freshmen students at a large university3. We define an interaction between users i, j in a 5-minute timebin t (the smartphones were configured to scan for nearby devices every 5 minutes) as γ ijt = s, where the signal strength s is reported by the handsets as received signal strength indicator (RSSI). Two users are considered to be interacting within a given timebin if their phones registered each other at least once in that timebin, regardless of the reported signal strength. This densely-connected dynamic network of all Bluetooth interactions is based on a total of 1 472 094 interactions, taking place over 28 days. RSSI, measured in dBm, is defined as the observed signal power relative to 1 mW.

The long-range, sampled long-range, and short-range network

The long-range network is created by interactions occurring at any distance covered by Bluetooth range, between 0 and 10–15 meters. In order to capture only close range interactions, we establish the short-range network by selecting the subset of interactions with γ ijt ≥ −75 dBm corresponding to distances of approximately 1 meter or less4 (see Supplementary Information for additional details on the choice of threshold). The short-range network consists of f = 18.3% of all interactions.

Since the short-range network contains only a fraction of all interactions, the simulated spreading processes taking place on this network are trivially slower and smaller than processes occurring on the long-range network. The intuitive reason for this is that with an average of one fifth of the interactions, a node in the short-range network has correspondingly fewer opportunities of spreading a disease than in the long-range network. The difference in number of interactions therefore prevents us from directly comparing the interplay between structure and dynamics of spreading processes for the short-range and long-range networks using simulated disease models with the same parameters.

In order to be able to compare directly, we create a sampled long-range network, which contains the same fraction of interactions as the short-range network, but chosen at random among all interactions (see Fig. 1a). As we argue below, the sampled long-range network thus contains both close and distant interactions and shares most topological properties with the full long-range network, while being based on precisely the same number of interactions as the short-range network.
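The construction of the two reduced networks can be illustrated with a minimal sketch. The tuple layout `(i, j, t, rssi_dbm)`, the function names, and the toy data are assumptions made for illustration; only the −75 dBm threshold and the idea of uniform random sampling of interactions come from the text.

```python
import random

# Threshold stated in the text; the data layout below is an assumption.
RSSI_THRESHOLD_DBM = -75


def short_range(interactions, threshold=RSSI_THRESHOLD_DBM):
    """Keep only interactions whose RSSI is at or above the threshold."""
    return [x for x in interactions if x[3] >= threshold]


def sampled_long_range(interactions, n_keep, seed=None):
    """Draw n_keep interactions uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(interactions, n_keep)


# Toy data: (user i, user j, timebin t, RSSI in dBm)
toy = [
    (0, 1, 0, -60), (0, 1, 1, -62), (0, 2, 0, -88),
    (1, 2, 1, -91), (0, 1, 2, -70), (1, 2, 2, -80),
]
close = short_range(toy)
# Same number of interactions as the short-range network, chosen at random.
sampled = sampled_long_range(toy, len(close), seed=1)
```

By construction, `close` and `sampled` contain the same number of interactions, which is what makes spreading simulations on the two networks comparable.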

Figure 1 The network of close proximity interactions. (a) The full network contains interactions with signal strength used as a proxy for proximity. In this illustration the dynamic network is integrated over time; edges represent single interactions between participants, and line-width indicates physical proximity. From this full network, corresponding to all edges that support full-range transmission, we create the short-range network by only considering interactions with γ ijt ≥ −75 dBm. The sampled long-range network contains the same number of interactions but chosen at random. (b) The link weights (i, j) are broadly distributed. The dashed line is a power-law \(p(x)\sim {x}^{-\alpha }\) with α = 1.19 inserted as a guide to the eye. The sampled long-range network (orange) has the same number of interactions γ ijt as the short-range network (blue), but maintains 62% of links, compared to only 31% of links remaining in the short-range network (inset).

Link weights in the three networks

We start our analysis by studying similarities and differences in the distribution of link weights between the three networks (long-range, sampled long-range, and short-range). For each of the networks, we calculate the weights as described below, using the long-range network as an example. We first create an adjacency matrix A i×j×t with timebins t containing interactions aggregated over 5 minute intervals corresponding to the Bluetooth scanning rate. This matrix has entries a ijt = 1 when an interaction is present and a ijt = 0 otherwise. The weight w ij of a link connecting two individuals is defined as the total number of interactions occurring on that link \({w}_{ij}={\sum }_{t}\,{a}_{ijt}\). Note that because the sampled long-range network is generated by sampling interactions at random from the full network, it is possible to calculate the weight distribution for this network analytically.
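As an illustration of the weight definition, the following sketch counts, for each pair, the number of timebins in which an interaction is present. The input layout `(i, j, t)` and the function name are assumptions; treating pairs as undirected and counting each (pair, timebin) at most once mirrors the binary entries a ijt described above.

```python
from collections import Counter


def link_weights(interactions):
    """Return {(i, j): w_ij}, where w_ij is the number of timebins in
    which the undirected pair (i, j) has at least one interaction."""
    # De-duplicate so each (pair, timebin) contributes a_ijt = 1 at most once.
    seen = {(min(i, j), max(i, j), t) for i, j, t in interactions}
    return Counter((i, j) for i, j, _ in seen)


# Toy example: users 0 and 1 see each other in timebins 0 and 1
# (timebin 0 is reported by both phones), users 0 and 2 once.
weights = link_weights([(0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 2, 0)])
```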

We use a number of closely related (but distinct) terms to describe connections between pairs of individuals. A quick overview of the terms:

Interaction: A single measurement of proximity between a pair of individuals.

Signal strength: The RSSI measured by a smartphone for a single interaction. The signal strength can be considered a measure of distance.

Link: An abstract description of the connection between two individuals; a link implies at least one interaction. Links are sometimes denoted ties or connections in the literature.

Weight: Number of interactions observed on a given link; sometimes called strength in the literature.

As shown in Fig. 1b, the distribution of link-weights in all three networks is broad with many weak links (containing few interactions) and a small number of links of very high weight. The short-range network and the sampled long-range network contain the same number of interactions, but the number of resulting links in the two networks is strikingly different. The approximately 1.4 million interactions in the full long-range network are distributed across 42 838 links, resulting in an average link-weight of a little over 34 interactions per link. We create the short-range and sampled long-range networks by removing 81.7% of the interactions from the full long-range network, leaving 269 094 interactions in both of these networks. The resulting number of links is much higher in the sampled long-range network. Averaged over 100 realizations, this network has 26 511 ± 68 links, corresponding to around 61.9% of the links in the full long-range network. In contrast, the short-range network has only 13 474 links, corresponding to only 31.5% of the links in the full long-range network. These differences are illustrated in the Fig. 1b inset.

Let us investigate these differences with respect to link weight in further detail. First, let us consider the weakest links. In terms of low-weight links, the sampled long-range network simply retains around f = 18.3% of the long-range network’s links, with small differences. The reason for these differences can be understood by considering links with weight 1. Of course, (100–18.3)% of links with weight 1 are removed, but the sampling process also creates new links of weight 1 by down-sampling the weight of some number of links with weight 2, 3, etc. In the short-range network a much higher fraction of links with weight 1 is removed: this network has about half as many links with weight 1 as we find in the sampled long-range network.
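The effect of random down-sampling on the weight distribution can be sketched numerically. The example below assumes that each interaction is kept independently with probability f (a binomial approximation to drawing a fixed number of interactions without replacement); this approximation, the function names, and the toy weight counts are illustrative assumptions rather than the procedure used for the reported numbers.

```python
from math import exp, lgamma, log


def binom_pmf(k, w, f):
    """P(k of w interactions survive) when each survives with probability f.
    Computed in log space so that large weights do not overflow."""
    log_p = (lgamma(w + 1) - lgamma(k + 1) - lgamma(w - k + 1)
             + k * log(f) + (w - k) * log(1 - f))
    return exp(log_p)


def thinned_weight_counts(weight_counts, f):
    """weight_counts: {w: number of links with weight w} in the full network.
    Returns {k: expected number of links with weight k} after thinning;
    links that lose all their interactions (k = 0) disappear."""
    expected = {}
    for w, n_links in weight_counts.items():
        for k in range(1, w + 1):
            expected[k] = expected.get(k, 0.0) + n_links * binom_pmf(k, w, f)
    return expected


f = 0.183  # sampling fraction from the text
weight_counts = {1: 100, 2: 50, 3: 20}  # toy distribution, not real data
expected = thinned_weight_counts(weight_counts, f)
```

In this toy case the expected number of weight-1 links after sampling exceeds 100 × f, because links of weight 2 and 3 are down-sampled into weight 1, which is the mechanism described in the text.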

Now, considering high-weight links, we find that these links in the short-range network are relatively unaffected by removing interactions according to physical distance: in the short-range network the highest-weight links typically maintain ~80% of their interactions. This is in stark contrast to the sampled long-range network, where link-weight is depleted in proportion to the sampling fraction, and high-weight links maintain only ~18% of the interactions from the full long-range network.

In summary, the weight distribution in the short-range network suggests that friends (with high-weight links) tend to be physically close and that most low-weight links correspond to random encounters (encounters between strangers), consistent with results on interaction distance from both quantitative measurements4 as well as sociology6.

Differences in local structure

The key comparison is between the short-range network and the two long-range networks. Since our sampling is uniform over interactions, we expect the sampled long-range network to be structurally very similar to the full long-range network, with weights decreased in proportion to the down-sampling fraction. As we discuss above, however, many low-weight links disappear as part of the sampling process, and the overall network structure is complex, reflecting non-trivial and highly correlated underlying social behaviors. Therefore, it is useful to quantitatively confirm that the structures of the long-range and sampled long-range networks remain remarkably similar – and distinct from the short-range network.

Starting from the single node perspective, we find important differences between the short-range and the long-range networks. We can quantify this difference using the Shannon entropy. For a node i, we start from a link with neighbor j with weight w ij and define \(\pi ({w}_{ij})={w}_{ij}/{\sum }_{k}\,{w}_{ik}\) to mean the fraction of the node’s total interactions taking place on that link. Now, we define the node entropy as \(S(i)=-\,{\sum }_{j}\,\pi ({w}_{ij})\,{\mathrm{log}}_{2}\,\pi ({w}_{ij})\). Since infection probability is approximately proportional to link weight (see SI), this quantity can be interpreted as the expected number of yes/no questions needed to establish which of i’s links caused an infection. The distribution of entropy for all three networks is plotted in Fig. 2a. For the short-range network (blue), the distribution peaks at 4 bits, corresponding to an effective group of \({2}^{4}=16\) potential sources of infection. Comparing the long-range (green) and sampled long-range (orange) networks, we find as expected that the distributions of node entropies are very similar, emphasizing the structural similarity between these two networks. The distribution for the sampled long-range network is created by averaging per-user entropy values over 100 random realizations of the sampled long-range network. Both peak at around 6 bits, corresponding to a larger effective group of \({2}^{6}=64\) potential sources of infection in these networks.
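A short sketch of the node entropy S(i) makes the "effective group size" interpretation concrete. The function name and the toy weight lists are illustrative assumptions; the formula follows the definition in the text.

```python
from math import log2


def node_entropy(weights):
    """Shannon entropy (in bits) of one node's link weights.
    weights: iterable of w_ij over the node's neighbors j (positive counts)."""
    weights = [w for w in weights if w > 0]
    total = sum(weights)
    return -sum((w / total) * log2(w / total) for w in weights)


# 16 equally weighted links give exactly 4 bits (effective group of 16).
uniform = node_entropy([1] * 16)
# The same number of links dominated by one strong tie gives far less.
skewed = node_entropy([85] + [1] * 15)
```

A node with many links can therefore still have low entropy if most of its interactions are concentrated on a few strong links.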

Figure 2 Difference in network structure. (a) Entropy of interactions. For every node i in the network we calculate node entropy S(i), see main text for the definition. Entropy values in the sampled long-range network are averaged per user over 100 random realizations of the sampled long-range network. Insets illustrate the link weights for a representative single node in the short-range network (blue, entropy 3.6) and in the sampled long-range network (orange, entropy 5.1), values indicated by markers on the distributions. Note the similarity between the distributions of entropies for the long-range and sampled long-range networks. (b) (upper panels) Network snapshots, showing the network structure for the sampled long-range (orange) and short-range (blue) networks at points indicated on the plot below. Note how the short-range network remains separated into small disconnected components longer than the long-range network. (lower panel) The horizontal axis shows changes to network properties as we add links one by one, starting with the strongest link. The green, orange, and blue line-plots show the number of connected components in the network. At around 120 links added, the long-range (green) as well as the sampled long-range network (orange) begin to become connected and show a decreasing number of components, whereas the short-range network (blue) takes significantly longer. Specifically, the short-range network remains separated into small neighborhoods until we have added approximately 250 of the strongest links. Thus the percolation process starts significantly later in this network. The orange line is an average over 100 random realizations of the sampled long-range network. Each realization is shown as a transparent gray line, illustrating that the network structure is consistent across samples. Also importantly, the fraction of interactions (shaded plots in blue/orange) within each network as we add the strong links differs strongly between the networks. In the short-range network 250 links correspond to almost 50% of all interactions, whereas the same number of links in the sampled long-range networks contain only around 20% of all interactions.

These results provide a striking illustration of how the close proximity zone is preferentially reserved for strong ties (e.g. friends or acquaintances) while the distant zone is a more public space where many more random interactions happen, resulting in a correlation between physical proximity and tie strength as reported in ref. 9.

Meso-level structural differences

In the previous section we showed that in the short-range network a large fraction of interactions takes place on high-weight links. We now study the interplay between meso-level network structure and link-weight in the short-range and long-range networks. Specifically we are interested in the structures formed by the highest weight links. To explore these, we start building the networks from empty, adding their respective strongest links one-by-one. As links are added, we keep track of the number of connected components in the network as well as total weight of interactions added through the links, revealing the differences in the networks with respect to the structures created by the heaviest links.
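The link-adding procedure can be sketched with a union-find structure. This is a minimal illustration under stated assumptions: components are counted only among nodes that already have at least one link, ties in weight are broken arbitrarily, and the function name and toy weights are hypothetical.

```python
def components_as_links_added(weights):
    """weights: {(i, j): w_ij}. Add links from strongest to weakest and
    return a list of (number of connected components, fraction of all
    interactions included) after each added link."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = sum(weights.values())
    n_components, included, history = 0, 0, []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        for node in (i, j):
            if node not in parent:  # node appears for the first time
                parent[node] = node
                n_components += 1
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the link merges two components
            parent[root_i] = root_j
            n_components -= 1
        included += w
        history.append((n_components, included / total))
    return history


# Toy network: three strong pairs, later bridged by two weak links.
toy_weights = {(0, 1): 10, (2, 3): 8, (4, 5): 6, (1, 2): 3, (3, 4): 1}
history = components_as_links_added(toy_weights)
```

In the toy example the component count first rises as isolated pairs appear and then falls as weak links bridge them, the same qualitative shape as the curves described for Fig. 2b.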

Figure 2b illustrates how the process of adding links gradually grows the long-range and short-range networks, respectively. In the lower panel of Fig. 2b we show the number of connected components and the total number of interactions in the networks as the links are added. First, notice that the full and sampled long-range networks display identical behavior, with the number of neighborhoods peaking with approximately 120 strongest links added. This behavior is consistent across 100 random realizations of the sampled network. This is in contrast to the short-range network, where the number of components continues to grow up to the 240 heaviest links in the network.

In both types of networks, the strongest links in the network first create small isolated neighborhoods of highly interacting nodes. Figure 2b (upper panel) shows snapshots of the sampled long-range (orange) and short-range (blue) networks at points (120, 250, 300 links) indicated on the plot below, illustrating this point. We see that at 250 strongest links, a large connected component is beginning to form in the long-range network, making the network significantly more connected. At this point, the short-range network, however, is still divided into many small neighborhoods. We also note that while the x-axis indicates the absolute number of the heaviest links added to the networks, the total number of interactions included in the networks at any number of links is strikingly different. In fact, it is important to underscore just how large a fraction of interactions is concentrated on the high-weight links. The short-range network has a total of 13 474 links and the sampled long-range network has ~26 500 links. Figure 2b (bottom panel), however, shows that in the short-range network the 250 strongest links in the network account for approximately 50% of the interactions. In the long-range network the picture is less skewed. Here, the top 250 links account for approximately 25% of the interactions. Thus, while the percolation transition occurs for a very small number of high-weight links in both networks, these links include a large fraction of the total number of interactions.

Our analysis shows, therefore, that the short-range network not only contains fewer links than the sampled long-range network, but that the configuration of the heaviest links is more fragmented than in the long-range case. This structural property of the short-range network, the highly-connected neighborhoods bridged by weak ties, is consistent with well known structures found in other social networks, such as mobile phone networks and online social networks17,20,21,22. In the long-range network, however, this structure is less pronounced, obscured by the presence of spurious links, distinct communities bridged by a small number of strong links not present in the short-range network.

Spreading process is captured in neighborhoods

Having investigated differences between short- and long-range networks with respect to structure, we now explore the differences in how diseases spread on the networks. Using a simple Susceptible-Infected-Recovered (SIR) model, we run simulations of a disease spreading across the networks. Our model is intentionally simplistic, intended to illustrate the structural differences between short- and full-range transmission, rather than emulate a specific disease. We use the actual temporal sequence of proximity interactions observed in the data, choosing parameter values to create a situation where large outbreaks are likely, but not guaranteed (see Methods for details of the epidemic modeling). While we report results for a specific choice of parameters and a single realization of the sampled long-range network, these results are robust across a wide range of values of the transmission parameters and realizations of the sampled network.
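To make the idea of an SIR process on a temporal sequence of interactions concrete, a deliberately minimal sketch follows. It is not the model specified in Methods: the per-interaction transmission probability `beta`, the fixed infectious period `recovery_bins`, the rule that a node becomes infectious only in timebins after its own infection, and all names are assumptions chosen for illustration.

```python
import random


def run_sir(interactions, seed_node, t0, beta, recovery_bins, rng):
    """interactions: iterable of (i, j, t). Returns {node: infection timebin}.
    A node infected at time s is infectious for s < t <= s + recovery_bins,
    after which it is recovered and can no longer infect or be infected."""
    infected_at = {seed_node: t0}
    for i, j, t in sorted(interactions, key=lambda x: x[2]):
        if t < t0:
            continue  # interactions before the seeding time are ignored
        for src, dst in ((i, j), (j, i)):
            s = infected_at.get(src)
            infectious = s is not None and s < t <= s + recovery_bins
            if infectious and dst not in infected_at and rng.random() < beta:
                infected_at[dst] = t
    return infected_at


# Toy chain of contacts 0-1, 1-2, 2-3 in consecutive timebins.
chain = [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
full = run_sir(chain, seed_node=0, t0=0, beta=1.0, recovery_bins=10,
               rng=random.Random(0))
# With a long gap, node 1 has recovered before it meets node 2.
gapped = run_sir([(0, 1, 1), (1, 2, 5)], seed_node=0, t0=0, beta=1.0,
                 recovery_bins=2, rng=random.Random(0))
```

Because the interactions are replayed in their observed order, the outcome depends on the timing of contacts as well as on the aggregated link weights.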

Based on the structural analysis, our hypothesis is that, in the short-range network, the simulated pathogen tends to be more contained within small sets of highly interacting individuals. We quantify the contained-in-communities behavior as follows. For each infection event, occurring on link w ij , where node i infects node j, we measure which fraction I j of the node’s direct (1-hop) neighborhood has already been infected. Since this is a weighted network, we define \({I}_{j}={W}_{\{-i\}}^{-1}\,{\sum }_{k\in {\mathcal I} (j),k\ne i}\,{w}_{jk}\), where \( {\mathcal I} (j)\) is the set of j’s infected neighbors and \({W}_{\{-i\}}={\sum }_{k\ne i}\,{w}_{jk}\) is the sum of all weights excluding the infecting link. A value of I j = 0 indicates that no-one in the direct neighborhood besides the infecting node has yet been infected; a value of I j = 0.5 indicates that neighbors accounting for 50% of link weights connecting to j have already been infected. Figure 3a shows a kernel density estimation of I as a function of the fraction of infected nodes, based on 500 runs of the spreading process in the short-range (left), sampled long-range (middle), and long-range (right) networks.
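The neighborhood infection fraction I j can be computed directly from a node's link weights. In the sketch below the function name, the dictionary layout, and the convention of returning 0 when j has no neighbor other than the infecting node are assumptions for illustration.

```python
def neighborhood_infection(weights_of_j, infector, infected):
    """weights_of_j: {k: w_jk} for all neighbors k of the newly infected node j.
    infector: the node i that infected j (excluded from the calculation).
    infected: set of nodes already infected at the time of the event."""
    total = sum(w for k, w in weights_of_j.items() if k != infector)
    if total == 0:
        return 0.0  # assumption: j has no neighbors besides the infector
    hit = sum(w for k, w in weights_of_j.items()
              if k != infector and k in infected)
    return hit / total


# Node j has neighbors i (the infector), a, b, and c.
weights_of_j = {"i": 5, "a": 3, "b": 1, "c": 4}
I_j = neighborhood_infection(weights_of_j, infector="i", infected={"i", "a"})
```

Here only neighbor a is infected besides the infector, so I j = 3 / (3 + 1 + 4) = 0.375.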

Figure 3 Dynamics of the spreading process. For each infection event, occurring on link w ij , where node i infects node j, we measure which fraction I j of the node’s direct (1-hop) neighborhood has already been infected. We define \({I}_{j}={W}_{\{-i\}}^{-1}\,{\sum }_{k\in {\mathcal I} (j),k\ne i}\,{w}_{jk}\), where \( {\mathcal I} (j)\) is the set of j’s infected neighbors and \({W}_{\{-i\}}={\sum }_{k\ne i}\,{w}_{jk}\) is the sum of all weights excluding the infecting link. (a) Plot of spreading process over 500 simulations. As an increasing fraction of nodes is infected (F), we observe that nodes with different neighborhood infection levels (I) are infected. Kernel density outlines (using a Gaussian kernel and Silverman bandwidth) illustrate how a broader range of neighborhood infections can be observed in the short-range network (blue). (b) Cuts of the distribution of I at three values of F (0.2, 0.4, 0.6, points indicated by vertical lines in the top plots), showing that the distribution of neighborhood infections is broader in the short-range (blue) network. (c) Distribution of \({R}^{2}\) of a linear model fitting infection of the neighborhoods I to the progress of infection (measured as fraction of network infected F), calculated for each of the aforementioned 500 realizations of an epidemic. The distribution of \({R}^{2}\) peaks at around 0.4 in the short-range network versus 0.75 in the two long-range networks.

In the case of the short-range network, we observe behavior which suggests that the spreading agent is indeed slowed by neighborhoods, consistent with behavior of both simulated and real spreading processes found in the literature23,24,25,26,27. As is evident from Fig. 3a, early in the epidemic outbreak, when the fraction of infected nodes is low, the disease agent can saturate small neighborhoods and infect new nodes in neighborhoods where a large fraction (I > 0.80) of neighbors are already infected. Conversely, it is still possible to find neighborhoods with a low fraction (I < 0.20) of infected nodes very late in the outbreak. These effects are possible because the spreading agent does not jump easily between neighborhoods of densely connected nodes.

The disease spreading is very different in the full and sampled long-range cases. In contrast to the contained-in-communities picture, the infection progresses smoothly through the network. In the long-range networks, the neighborhood infection is more closely proportional to the fraction F of the total network infected. Cuts at particular levels of overall network infection F in Fig. 3b show that the pattern of more spread-out I in the short-range network is consistent through the spreading progression and across random starting conditions (seed node and time). Visually, the distributions of I at given F are narrower for the long-range networks, with peak values of neighborhood infection I closer to values of overall network infection F. To quantify this effect, we consider the distribution of \({R}^{2}\) of a linear model fitting infection of the neighborhoods I to the progress of the infection (fraction of network infected F), calculated for each of the aforementioned 500 realizations of an epidemic. The distribution of \({R}^{2}\) peaks at around 0.4 in the short-range network vs 0.75 in the two long-range networks, as shown in Fig. 3c. This indicates that direct proportionality between the global (F) and local (I) infection level is a significantly better model for the long-range networks.
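The goodness-of-fit measure can be sketched for a single simulated epidemic as follows. The sketch assumes an ordinary least-squares line with an intercept, for which the coefficient of determination equals the squared Pearson correlation; whether the fitted model includes an intercept is not stated in this section, so this is an assumption, as are the function name and the toy values.

```python
def r_squared(F, I):
    """Coefficient of determination of the least-squares line I ~ a + b * F.
    F: global infected fraction at each infection event.
    I: neighborhood infection fraction of the newly infected node."""
    n = len(F)
    mean_f, mean_i = sum(F) / n, sum(I) / n
    s_ff = sum((f - mean_f) ** 2 for f in F)
    s_ii = sum((i - mean_i) ** 2 for i in I)
    s_fi = sum((f - mean_f) * (i - mean_i) for f, i in zip(F, I))
    return s_fi ** 2 / (s_ff * s_ii)


F = [0.1, 0.2, 0.3, 0.4]
smooth = r_squared(F, [0.1, 0.2, 0.3, 0.4])     # local tracks global
scattered = r_squared(F, [0.4, 0.1, 0.3, 0.2])  # local decoupled from global
```

A value close to 1 corresponds to the smooth, proportional progression seen in the long-range networks, while lower values correspond to the more scattered pattern of the short-range network.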

Thus we find that, while the infection in the short-range network tends to be captured inside closely connected communities, the picture is quite different in the long-range network. While both types of behavior have been described in the literature8,23,24,25,26,27,28, the important finding in this context is that the two networks are representations of the same underlying behavioral data originating from a single population. These findings underscore how long-range spreading dramatically taps into spurious connections outside the social networks, resulting in fundamentally different types of spreading – in some ways mimicking the differences between droplet and airborne spreading mechanisms29,30,31,32.

Community structure increases infected-infected interactions

Our analysis of link weights showed that the short-range network tends to have fewer links with more interactions on each link. But why is the disease trapped within communities in the first place? One of the reasons that an infection remains ‘stuck’ in a neighborhood is that a disease can only spread via interactions between infected and susceptible nodes. Thus, if a local group is fully infected, we tend to see a large fraction of infected-infected interactions, which cannot help spread the disease. In Fig. 4a we quantify this tendency by plotting how frequently infected-infected links are active in the sampled long-range and short-range networks, respectively.

Figure 4 Dynamics of the spreading process. Results for 10 000 SIR simulations. (a) As the infection progresses through the network, we keep track of how often a link between two infected nodes is activated. Shaded areas indicate one standard deviation. (b) The overall result is significantly slower outbreaks in the short-range network than in the long-range networks.

We observe a clear difference between the two networks. In the sampled long-range network, where the local connection patterns have high entropy, there is only a low level of activity among infected or recovered individuals. The spreading agent quickly reaches the entire network due to a large number of available susceptible-infected links. This behavior is in contrast to the short-range network, where infected-infected interactions represent a larger fraction of interaction events. Thus, as above, given the same number of interactions and the same underlying behavioral data, outbreaks are significantly slower and more contained in the short-range network relative to the sampled long-range case (Fig. 4b).

Statistics of spreading outcomes

Finally, in Fig. 5 we summarize a number of statistics related to disease spreading in the three networks. These results confirm that the structural differences between the short-range and long-range interaction networks discussed above lead to reliably different outcomes in simulated epidemics. Firstly, in Fig. 5a, we show that when outbreaks do happen in the short-range network, they are smaller in terms of total number of nodes infected. Moreover, the probability that an outbreak is contained – reaching only a small fraction of the network (<20%) – is higher in the short-range network than in the long-range networks (Fig. 5a inset). Finally, the time an infection needs to reach 50% of the short-range network is significantly longer, with the peak of the distribution for the sampled long-range network occurring after 7 days, while in the short-range network the peak is delayed to 10 days (Fig. 5b).

Figure 5 Statistics of spreading. (a) In the short-range network the outbreaks are smaller than in the sampled long-range network, even though these two contain exactly the same number of interactions. The probability of outbreak being contained – reaching only a small fraction of the network – is also higher in the short-range network (inset). (b) When outbreaks happen, the time to 50% of the network becoming infected is significantly longer in the short-range network, because the spreading is captured within small neighborhoods.

Thus, consistent with the literature, short-range interactions are organized in a way that slows down spreading relative to the long-range case. The sampled long-range network features precisely the same number of interactions as the short-range network, but is structurally more similar to the full long-range network according to the measures considered here. Our results show that taking the physical distance of interactions into account results in networks that can significantly alter the outcome of a simulated outbreak. The qualitative behavior described above is reproduced across a wide range of parameter values.