An important aspect of our empirical model is the stochastic pattern of individual mobility that we implement. Initially, during a “training period” of 1 year, we let all agents choose destinations according to a traffic-weighted probability, as explained earlier ( Fig. 1c ). However, it is by now well established that individual mobility patterns are far from random [15] and that their statistics can be reproduced with two rules, exploration and preferential visit [16] , which we introduce after the training period, once individuals have built some travel history ( Fig. 1d ). During exploration, an agent visits a new airport with probability , where is the number of airports an agent has visited in the past. We use and ( ) from a Gaussian distribution with mean and standard deviation , values that fit human mobility patterns from real mobile phone data [16] . In the absence of comprehensive data for individual long-range travel history, we make the assumption that the parameters used to reproduce local human mobility can be applied for long range travel. The new airport is chosen according to traffic from node . During preferential visit, the agent selects a previously-visited airport with complementary probability . For an agent with home at airport , the probability of visiting an airport is proportional to the frequency of previous visits to that location, . Because the travel history built by individuals is mediated by traffic, the mobility model with exploration and preferential visit honors the initial traffic flux matrix.

When an agent is off ground, we assume he moves between airports with a constant velocity of 650 km/h. When not flying, an agent can be at one of three distinct places: at their home node, at a connecting airport, or at a destination. The waiting times of an individual at each of these locations is clearly very different. We obtain waiting time distributions for connecting airports and final destinations from the individual mobility dataset [26] , which indeed reflect a very different mean waiting time: in the order of a few hours at connecting airports, and a few days at destinations ( Fig. 1b ). Since the dataset lacks individual travel history, we cannot extract waiting times at the home airport, and we assume they are normally distributed [10] , [17] with mean and standard deviation , which recognizes that the average person in densely populated areas travels more often. This is based on the empirical relation between total traffic and population of an area [27] . For simplicity, we truncate the home waiting time distribution from below at day.

The agent then establishes an itinerary, or space-time trajectory, to reach the destination. We make the ansatz that the route chosen minimizes a cost function, which generally increases with the cumulative time-in-transit and the monetary cost of the ticket. Given that the trip elapsed time correlates well with the number of connections and the physical travelled distance, and that ticket price decreases with route traffic, we use the following empirical cost function associated with origin and destination : (1)where is the physical distance of the segment (accounting for the sphericity of the Earth), and the exponents and lie on the value ranges and . Which trip route is selected depends on the particular values of and . The ranges of values for these two parameters are chosen on the basis of producing itineraries that closely match those from real itinerary data [26] . To incorporate in our model the uniqueness of each passenger’s needs, we choose a unique combination of these two exponents for each individual. This reflects the current endemic heterogeneity in route selection from the wide range of connections, airline and price choices.

Individual agents traverse the network following empirical stochastic rules. Initially, before individuals build up a travel history, each individual positioned at their “home airport” chooses a destination airport with probability proportional to the traffic flux [13] , [19] , . Since the flux matrix accounts for trips in which the individual remains under the same flight number, we allow for an agent choosing some other destination with a small probability, .

We use the data to build an empirical model of human mobility through the air transportation network. To each airport , we assign a population by an empirical relation [27] , , which reflects a correlation between population and yearly total outgoing traffic at that airport, . Therefore, each individual agent in the model has a “home airport” [17] , [19] .

In addition to the aggregate traffic data, we use information of individual itineraries, provided by a major US airline for domestic trips [26] . This dataset extends over a period of four months in 2004 and includes 3.2 million tickets. We use it to extract the waiting time distribution at final destinations and at connecting airports ( Fig. 1b ).

(a) World map with the location of the 1833 airports in the US database from the Federal Aviation Administration ( www.faa.gov ). (b) Waiting time distributions at connecting and destination airports (from [26] ), and at the “home” airport. (c) Illustration of a 1-year travel history of an individual with “home” at San Francisco International Airport (SFO). (d) Graphical representation of the probabilities for exploration and preferential visit of the same individual, after the 1-year “training period.” During exploration the agent visits a new airport while during preferential visit the agent visits a previously-visited place with probability proportional to the frequency of previous visits to that location.

We develop a stochastic model of human mobility through a US-centric air transportation network. We use air-travel data provided by the Federal Aviation Administration ( www.faa.gov ) that includes all flights from all domestic and international airlines with at least one origin or destination inside the US (including Alaska and Hawaii), for the period between January 2007 and July 2010. Note that we do not have traffic information about flights whose origin and destination is outside the US. The air transportation network is a space-embedded network with 1833 airports, or nodes, and approximately 50,000 connections, or directed links ( Fig. 1a ). It is a highly heterogeneous network with respect to the degree (or connectivity) of each node, the population associated with each node, as well as the traffic volume through the links of the network [13] , [23] . The traffic data is organized in two datasets: “Market” and “Segment”. The Market dataset counts trips as origin-to-final-destination, independently of the number of intermediate connecting fights. The Segment dataset counts passengers between pairs of airports, without consideration of the origin and final destination of the whole trip. For example, a passenger that travels from Boston (BOS) to Anchorage (ANC), with connecting flight at Seattle (SEA), would be counted only once in the Market dataset as a passenger from BOS to ANC. In the Segment dataset, however, the passenger would be counted both in the segment BOS-SEA, and in the segment SEA-ANC. From these datasets we extract two weighted matrices that characterize the network traffic: a traffic flux matrix where is the yearly passenger traffic from origin to destination ; and a traffic transport matrix where is the yearly passenger traffic in the segment from airport to airport .

For a single ‘mobility’ realization, we run our empirical model of human mobility through the air transportation network with agents that are initially distributed in different “home” subpopulations. During an initial period of one year (training period), the agents are forced to choose destinations according to the traffic flux matrix. During this training period each individual develops a history of mobility patterns. Collectively, the mobility patterns honor the aggregate traffic structure from the dataset. During the second year, we incorporate the exploration and preferential-visit rules to assign destinations to individual agents. We use a time step of 0.5 hours, which we have confirmed is sufficient to resolve the temporal dynamics of the traffic-driven contagion process. For a given ‘mobility’ realization, we simulate the ‘reaction’ process as follows: we apply the SIR compartmental model at a randomly chosen time during the first half of the second year by infecting 10 individuals. In the study of late-time global attack, those 10 individuals are selected randomly across the entire network. For the study of early-time spreading, they are selected from the same subpopulation. For the Monte Carlo study, we average the results over 20 mobility and 200 reaction realizations.

Several important differences exist between the reference models described above and our empirical model of human mobility. For instance, the reference models all discard geographic information. They also all assume that agent displacements are instantaneous and synchronous, taking place at discrete time integers (e.g. one day), and neglect the large heterogeneity in waiting times. We will see that resolving these spatio-temporal processes, while not critical for late-time measures of disease spreading, is essential in the early-time contagion dynamics.

In Model 4, we extend Model 3 by considering a simplified model of recurrent mobility patterns [17] , [19] . Each individual is initially assigned to a “home” node. Individuals perform a random walk through the network of quenched transition rates and heterogeneous traffic, but return to their original subpopulation with a single recurrent rate [17] . We select days, corresponding to the mean waiting time at destination airports obtained from actual data [26] .

In Model 3, we extend Model 2 by enforcing that destination selection by individuals is done according to traffic: the probability of an individual at node selecting destination is proportional to .

In Model 2, we extend Model 1 by incorporating heterogeneity in the transition rates, as evidenced by the traffic data. To each node we assign a transition rate , but individuals still select a destination randomly, with probability .

In Model 1, we consider the US air transportation network but retain only information about the topology of the network. We model mobility as a simplified diffusion process, in which all individuals perform a synchronous random walk, moving from one node to another, all at the same rate [10] , [11] . We choose this rate to be the average rate at which individuals travel in our empirical model. Under these assumptions, all nodes with the same degree have the same behavior. We assign to each node a population corresponding to the stationary state, predicted by the mean-field theory [11] : for a node of degree , , where denotes the mean of the degree distribution , and is the average nodal population.

Our empirical model of human mobility through the air transportation network incorporates a number of dependencies that reflect the complex spatiotemporal structure of collective human dynamics. To understand which of these dependencies are essential, and which affect the modeling results to a lesser degree, we consider four different models of increasing complexity.

We used a value of the recovery rate days. We initialized the epidemic with 10 infected individuals chosen randomly across the network. We used a population of individuals, and average our results over 200 realizations. (Inset) The global attack for larger values of exhibits smaller differences among models, except for those between annealed and quenched transition rates at the nodes, as evidenced by the simulation results of Model 1 vs. the other models.

Monte Carlo study of the global attack of an epidemic as a function of the reproductive number, for the different models explained in the text.

We find that the global attack is quite sensitive to the degree of fidelity of the metapopulation mobility model, especially in the range of low reproductive numbers ( Fig. 2 ). Naturally, the global attack increases with for all models. There is a dramatic difference in the global attack between Models 1 and 2, highlighting the critical influence of quenched disorder in the transition rates out of individual subpopulations. The global attack increases also from Model 2 to Model 3, reflecting the super-diffusive anomalous nature of spreading when agent displacements are driven by traffic, as opposed to a diffusive random walk [12] , [13] . In comparison with these two effects––quenched disorder in transition rates and traffic-driven spreading––recurrent individual mobility patterns [17] , [19] have a relatively mild influence on the global attack, as evidenced by the differences between Models 3 and 4. We observe that the additional complexity included in our empirical model––geographic information, high-fidelity individual mobility, and time-resolved agent displacements––induces a slight delay in the epidemic threshold with respect to Models 3 and 4, indicating the nontrivial dependence of contagion dynamics on human mobility.

We apply the SIR contagion model to the four reference models described above and to our empirical mobility model. We employ the global attack, defined as the asymptotic (late-time) fraction of the population affected by the outbreak, as our measure of the incidence of the epidemic. We initialize the disease with a small number of infected individuals randomly chosen from the whole population. We obtain representative statistics by performing a Monte Carlo study and averaging over many realizations.

To study the dynamics of disease spreading through the air transportation network, we use the Susceptible–Infected–Recovered (SIR) contagion model. This model divides each subpopulation into a number of healthy (or susceptible, ), infected ( ) and recovered ( ) individuals, and it is characterized by a contagion reaction, , and a recovery reaction, , where and are the infection and recovery reaction rates, respectively, defined as the number of newly infected (resp. recovered) individuals per unit time for each initial infectious individual in a fully-susceptible subpopulation. Let be the number of individuals in each class in node at time , which satisfy at all times. Under the assumption of homogeneous mixing within a city, the probabilities for a susceptible individual to become infected is , and for an infected individual to recover is , which reflect the dependence on the time step . According to these rules, the expected increment in the infected and recovered populations at time are and , respectively, assuming that during the reaction step the subpopulation does not experience inflow or outflow of individuals. In our model, however, we track the state of each individual in the network. The reproductive number determines the ratio of newly infected to newly recovered individuals in a homogeneous, well-mixed and fully-susceptible population. From this observation follows the classic result on the epidemic threshold in a single population, . Much work has been devoted to the study of epidemic thresholds in metapopulation networks [10] , [11] , [17] , which generally shows that the reproductive number must be greater than 1 for global spreading of an outbreak.

Influential Spreaders

Finding measures of power and centrality of individuals has been a primary interest of network science [28], [29]. The very mechanism of preferential attachment shapes the growth and topology of real-world networks [1], indicating that the degree of a node is a natural measure of its influence on the network dynamics. Another traditional measure of a node’s influence is the betweenness centrality, defined as the number of shortest paths that cross through this node [28]. Betweenness centrality does not always correlate strongly with the degree, the air transportation network being precisely an example of poor correlation between the two [23]. It has been shown, however, that certain dynamic processes such as SIS or SIR epidemic spreading in complex networks appear to be controlled by a subset of nodes that do not necessarily have the highest degree or the largest betweenness [24].

Here we revisit what is meant by spreading, and make a crucial distinction between the asymptotic late-time behavior––which has been studied more extensively––and the early-time dynamics, for which much less is known. We show that the two behaviors are controlled by different mechanisms and, as a result, require different measures of spreading.

Influential spreaders at late times. We perform numerical simulations of epidemic spreading in our model by initializing the SIR compartmental model with infectious individuals at one single subpopulation. We compare the asymptotic, late-time spreading ability of different subpopulations by means of the global attack of the SIR epidemic (Fig. 3a). We study low values of the reproductive number , between 1 and 1.5, because the relative differences among different sources of infection are largest in this limit. Recent outbreaks of influenza A are estimated to lie within this range [30]. We rank the 40 major airports in the United States in terms of their asymptotic global attack, after aggregating the ranking over the range of reproductive numbers studied (Fig. 3b). The ability of a node to spread an epidemic depends on fast dispersal of agents to many other nodes, thereby increasing the probability of infectious individuals contacting a large population before they recover. Thus, intuitively, the asymptotic spreading ability of a node increases with its traffic and connectivity. In fact, we find that both degree and traffic provide fair rankings of influential late-time spreaders because in the air transportation network both quantities are strongly correlated (Fig. 3b, inset). PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 3. Late-time spreading ability of different airports, measured by the global attack of an SIR epidemic that originates at each airport. (a) Global attack as a function of reproductive number, for five different airports (see inset). We initialize the disease by infecting 10 randomly chosen individuals inside the subpopulation of consideration. We use days. Each point is the result of a Monte Carlo study averaging over 200 reaction and 20 mobility realizations and using individuals. (b) Ranking of the 40 major airports in US in terms of their spreading ability measured by the normalized global attack. We compare the normalized global-attack ranking curve (black diamonds) to the ones that result from considering the airport’s normalized degree (magenta squares) and the airport’s normalized traffic (brown triangles). Also shown is the ranking of the airports shown in (a). Both degree and traffic provide effective rankings of influential late-time spreaders, which in this case can be understood from the good cross-correlation between the two (inset). https://doi.org/10.1371/journal.pone.0040961.g003