Seasonal influenza epidemics offer unique opportunities to study the invasion and re-invasion waves of a pathogen in a partially immune population. Detailed patterns of spread remain elusive, however, due to lack of granular disease data. Here we model high-volume city-level medical claims data and human mobility proxies to explore the drivers of influenza spread in the US during 2002–2010. Although the speed and pathways of spread varied across seasons, seven of eight epidemics likely originated in the Southern US. Each epidemic was associated with 1–5 early long-range transmission events, half of which sparked onward transmission. Gravity model estimates indicate a sharp decay in influenza transmission with the distance between infectious and susceptible cities, consistent with spread dominated by work commutes rather than air traffic. Two early-onset seasons associated with antigenic novelty had particularly localized modes of spread, suggesting that novel strains may spread in a more localized fashion than previously anticipated.

The underlying mechanisms dictating the spatial spread of seasonal influenza remain poorly understood, in part because of the lack of spatially resolved disease data to quantify patterns of spread. In this paper, we address this issue by analyzing fine-grain insurance claims data on influenza-like-illnesses over eight seasons in ~300 locations throughout the United States. Using statistical methods, we found that seven of eight epidemics likely originated in the Southern US, that influenza spatial transmission is dominated by local traffic between cities, and that seasons marked by novel influenza virus circulation had a particularly radial, localized spatial structure. These findings are in stark contrast to prevailing theories of influenza spatial transmission that suggest that transmission is favored in low humidity environments and that spread is a dominated by air traffic between populous hubs.

Funding: We acknowledge financial support from the in-house research program of the Division of International Epidemiology and Population Studies, The Fogarty International Center, US National Institutes of Health, funded in part by the Office of Pandemics and Emerging Threats at the United States Department of Health and Human Services (CV, VC, LS, BTG). LS, JG, and BTG acknowledge support from the Research And Policy in Infectious Diseases Dynamics (RAPIDD) program led by the Fogarty International Center and funded by the Department of Homeland Security. LS acknowledges support from a European Union Horizon 2020 “Marie Curie” senior fellowship. SK acknowledges support from a Gates doctoral fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: The IMS Health data were obtained through a research agreement between the authors and IMS Health, as stated in main text. All patient records and information were anonymized and de-identified prior to being handed over to researchers; all records were part of routinely collected information for health insurance purposes. Dr Farid Khan, previously Director of Advanced Analytics, IMS Health, granted access to the patient data. Requests for IMS Health data should be made directly to IMS Health; researchers interested in gaining access to the data should refer to the IMS Health website: http://www.imshealth.com/portal/site/imshealth . Data are derived from ICD-9-coded CMS-1500 medical claim forms from full-time office-based active physicians throughout the US; this data can be requested from IMS. All other datasets used in the paper are publicly available. County-to-county work commutes and air travel passengers between pairs of airports are available from the US 2000 census and www.transtats.bts.gov , respectively.

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Here, we present a detailed analysis of influenza spread between US cities by modeling a unique set of geo-referenced time series of influenza-like illnesses generated from a large-volume compilation of outpatient medical insurance claims across the US, spanning 8 influenza seasons. Our analysis provides the first detailed characterization of seasonal influenza spread across the continental US and its relation to subtype-differences in susceptibility/contagiousness, human mobility and environmental factors.

Though rare, influenza pandemics offer valuable opportunities to study the dissemination of an invasion wave in a susceptible population, with higher than usual attack rates and intensified data collection efforts. Previous work has indicated that gravity models [ 17 ] can provide parsimonious descriptions of the spatial diffusion process of pandemic waves, particularly for the 1918 pandemic in England and Wales [ 18 ], and the 2009 A/H1N1pdm pandemic in the US [ 19 ]. These models describe the pairwise spatial interaction between communities as a function of population size and distance, each tuned by power-law parameters. A previous analysis of the 2009 pandemic across the continental US revealed a surprisingly local and radially diffusive wave of spread originating in the Southeastern US [ 19 ]. The observed pandemic trajectory ran contrary to the prediction of fast hierarchal invasion among populous and interconnected locales followed by slower “in-filling” of the interspersed smaller communities. We currently have no guiding principles for when or whether we should expect such a localized mode of spread for seasonal epidemics since there are substantial differences in both the age-groups infected and in population-level immunity during epidemic v. pandemic seasons [ 20 ]. Further, prior work has indicated that global and regional patterns of influenza spread and viral persistence differ by subtype, perhaps mediated by differenced in immunity [ 1 , 4 ]. Within a subtype, prior exposure history and antigenic changes may render specific age-strata differentially immune to the particular viral strain in circulation, in turn affecting patterns of spread. In the absence of a full understanding of the interplay between prior immunity and disease dissemination, mobility studies and simulation models must be validated with real-world incidence data.

A number of studies have explored the roles of human mobility, demography, and environmental factors in the spread of seasonal influenza on global and regional scales. These studies have suggested the following “principles”: (i) at a global scale, the worldwide air transportation network serves as the predominant channel for the dissemination of pandemic and seasonal influenza viruses, A/H3N2 viruses in particular [ 3 , 6 – 11 ]; (ii) at the regional scale of the US, short-distance work commutes are a major driver of the spread of seasonal outbreaks, though longer-range air traffic has been implicated as well [ 1 , 12 – 14 ]; (iii) influenza is marked by rapid, hierarchical spread between populous centers followed by subsequent spread to less populated areas [ 1 ]; (iv) low-humidity environments favor viral stability and thus transmission of influenza [ 15 , 16 ]. These sometimes-conflicting findings highlight the complexity of influenza spatial transmission across geographic scales [ 5 ] and indicate that more detailed analyses are necessary to deepen our understanding of the drivers of spread.

Understanding the spatial spread of infectious diseases is essential for clarifying mechanisms of transmission and targeting control interventions. Seasonal influenza offers a unique opportunity to study the spatial diffusion of a directly transmitted pathogen in partially immune populations due the yearly invasion, extinction and subsequent re-invasion of viral strains in the northern and southern hemispheres [ 1 – 4 ]. Detailed characterization of the underlying mechanisms of spread has, however, been hindered by lack of spatially resolved incidence data spanning multiple seasons [ 5 ].

Finally, we also considered more complex models incorporating other demographic variables (population size of the infectious locations) and climatic information (absolute humidity in susceptible locations), but none of these factors provided substantial improvements to the model fit (supporting S1 Text , Table A3). Sensitivity analyses revealed that parameter estimates were robust to the inclusion of measurements errors in onset times (supporting S1 Text , Figure A13), and were comparable to those estimated under a fully-specified likelihood (instead of the partial likelihood used here) (supporting S1 Text , Table A4). These analyses also revealed that the risk of influenza infection can vary substantially over the time-course of an epidemic (supporting S1 Text , Table A5, Figures A8 and A9). In further sensitivity analyses, we refit the semi-parametric models constraining the highly-variable external seeding term ρ to 0. Under this nested model, estimates of the distance exponent γ were largest in the 2003/2004, 2005/2006 seasons and the 2009 pandemic, aligning with patterns of localized spread observed in maps of epidemic onsets ( Fig 2 ) and synchrony analyses ( Fig 4 ; supporting S1 Text , Table A8).

Interestingly, the importance of the normalization parameter (ϵ) is seen across all models, as estimates were generally close to 1 and always excluded 0 ( Table 3 ). This provides support for a density-independent infectious process, at least at the level of sampling in this dataset. For illustration, a map of the normalization effect indicates that a density-dependent transmission model (ε = 0) would tend to under-estimate influenza risk in cities with few close neighbors (supporting S1 Text , Figure A7). In other words, under the current sampling scheme, the least connected cities, generally located in the central part of the US, have a stronger risk of infection than their neighborhood alone would predict in the absence of normalization.

Spatial models based on geographic-distance outperformed those integrating mobility-indices in all seasons except for 2006/2007, where the work commutes-based model fit marginally better ( Table 2 ). Models incorporating work commutes systematically outperformed those using air traffic ( Table 2 , see also supporting S1 Text , Figure A5 for a comparison of connectivity between the two mobility proxies). Model estimates suggest that the risk of influenza transmission scales with work commutes according to a power law exponent between 0.6–0.8 (median = 0.67, Table 3 ). To better understand the differences between the models considered here, we explored the relationships between work commutes, air traffic and geographic distance (supporting S1 Text , Figure A6). We found that work commutes tend to be more localized than influenza transmission (distance power-laws of ~3.3 for work commutes and ~2.2 for influenza transmission, Fig 6 ), while air traffic did not scale with geographic distance (supporting S1 Text , Figure A6). Models relying on mobility proxies did not indicate a strong role for population size (μ) ( Table 3 ).

Estimates of μ, the dependence of a location’s susceptibility on its population size ranged from ~0.1–0.35, with a median value of 0.28 ( Table 3 ), indicating that more populated locations are at relatively higher risk for influenza transmission ( Fig 6 ). Parameter estimates for external seeding, ρ, varied several orders of magnitude between seasons, from ~0 in the 2005/2006 season to 0.98 in the 2004/2005 season. External seeding was weaker in seasons with particularly radial patterns in epidemic onsets—2002/2003, 2003/2004, 2005/2006. The large heterogeneity in estimates of ρ across seasons highlights the variable nature of long-distance transmission events in seeding epidemics during the seasons studied.

The number of cities available for analysis is based on the subset of 310 cities for which onset time could be estimated and which could be successfully matched with county-level commute data from the census. Spatial models were not fit to the 2008/2009 epidemic because of lack of detectable influenza onset times in a sufficient number of locations.

Left panel. Work commutes scale with geographic distance according to a power law of 3.3 (95% CI: 2.66 to 3.95); this relationship is depicted in blue (See also supporting S1 Text , Figure A6). Based on our distance-based transmission model, the force of infection between an infectious and susceptible city, λ ij , scales with geographic distance according to a median estimated power law of 2.2 (range: 2.08 to 2.65), depicted in red. This indicates that for a 10-fold decrease in distance between an infectious and susceptible pair of cities, the hazard of infection on the susceptible city increases by a factor of ~158 (range: 120–467). Right panel. The force of infection on a susceptible city, λ j , scales with its population size according to a median estimated power law of 0.28 (range: 0.15 to 0.35). This indicates that for a 10-fold increase in population size, the hazard of infection on the susceptible city increases by a factor of 1.9 (range: 1.4–2.2).

Overall, models using the geographic distance metric fit the data best across seasons ( Table 2 ). The range of power-law distance exponents was narrow ( = 2.1 to 2.7, Table 3 ), with a median value of 2.2, indicating a sharp decay in the risk of influenza infection between a susceptible-infectious pair of locations as the geographic distance between the pair increases ( Fig 6 ). The most extreme values of the distance exponent ( >2.5), were found in the 2006/2007 season and the 2009 pandemic. In the 2003/2004 season, the sharp decline in the risk of transmission as a function of pairwise distance ( = 2.2) indicates that the observed synchrony between the east and west coasts ( Fig 4 , Panel 2) is likely the result of radial spread from locations in the Southern and Midwestern US.

Next, we developed a mechanistic transmission model to identify essential drivers of the spatial process in each season, inspired by previous work on influenza [ 18 , 19 ] and the foot-and-mouth disease epidemic in the UK [ 22 ] (see methods ). The model captures the directionality of the infectious process, allowing infected locations to transmit disease to susceptible ones. The probability of transmission is a function of the geographic distance between susceptible-infectious pairs (power-law parameter, γ), and each location’s susceptibility is treated as a function of its population size (power-law parameter μ), the number and position of nearby locations (normalization parameter, ε) and external seeding (ρ). We also considered two alternative formulations of this model, where geographic distance is replaced by one of two proxies of human mobility based on work commutes or air travel.

We sought to quantify the number and locations of long-distance transmission events in each season, based on the minimum distance between a newly infected location and the set of previously infected locations. We generated an empirical distribution of this statistic across seasons that takes into account uneven geographic sampling, and defined events occurring in the upper 1 st percentile of this distribution as long-range transmission events (methods and supporting S1 Text , Figure A3). Between 1 and 5 such events were identified per season ( Fig 5 ); ~55% of these events were associated with increased influenza activity in adjacent areas within a two-week window (within a radius of ~570km). Long-range events generally occurred early in the epidemic, within the first ~10–30% of cities infected ( Fig 5 , supporting S1 Text , Table A2).

For each season, the purple circles indicate long-range transmission events (i.e. locations far from the set of infectious locations that obtain influenza regardless). Blue circles are locations infected within a two-week window from the last long-range event. Red circles depict the potential outbreak origin in each epidemic. Grey circles are infected locations before the first long-range event occurred in each season. Panels on the right indicate when in the epidemic long-range events occurred. Note that the methods used to identify long-range transmission events and estimate the origin of each epidemic are agnostic of one another; in 2008/2009 an early “long-range transmission event” appears near the estimated origin of the epidemic.

To better identify the initial focus of each outbreak, we adapted a previously developed method [ 6 ], where the most likely source location is identified as the location that maximizes the correlation between onset times and the geographic distance to the potential source (see methods ). Notably, of the eight seasons studied, seven were likely seeded in the Southern US ( Fig 5 and supporting S1 Text , Figure A4). The relationship between influenza timing and distance to the source location was particularly pronounced in four seasons (2002/2003, 2003/2004, 2005/2006 seasons and the 2009 pandemic, supporting S1 Text , Figure A4), echoing earlier synchrony results ( Fig 4 ). The distance signature generally subsided after 1000–1500 km, pointing to the importance of one or several long-distance transmission events effectively acting as new sources and obscuring further distance effects.

The y-axis in each panel is a normalized measure of pairwise synchrony in onset times between locations (values near -1 indicate that epidemics start close together in time, while values near +1 indicate a substantial lag in onset times). The x-axis is distance (kilometers). Each circle represents the mean pairwise synchrony for pairs of locations falling in 50-km distance bins. The black line segments are 95% confidence intervals for the mean in each bin. The red band is the expected pairwise synchrony under the null hypothesis of complete spatial randomness obtained by permutation of the onset times.

Maps of estimated influenza onset times reveal clear spatial patterns in all seasons ( Fig 3 and supporting S1 Text , Figure A2). Marked radial patterns are observed in at least four seasons: 2002/2003 (the hub of the epidemic is in the Southern US, see also supporting S1 Text , Figure A4), 2003/2004 (Southern US), 2005/2006 (Southwestern US) and the 2009 pandemic (Southeastern US). In order to quantify the role of distance on spatial spread, we plot pairwise synchrony in epidemic onsets as a function of geographic distance ( Fig 4 ). In all seasons except 2006/2007 and 2008/2009, pairwise synchrony decreased with distance up to 1500–2000 km and departed significantly from patterns expected under the null hypothesis of complete spatial randomness. This spatial signature was most pronounced in 2003/2004, 2005/2006 and the 2009 pandemic, in which between 30–58% of synchrony estimates were significantly different from the null ( Fig 4 ). In 2003/2004 there was a trend towards increasing synchrony between US coasts, as indicated by similar epidemic timings in pairs of locations separated by 2500–3000 km compared to pairs at intermediate distances.

In each panel, the x-axis represents weeks since June 1 st , with the corresponding calendar months labeled in orange. Each graph represents 100 possible epidemic curves generated by varying local onset times within estimated bounds of uncertainty. The tightness of the curves indicates that uncertainty in local onset times has little effect on the epidemic’s overall trajectory. Epidemic curves are colored according to the dominant subtype in circulation (black corresponds to A/H1N1 + B, blue to A/H3N2, red to A/H3N2 + B, and purple to A/H1N1pdm).

Fig 2 depicts the proportion of infected locations over time in each season, illustrating variability in both the epidemic period and the shape of the epidemic curves. The time for influenza to reach all US locations ranged from ~11–22 weeks, while the time for 90% of locations to be infected had a narrower, but still highly variable range (~5–11 weeks, based on the 5 th to 95 th percentiles, Table 1 ). Across seasons, onset times typically clustered between November and January, with two notable exceptions associated with major influenza seasons in 2003/2004 (Oct.-Nov.) and the 2009 pandemic (Aug.-Oct.). Epidemic curves were generally sigmoidal, indicating that the majority of transmission events happen over a narrow time interval relative to the entire length of the epidemic ( Fig 2 ). Notable exceptions were the 2005/2006 A/H3N2 season and the 2009 pandemic, both of which had a fairly linear and steady progression of cities becoming infected, and the 2006/2007 A/H1N1+B epidemic which appears to have two “waves” of transmission ( Fig 1 , Panel 2 and Fig 2 ).

Onset times for each season are depicted as red dashed lines. The 2003/2004 epidemic and 2009/2010 pandemic are highlighted in pink as these are major influenza seasons associated with antigenic novelty, where more than 99% of viruses in circulation were of the same subtype (A/H3N2 in 2003/2004 and A/H1N1 in 2009/2010). These major seasons had the most distinct influenza signals, while few locations experienced an increase in ILI during the mild 2008/2009 epidemic dominated by the A/H1N1 virus.

Our analysis builds on earlier work indicating that city-level medical insurance claims of influenza-like-illnesses (ILI) are useful and specific indicators of influenza virus activity across the US [ 21 ]. We extend previous analyses of the autumn wave of the 2009 A/H1N1 pandemic [ 19 ] to study spatial spread among 310 distinct geo-referenced locations across continental US during the 2002/03 through 2009/10 influenza seasons, covering both mild and severe epidemics as well as the 2009 pandemic ( Table 1 , supporting S1 Text , Table A1). Our spatial analysis relies on estimates of local influenza onsets, defined as the date associated with the winter breakpoint in ILI incidence for each location in each season (see methods , Fig 1 and supporting S1 Text , Fig A1). Depending on the strength of the influenza signal, we were able to accurately estimate the onset times in 135–306 locations in each season ( Table 1 ). First, we characterize the timing, origin, trajectory, spatial synchrony and long-range transmission events of each of the 8 epidemics. We subsequently fit power-law gravity models with and without explicit mobility indicators to the incidence data.

Discussion

Informed disease control and pandemic planning rely on understanding the mechanisms of and variability in the spatial transmission of influenza. Here we studied the spatial and temporal dynamics of annual influenza epidemics in the US over eight seasons, leveraging uniquely spatially-resolved medical claims data on outpatient influenza-like-illnesses (ILI) through an active research collaboration with a data-warehouse company. To our knowledge, this is the most detailed and comprehensive study of the city-level spread of influenza over multiple seasons to date, revealing a number of insights and generating hypotheses about the mechanisms of disease transmission.

Human mobility and the spatial spread of influenza While it is well accepted that international air travel plays a crucial role in the global dissemination of seasonal and pandemic influenza viruses [3,6,7,11], the role of air traffic in the regional spread of influenza remains debated [1,12–14]. Our study points to a predominantly localized mode of influenza transmission within the continental US, even though the exact pathways of epidemic spread and initial seeding events were variable across seasons. The estimated risk of transmission decays sharply with geographic distance from an infected location according to a power-law of ~2.2 (Range: 2.1–2.7). A particularly localized, radial spatial structure was observed in the 2003/2004 and 2005/2006 epidemics dominated by A/H3N2 viruses and, as reported previously [19], in the fall wave of the 2009 pandemic dominated by A/H1N1pdm. We extended previous models of influenza transmission by explicitly connecting disease data with human mobility proxies and found that models driven by geographic distance or work commutes systematically outperformed those driven by air traffic. Taken together, these results do not support domestic air traffic as the dominant mode of spatial dissemination of influenza in the US. We identified between one and five long-range transmission events per season, most occurring in the first third of each epidemic and half contributing successfully to the observed dynamics via secondary onward transmission. In model-based analyses, the estimated effect of external seeding varied more than 10-fold across seasons. Whether domestic or international air traffic is responsible for these long-range events and the initial seeding of the epidemic is an important area for future research, which would benefit from complementary analysis of phylogenetic data [5,11,23]. Previous work has indicated that work commutes are an important driver of the inter-state spread of influenza in the US [1,14]; here we show that at the scale of cities, models utilizing geographic distance outperformed those using work commutes. The range of connectivity implied by the work commute data may be too narrow to accurately capture influenza transmission, which occurs on a broader spatial scale, likely contributing to the better fit of distance-based spatial models (Fig 6). Further differences in model fit may stem from the substantially higher degree distribution of the geographic distance network compared to the between-city work-commute network, thereby allowing many more possible pathways for epidemic spread in distance-based models (see supporting S1 Text, Figure A5). Reassuringly, however, there was excellent internal consistency between parameter estimates for the work commute and distance-based models (see supporting S1 Text).

Demography and the spatial spread of influenza Prior work has indicated that influenza spread at the state-level in the US is dominated by hierarchical spread between populous states (perhaps driven by inter-state work commutes) [1], but this work was limited by low-resolution (state-level) mortality data and patterns were averaged across several decades. Our model-based analysis of higher resolution outpatient ILI visits indicates that while more populated locations are at relatively higher risk for influenza transmission compared to less populous locations, this effect is not strong enough for hierarchical spread to predominate over local spread. Indeed, a susceptible city’s population size contributed only marginally to its risk of obtaining influenza early in an epidemic. It is important to note that this estimated relationship may be confounded by differences in the age-structures of large and small populations, which are not accounted for in our model. In a supplementary analysis (supporting S1 Text, Figure A14) we demonstrate that more populous counties are enriched with persons between 20 and 50 years of age, a segment of the population with presumably higher rates of travel (both local and long-distance). Enrichment for this segment of the population in larger counties may confound the relationship between the risk of influenza infection and population size. In complementary analyses, we estimated spatial synchrony in ILI data using the approach proposed in Viboud et al. [1], both at the original resolution of cities and with data aggregated at the state-level. This sensitivity analysis suggests that the effects of workflows and demography are more pronounced at the state-level than at the city-level, validating earlier results [1] (supporting S1 Text, Table A6, Figures A10, A11). In contrast, the relationship between synchrony and geographic distance predominates at the scale of cities, indicating that spatial data aggregation likely biases the precise characterization of disease spread. The availability of finely spatially-resolved disease data is therefore essential to accurately capture the mechanisms of disease dissemination [24].

Absolute humidity and the spatial spread of influenza In contrast to previous work [15,16], our analysis did not find that absolute humidity was an important driver of the spatial spread of influenza. Absolute humidity did not affect the susceptibility of individual cities to influenza infection (supporting S1 Text, Table A3), though our models did not allow for spatial variation in the effect of absolute humidity. This finding is consistent with prior studies of the 2009 pandemic in the US [19] and Europe [25]. Furthermore, our analysis suggests seven of the eight epidemics studied likely originated in the Southern US, several in humid areas of Texas and Louisiana, a pattern that does not fit with the concept that low humidity favors influenza transmission [15]. Analyses of longer epidemiological records should shed light on whether the preferred Southern origin of epidemics is due to frequent seeding of influenza in the Southern US (potentially from Central and South America), or if demographic or geographic conditions favor early influenza activity in the South. Importantly, our study did not address the potential role of environmental factors on the local dynamics of influenza, particularly in the weeks immediately following the estimated onset times in each city.

Schools and the spatial spread of influenza Prior work on the 2009 pandemic has indicated a role for school openings in modulating the spatial structure of influenza spread during that season [19,26]. This was especially important in 2009 because (1) influenza transmission in the US occurred much earlier in the year (August-October) than for seasonal outbreaks, coinciding with the start of the fall school term, and (2) there is substantial geographic variation across start dates for US school fall terms, with earliest school openings in the Southeastern US. In contrast, seasonal influenza outbreaks tend to occur later in the year (onset dates Oct-Jan), typically well into the school fall term. While it is possible that the Christmas holiday break may modulate spatiotemporal patterns in seasonal influenza spread, the timing of this break is more uniform across the US and is not as likely to contribute to spatial variation in spread.

Effect of antigenic novelty on spatial spread of influenza Given the great interest in predicting how a novel influenza virus will spread through a largely susceptible population, two high transmission seasons merit special attention: the 2003/2004 drift-variant A/H3N2 epidemic and the fall wave of the 2009 pandemic (previously studied in [19]). Yang et al. [20] showed that the pre-season population-level susceptibility (~70%), effective reproductive number (1.4–1.6), and overall attack rates (24–33%) were similar for these two major influenza seasons, and in the higher range of seasonal influenza epidemics. We found that in both seasons, influenza virus circulation originated in the Southern US and began earlier in the year than is typical. The relatively slow spread (9–12 weeks) and strong spatial signature of these two seasons is surprising because theory suggests that more transmissible influenza viruses, associated with higher effective reproduction numbers, should spread faster and less radially (because long-distance introductions are more likely) than less transmissible viruses [1,19]. The net spatial effect of increased transmissibility is likely modulated by the mobility patterns of the most susceptible hosts, which may vary substantially with age and across seasons. Though a satisfying explanation for these findings remains elusive, the spatial diffusion of novel influenza viruses appears more localized than previously speculated and multiple introductions of influenza may be required to seed an epidemic in any given city, as the probability that a single introduction sparks an epidemic is low. Because commuting flows are on average one order of magnitude larger than air traffic volume [13], predominantly localized transmission is consistent with the idea that local connectivity (which includes work commutes) is the main driver of epidemic spread [1]. Finally, this finding may also highlight the importance of children in spreading influenza over short distances, an age group that also tends to be disproportionately impacted during times of novel influenza virus circulation [27,28]. We have not considered the role of human behavior change in shaping the spatiotemporal dynamics of epidemics [29], but we anticipate that the perception of risk during seasonal influenza epidemics is relatively consistent across seasons, and suspect this may play only a minor role in modulating spatial spread. During the 2009 pandemic, the US experienced a widely publicized “herald wave” of H1N1pdm transmission during the spring. By the fall, the US population was broadly aware that H1N1pdm infection was generally mild, likely reducing reactive behavior change during the fall wave of the pandemic. In our study, we model influenza epidemic onset times, rather than the full epidemic dynamics in each location, and is thus our approach is less sensitive to reactive behavior changes in response to growing case counts near the peak of the outbreak.

Modeling strategies and future developments Numerous strategies exist for modeling epidemic spread. Inspired by previous work on the foot-and mouth-disease epidemic in the UK [22], we employed semi-parametric transmission models to quantify the risk of infection as a function of demographic factors and different distance metrics across several seasons. Previous studies have used fully parametric models and have focused on a single epidemic with the goal of capturing the entire epidemic process [19,30–33]. We found that fully parametric models often require time-varying baseline hazards to capture the temporal dynamics of influenza epidemics (supporting S1 Text, Table A5, Figures A8 and A9). Accordingly, a constant baseline hazard was unable to fully capture the temporal dynamics of the 1918 epidemic, with model-predicted onsets times occurring earlier on average than observations [18]. For the 2009 pandemic, information on school openings was necessary to allow the baseline hazard for transmission to change over time (and space) [19]. In this context, the partial likelihood method [22] provides clear advantages in obtaining robust estimates for key spatial covariates while obviating the need to specify the form of the time-varying baseline hazard. In sensitivity analyses, we found that parameter estimates for the spatial component of the model were robust to the choice of baseline hazard in fully-parametric model formulations (supporting S1 Text, Table A4). One disadvantage of the semi-parametric method, however, is that it does not allow simulations of epidemic trajectories over time and cannot be applied for real-time predictions, as the baseline hazard is not fully specified. Our study is subject to several limitations. First, and most importantly, our study was limited to eight seasons, making our conclusions difficult to generalize broadly given year-to-year variability in circulating influenza strains. Second, although ~80% of the US population was eventually captured in this dataset, this percentage was lower in earlier years of data collection, and the Midwestern US was geographically under-sampled. Third, our analysis relies critically on the accurate estimation of epidemic onsets in each location, which is particularly challenging during mild epidemics because the influenza signal is weaker. To address this issue, we (1) analyzed data only from locations with sufficiently robust influenza signals, (2) quantified the uncertainty in epidemic onset estimates, and (3) computed onset times using more “traditional” methods (sinusoidal regression models), which showed excellent agreement with our estimates (supporting S1 Text, Figure A12). Further, we have shown that the main results of our study are robust to moderate uncertainties in the epidemic onset times (supporting S1 Text, Figure A13). Fourth, our dataset is based on ILI records rather than laboratory-confirmed influenza cases. Though outpatient ILI captures the aggregate effects of a number of respiratory pathogens, previous work has shown strong agreement with laboratory-based indicators of influenza virus activity at the regional and city level [21]. Furthermore, our analysis is focused on influenza epidemic onset times, which are identified using a linear-spline method separating background ILI rates from rapid case growth due to influenza activity. This approach controls for the effect of other respiratory pathogens in the peri-epidemic period. Another caveat relates to the co-circulation of several influenza virus subtypes or strains in some winters. Syndromic (ILI) time series capture the aggregate effects of these viruses, making it challenging to disentangle the transmission dynamics of one virus from another. We speculate, however, that co-circulation would superpose multiple spatial patterns thereby biasing analyses towards spatial randomness; reassuringly, we observed a spatial signature during epidemics in which multiple subtypes co-circulated (2002/2003, 2004/2005, and 2005/2006). Importantly, this caveat does not apply to the severe and radially diffusive influenza seasons in 2003/2004 and 2009, which were marked by overwhelming predominance of a single antigenically-novel strain. We relied on gravity-type power-law models of influenza transmission and incorporated empirical data on human mobility; these models have been validated as parsimonious approximations of both human movement [1,34] and influenza spread [13,18,19]. A major limitation of these models, however, is that they consider the pairwise interaction between infectious/susceptible cities, but ignore higher order interactions, which may prove to be essential in capturing realistic patterns of human movement. In reality, some combination of work commutes, air travel and other local movements likely drive disease dissemination, and ideally these different connectivity metrics should be integrated in a single model rather than analyzed separately. Because the observed mobility networks are captured on different scales both geographically and temporally, it is not clear how best to combine information across such different networks [35,36]. Finally, the observed data represent a sample of locations within the network of interacting cities; the effects of geographic sampling on parameter estimation is poorly understood. Further work is needed to develop realistic model formulations that connect multiple mobility indices and disease datasets at appropriate temporal and spatial scales. In line with previous work on the spatial spread of influenza [19] our analysis focuses on epidemic establishment in each location under study, rather than the time of influenza introduction. Epidemic establishment is more complex than the mere importation of influenza cases, as it requires sustained chains of local transmission, which may depend on climatic or virus-specific factors, among others. In our study, the unit of analysis is the city, and only after establishment of sustained local transmission in each city are there sufficient numbers of cases to allow for influenza spread to other cities, as built into the model framework utilized here. The frequency of viral introductions and specific viral migration pathways would be interesting to study but are best addressed by phylogeographic analyses, assuming appropriate sampling of viral sequences early in the epidemic [5].