Could social media data aid in disaster response and damage assessment? Countries face both an increasing frequency and an increasing intensity of natural disasters resulting from climate change. During such events, citizens turn to social media platforms for disaster-related communication and information. Social media improves situational awareness, facilitates dissemination of emergency information, enables early warning systems, and helps coordinate relief efforts. In addition, the spatiotemporal distribution of disaster-related messages helps with the real-time monitoring and assessment of the disaster itself. We present a multiscale analysis of Twitter activity before, during, and after Hurricane Sandy. We examine the online response of 50 metropolitan areas of the United States and find a strong relationship between proximity to Sandy’s path and hurricane-related social media activity. We show that real and perceived threats, together with physical disaster effects, are directly observable through the intensity and composition of Twitter’s message stream. We demonstrate that per-capita Twitter activity strongly correlates with the per-capita economic damage inflicted by the hurricane. We verify our findings for a wide range of disasters and suggest that massive online social networks can be used for rapid assessment of damage caused by a large-scale disaster.

INTRODUCTION

Natural disasters are costly in terms of property, political stability, and lives lost (1–3). Unfortunately, as a result of climate change, natural disasters such as hurricanes, floods, and tornadoes are also likely to become more common, more intense, and consequently more costly in the future (4–7). Developing rapid-response tools that help society adapt to these forthcoming changes is critical (8).

As society faces this need, the use of social media platforms like Facebook and Twitter is on the rise. Unlike traditional media, these platforms enable data collection on an unprecedented scale, documenting public reaction to events unfolding in both the virtual and physical worlds. This makes social media platforms attractive large-scale laboratories for social science research (9–11). Opportunities provided by social media are used in various domains, including the economic (12), political (13–16), and social (14, 17–21) sciences, as well as in public health (22, 23).

Because of the potential of social media, the use of massive online social networks in disaster management has attracted significant public and research interest (24–26). The microblogging platform Twitter, in particular, has been especially useful during emergency events (27–29). Twitter allows its users to share short messages of up to 140 characters and to follow public messages from any other registered user. Such openness leads to a network topology in which an average user follows a large number of accounts, placing Twitter somewhere between a purely social network and a purely informational network (30). The information network properties of Twitter facilitate and accelerate the global spread of information; its social network properties ease access to geographically and personally relevant information; and the message length limit encourages informative exchange. Together, these factors make Twitter especially well suited to a fast-paced emergency environment.

Existing research on the use of Twitter in an emergency context is extensive. Researchers study platform-specific features (retweets and private messages) of emergency information diffusion (31, 32), the role of the service in gathering and disseminating news (33, 34), its contribution to situational awareness (35, 36), and the adoption of social media by formal responders to serve public demand for crisis-related information (37, 38). Another branch focuses on the practical aspects of classifying disaster messages, detecting events, and identifying messages from crisis regions (39–43). Others use Twitter’s network properties to devise sensor techniques for early awareness (44), to gauge the dynamics of societal response (45, 46), and to crowdsource relief efforts (47).

More recently, researchers have begun using social media platforms to derive information about disaster events themselves. For instance, the number of photographs uploaded to Flickr was shown to correlate strongly with physical variables that characterize natural disasters (atmospheric pressure during Hurricane Sandy) (48). Although it is unclear what causes the link (external information, network effects, or direct observer effects), the correlation suggests that digital traces of a disaster can help measure its strength or impact. On the basis of a similar concept, other studies verify the link between the spatiotemporal distribution of tweets and the physical extent of floods (49), and between the prevalence of disaster-related tweets and the distribution of Hurricane Sandy damage predicted from modeling (50).

Here, we present a hierarchical multiscale analysis of disaster-related Twitter activity. We start at the national level and progressively use a finer spatial resolution of counties and zip code tabulation areas (ZCTAs). First, we examine how geographical and sociocultural differences across the United States manifest through Twitter activity during a large-scale natural disaster (that is, Hurricane Sandy). We investigate the response of cities to the hurricane and identify general features of disaster-related behavior at the community level. Second, we study the distribution of geo-located messages at the state level within the two most affected states (New Jersey and New York) and, for the first time, analyze the relationship between Twitter activity and the ex-post assessment of damage inflicted by the hurricane. We verify the external validity of our findings across 12 other disaster events.

RESULTS

Context of the study

Hurricane Sandy was the largest hurricane of the 2012 season and one of the costliest disasters in the history of the United States. Sandy was a late-season hurricane that formed on 22 October 2012 southwest of Jamaica, peaked in strength as a Category 3 hurricane over Cuba, passed the Bahamas, and continued to grow in size while moving northeast along the United States coast. The hurricane made landfall on the continental United States at 23:30 UTC on 29 October 2012 near Brigantine, NJ, with winds reaching 70 knots and a storm surge as high as 3.85 m. According to the National Hurricane Center (51), Sandy caused 147 direct fatalities and is responsible for damage in excess of $50 billion, including 650,000 destroyed or damaged buildings and more than 8.5 million people left without power, some of them for weeks. Both broadcast and online media extensively covered Hurricane Sandy, generating a large volume of Twitter messages that became the basis for this study.

Our raw data include hurricane-related messages (see table S1 for hurricane-related keywords and Materials and Methods for a description of the data) posted between 15 October and 12 November 2012, a period that precedes the formation of the hurricane and extends beyond its dissipation. In total, we have 52.55 million messages from 13.75 million unique users. Because we are interested in a spatiotemporal analysis of Twitter activity, we focus exclusively on messages and users with known locations, which limits the data to 9.7 million geo-coded tweets from 2.2 million unique user accounts.

We perform the analysis at the national and state levels. At the national level, we use cities as a natural (in terms of spatial extent and population size) basis for aggregation and comparison. Cities are important because of their dominant (52, 53) and increasing (54, 55) socioeconomic role in all aspects of human life (56–58), both in the real world and online. In addition, similarities or differences in the way cities react to a major natural disaster like Sandy are of interest to social scientists and climate adaptation policy-makers alike (8, 59). Our analysis covers the 50 most populous urban areas according to the 2010 U.S. Census. At the state level, we progressively use a finer spatial resolution of counties and ZCTAs to analyze the local distributions of Twitter activity and hurricane damage.

At every level of spatial resolution, we aggregate messages whose latitude and longitude fall within the boundaries of the respective region of interest (metropolitan area, county, or ZCTA). We use boundaries and population estimates of all administrative areas as determined by the 2010 U.S. Census. After aggregating the tweets by location, we use time stamps for temporal analysis, allocating messages into nonoverlapping 24-hour bins aligned with the time of minimum activity. Comparison metrics include the total number of active users, number of messages posted, classification of these messages into original and retweeted messages (including identification of the source as local or external to a particular community), and sentiment. Because the number of tweets originating from different urban or zip code areas varies greatly, we normalize each characteristic by the total number of distinct users in each area who were active during the data collection period. For consistency, each keyword is considered separately, and normalization uses the count of users active on that particular topic, which avoids bias arising from differences in the prevalent topics across cities.
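To make the temporal aggregation and normalization concrete, the following Python sketch bins messages into 24-hour windows aligned with the hour of minimum activity and divides each daily count by the number of distinct users active on the same keyword in the same area over the whole collection period. It assumes tweets have already been assigned to areas (for example, by a point-in-polygon test against Census boundaries) and reduced to hypothetical `(area, keyword, user_id, timestamp)` records, and it estimates the minimum-activity hour from an hourly histogram of all messages. The record layout, the histogram-based choice of bin boundary, and all function names are illustrative assumptions, not the pipeline used in this study.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone


def minimum_activity_hour(timestamps):
    """UTC hour of day (0-23) with the fewest messages; ties go to the earliest hour."""
    counts = Counter(ts.hour for ts in timestamps)
    return min(range(24), key=lambda h: counts.get(h, 0))


def bin_start(ts, offset_hour):
    """Start of the 24-hour bin containing ts, where bins begin at offset_hour."""
    start = ts.replace(hour=offset_hour, minute=0, second=0, microsecond=0)
    if start > ts:
        start -= timedelta(days=1)
    return start


def daily_normalized_activity(records, offset_hour=None):
    """records: (area, keyword, user_id, timestamp) tuples (hypothetical layout).

    Returns {(area, keyword): {bin_start: messages_in_bin / users_active_on_topic}},
    where the denominator counts distinct users who posted on that keyword in that
    area at any time during the collection period.
    """
    records = list(records)
    if offset_hour is None:
        offset_hour = minimum_activity_hour(ts for _, _, _, ts in records)

    topic_users = defaultdict(set)       # (area, keyword) -> distinct user ids
    daily_counts = defaultdict(Counter)  # (area, keyword) -> bin start -> messages
    for area, keyword, user_id, ts in records:
        key = (area, keyword)
        topic_users[key].add(user_id)
        daily_counts[key][bin_start(ts, offset_hour)] += 1

    return {
        key: {day: n / len(topic_users[key]) for day, n in sorted(counts.items())}
        for key, counts in daily_counts.items()
    }


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical toy records, not data from this study.
    sample = [
        ("New York", "sandy", "u1", datetime(2012, 10, 29, 12, tzinfo=utc)),
        ("New York", "sandy", "u2", datetime(2012, 10, 29, 13, tzinfo=utc)),
        ("New York", "sandy", "u1", datetime(2012, 10, 30, 2, tzinfo=utc)),
        ("Miami", "sandy", "u3", datetime(2012, 10, 29, 12, tzinfo=utc)),
    ]
    for key, series in daily_normalized_activity(sample, offset_hour=8).items():
        print(key, {d.isoformat(): round(v, 2) for d, v in series.items()})
```

Passing `offset_hour` explicitly is useful when the bin boundary has been fixed once for the whole data set; leaving it as `None` derives it from the records supplied.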

Dynamics of Twitter activity across regions and hurricane-related topics

The messages studied here cover a range of keywords with varying relevance to Hurricane Sandy. Our analysis therefore has three dimensions: spatial, temporal, and topical. Figure 1 illustrates some characteristic features of Twitter activity. Figure 1A shows the pattern exhibited by keywords strongly related to the hurricane (“sandy,” “storm,” “hurricane,” “frankenstorm,” etc.): the number of messages slowly increases, peaks sharply on the day of hurricane landfall, and then gradually declines. Geographically, the trend is similar almost everywhere, but the magnitude of the normalized response depends on proximity to the hurricane, measured as the shortest distance to the path of the hurricane (60).

Fig. 1 Example of the spatiotemporal evolution of Twitter activity across keywords. (A) Geographical and topical variation of normalized activity (the number of daily messages divided by the number of local users active on the topic during the observation period). The horizontal axis is an offset (in hours) with respect to the time of hurricane landfall (00:00 UTC on 30 October 2012). Activity on hurricane-related words like “sandy” increases, reaches its peak on the day of landfall, and then gradually falls off. Qualitatively similar trends are observed everywhere, with distance to the path of the hurricane affecting the strength of the response (compare magnitudes of activity peaks between New York, Chicago, and Miami). Different keywords exhibit different temporal patterns: “gas”-related discussion peaks with a delay corresponding to posthurricane fuel shortages, and activity on “storm” has a secondary spike attributable to the November “Nor’easter” storm. (B) Summary of activities by topic and location. Color corresponds to the level of normalized activity (blue, low; red, high). In columns, places are ranked according to their proximity to the path of the hurricane (closest on the left; farthest on the right). In rows, words are ranked according to the average activity on the topic. Evolution of the event brings disaster-related words to the top of the agenda, with the northeast showing the highest level of activity.

An alternative way to summarize the activity is shown in Fig. 1B, where the normalized activity is presented as a two-dimensional heatmap. We rank cities by their proximity to the hurricane and words by their average normalized activity. At the peak of the disaster, event-related keywords rank higher and activity increases with proximity; consequently, the upper-left corner of the city/topic matrix shows a high level of activity. In summary, as the disaster approaches and peaks in intensity, so does the normalized local Twitter response. In addition, the content of the message stream changes, and keywords most associated with the event dominate the agenda.

When we aggregate our data over the period between 20 October and 12 November 2012, we find that tweet activity declines with increasing distance from the hurricane path up to 1500 km and is nearly constant for all places farther away. These features are summarized in Fig. 2A and fig. S1 (for all keywords). This relationship between proximity and activity level is a dominant feature, accompanied by two other relationships. The first is an inverse relationship between activity on the topic and originality of the content, expressed through the fraction of retweets, which reflects the balance between content creation and consumption. Areas directly hit by, or close to, the disaster show a lower ratio of retweets (more original content) in the stream of messages they generate, as can be seen in Fig. 2B and fig. S2.
The second relationship is between activity and the global popularity of local messages (defined as the count of messages that get retweeted, normalized by the local user count), with content from affected areas attracting more attention elsewhere, as shown in Fig. 2C and fig. S3. The activity-popularity relationship (and, to a lesser degree, the activity-originality relationship) is very strong for event-related keywords but virtually absent for neutral or more general keywords. We illustrate this in the Fig. 2 inset plots for the keyword “weather,” a general word that is used frequently and is not necessarily associated with extreme weather events, even when such events take place.

Fig. 2 Characteristic features of Twitter activity across locations (labeled by color according to hurricane proximity; blue, farther from the disaster; red, closer to the disaster). In all panels, the primary plot shows results for messages with keyword “sandy” and the inset for keyword “weather,” contrasting behaviors between event-related and neutral words. (A) A primary feature is the sharp decline in normalized activity as the distance between a location and the path of the hurricane increases. After the distance exceeds 1200 to 1500 km, its effect on the strength of response disappears. This trend may be caused by a combination of factors, with direct observation of disaster effects and perception of risk both increasing the tweet activity of the East Coast cities. Anxiety, anticipation, and risk perception evidently contribute to the magnitude of response because many of the communities falling into the decreasing trend were not directly hit or were affected only marginally, whereas New Orleans, for example, shows a significant tweeting level that reflects its historical experience with damaging hurricanes like Katrina. (B) The retweet rate is inversely related to activity, with affected areas producing more original content. (C) The popularity of the content created in the disaster area is also higher and therefore increases with activity as well. None of the features discussed above are present for neutral words (see the insets in all panels).

The direct relationship between online activity and proximity to the hurricane naturally raises the question of which factors stimulate such activity. Is it extensive media coverage, perception of risk, or witnessing the hurricane’s meteorological effects (winds, precipitation, and storm surge) and damage (power and fuel shortages, flooding, loss of personal property, and casualties)? The last of these, and especially the extent to which quantifiable properties of online activity (recorded during and shortly after the disaster) reflect the severity of disaster-related damage, is particularly interesting from the point of view of disaster management. Real-time analysis of online activity as a predictor of damage would be a valuable tool for optimizing the allocation of limited emergency and recovery resources and may complement other predictive models used in the joint assessment and recovery of damaged infrastructures (61). Therefore, we investigate whether damage to property across the most severely affected regions correlates with the recorded Twitter activity.
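Proximity to the hurricane, used throughout this section, is the shortest distance from a location to the hurricane path (60). One simple way to approximate it, sketched below, is to densify a track given as a sequence of latitude/longitude fixes and take the minimum great-circle (haversine) distance to the densified points. This is a sketch under stated assumptions: the track coordinates in the example are illustrative placeholders rather than the official best track, linear interpolation in latitude/longitude is adequate only for closely spaced fixes and tracks that do not cross the antimeridian, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp guards against sqrt(a) drifting slightly above 1 from rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def densify(track, step_km=10.0):
    """Add points (linear in lat/lon) so consecutive track points are roughly step_km apart.

    Assumes the track does not cross the antimeridian.
    """
    points = [track[0]]
    for (lat1, lon1), (lat2, lon2) in zip(track, track[1:]):
        n = max(1, math.ceil(haversine_km(lat1, lon1, lat2, lon2) / step_km))
        for i in range(1, n + 1):
            t = i / n
            points.append((lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)))
    return points


def distance_to_path_km(lat, lon, track, step_km=10.0):
    """Approximate shortest distance from (lat, lon) to a track of (lat, lon) fixes."""
    return min(haversine_km(lat, lon, p_lat, p_lon)
               for p_lat, p_lon in densify(track, step_km))


if __name__ == "__main__":
    # Illustrative placeholder track, NOT the National Hurricane Center best track.
    track = [(25.0, -77.0), (32.0, -73.0), (39.4, -74.4)]
    cities = {"New York": (40.71, -74.01), "Chicago": (41.88, -87.63), "Miami": (25.76, -80.19)}
    for name, (lat, lon) in cities.items():
        print(f"{name}: {distance_to_path_km(lat, lon, track):.0f} km")
```

The `step_km` parameter trades accuracy for speed: relative to the interpolated track, the minimum over sampled points overestimates the true distance by at most about half the sample spacing. An exact cross-track distance to each great-circle segment would remove that error at the cost of more involved geometry.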

Damage assessment: Hurricane Sandy

Because the hurricane damage was mostly confined to several states, we perform damage analysis at finer spatial granularity by looking at counties and ZCTAs. We examine both aggregation levels to determine the limits of spatial resolution achievable with this technique. Two primary data sources contribute to our estimate of damage. The first data source is Federal Emergency Management Agency (FEMA) household assistance grants to homeowners and renters (62). These grants are provided to relieve the hardship of households exposed to disasters and to help restore the original property to a habitable condition. The second data source is insurance claims associated with Hurricane Sandy (63, 64), including National Flood Insurance, residential, commercial, vehicle, and marine insurance claims. We use these indicators because both are expressed in monetary terms and are reported by individuals rather than by administrative entities such as municipalities.

A more holistic index of community hardship [like the one by Halpin (65)] could be developed, taking into account other metrics: the number of people served in shelters, effects of power loss (using as proxy the number of days schools were closed), gas shortages (the number of calls to the State Emergency Hotline from gas stations), and FEMA public assistance grants to help with municipal infrastructure. Although such a methodology gives a broader picture of the hardship on the ground, the metrics involved lack a standard way of measurement and do not share a common unit in which they could be integrated. To avoid this ambiguity, we only include data reported by individuals and measured directly as monetary loss. We analyze the damage estimates, aggregated within either counties or ZCTAs, against Twitter activity within the same boundaries.

The available data on damage allow us to look at several aspects, including the total damage claimed, the total damage covered by FEMA and insurance, the number of applications and successful applications, and severity categories based on cost. We look at the relationship between normalized quantities (per-capita Twitter message count and per-capita damage) to avoid correlations artificially induced by population counts (more populous areas produce higher message counts and experience greater damage). To determine whether activity quantitatively reflects the severity of the disaster, we test the independence of two distributions: activity versus damage. We consider activity on the core set of messages strongly associated with the hurricane (see table S2 for the rankings and table S3 for the results across all keywords).

The estimate of damage is a snapshot from November 2014, whereas activity varies significantly over the data collection period. To capture predictive capacity, and to determine in practice which analysis window gives the strongest predictive effect, we calculate the correlations on a daily basis between 22 October and 12 November. In addition to the activity-damage correlation, we also check the sentiment-damage correlation. A previous study (44) suggested that a drop in the average sentiment in an area may indicate an emergency, and we aim to verify whether sentiment also serves as a quantitative predictor of damage. The dynamics of the correlation coefficients are presented in Fig. 3. Because we discard inactive areas (ZCTAs with no messages posted during an analysis period), the length of the vectors subject to the independence test varies over time, and we chose to discard correlation coefficients earlier than 22 October and later than 11 November. Within this period, we have, on average, hundreds of active ZCTAs (see Fig. 3A).
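The daily procedure described above can be sketched in Python as follows: combine the two damage sources into a per-ZCTA total, normalize both damage and the daily count of original messages by population, drop ZCTAs with no messages on that day, and compute Kendall's rank correlation on what remains. The input dictionaries, the `min_areas` cutoff, treating ZCTAs without damage records as zero damage, and the use of the tie-corrected tau-b variant are all assumptions made for this illustration; the text does not specify which Kendall variant was used.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected) via an O(n^2) pairwise count."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from the numerator and both denominator factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


def total_damage(fema_grants, insurance_claims):
    """Sum the two monetary damage sources per ZCTA (a missing entry counts as 0)."""
    zctas = set(fema_grants) | set(insurance_claims)
    return {z: fema_grants.get(z, 0.0) + insurance_claims.get(z, 0.0) for z in zctas}


def daily_activity_damage_tau(daily_messages, damage, population, min_areas=3):
    """daily_messages: {day: {zcta: original messages posted that day}} (hypothetical layout).

    Returns {day: (tau, number_of_active_zctas)}; days with fewer than
    min_areas active ZCTAs are skipped.
    """
    result = {}
    for day in sorted(daily_messages):
        counts = daily_messages[day]
        # Keep only "active" ZCTAs (at least one message) with a known population.
        active = sorted(z for z, n in counts.items()
                        if n > 0 and population.get(z, 0) > 0)
        if len(active) < min_areas:
            continue
        activity = [counts[z] / population[z] for z in active]
        # Assumption: ZCTAs without damage records are treated as zero damage.
        per_capita_damage = [damage.get(z, 0.0) / population[z] for z in active]
        result[day] = (kendall_tau_b(activity, per_capita_damage), len(active))
    return result


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from this study.
    population = {"zcta_a": 1000, "zcta_b": 2000, "zcta_c": 1500, "zcta_d": 500}
    damage = total_damage({"zcta_a": 5e4, "zcta_c": 2e5}, {"zcta_c": 1e5, "zcta_d": 4e4})
    daily = {
        "2012-10-31": {"zcta_a": 3, "zcta_b": 1, "zcta_c": 9, "zcta_d": 4},
        "2012-11-01": {"zcta_a": 1, "zcta_c": 0},
    }
    print(daily_activity_damage_tau(daily, damage, population))
```

The pairwise loop is quadratic, which is fine for a few hundred active ZCTAs per day; larger inputs would call for an O(n log n) merge-sort formulation of the same statistic.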
Figure 3B shows that the rank correlation coefficients are positive but weak to moderate in magnitude.

Fig. 3 Predictive capacity of Hurricane Sandy’s digital traces. The horizontal axis is an offset (in hours) with respect to the time of hurricane landfall (00:00 UTC on 30 October 2012). (A) The number of messages as a function of time (labeled on the secondary y axis on the right) and the number of “active” (with at least one message posted) ZCTAs (labeled on the primary y axis on the left). (B) Evolution of the rank correlation coefficients between the normalized per-capita activity (number of original messages divided by the population of a corresponding ZCTA) and per-capita damage (composed of FEMA individual assistance grants and Sandy-related insurance claims). In addition, the dashed trend shows Kendall rank correlations between average sentiment and per-capita damage. The correlation increases from the prelandfall stage to the postlandfall stage of the hurricane, with a drop on the day of hurricane landfall. We conclude that the postdisaster stage, or persistent activity on the topic in the immediate aftermath of an event, is a good predictor of damage inflicted locally. Average tweet sentiment does not appear to be a good predictor, at least at this level of spatial granularity (ZCTA resolution).

The correlation is present for several days before landfall, which might reflect a priori knowledge of local hurricane vulnerability based on historical experience within particular areas and on obvious risk factors such as proximity to the shoreline. This positive correlation decreases on the day of landfall across all correlation measures: despite the highest total count of messages, the peak of the disaster has the weakest damage-predictive power. However, over the following 2 days, the activity-damage correlation steadily increases. From the third day onward, it fluctuates around a moderate level (Kendall τ = 0.25 to 0.3; Spearman ρ = 0.35 to 0.45). Combining all keywords or examining them separately yields essentially the same pattern and magnitude of coefficients.

Arguably, this trend (a drop on the day of the hurricane, followed by a steady increase in the relationship between activity and damage) could be explained by the universally high tweet activity on the day of landfall, fueled not only by the severity of the storm but also by its widespread coverage in all forms of media. In places that were spared significant consequences, public interest quickly diminishes; in affected areas, however, the topic persistently remains at the top of the agenda, making postevent activity an indicator of the damage caused by the hurricane.

Focusing on the period in which the relationship between activity and damage is strongest (31 October to 12 November), we measure correlation coefficients for all ZCTAs in New Jersey and for selected counties in New Jersey and New York. Results are summarized in Fig. 4 and fig. S4. ZCTA-based distributions of per-capita activity and per-capita damage are approximately log-normal, with histograms shown in Fig. 4A. The Kendall rank correlation reaches 0.39, the Spearman rank correlation reaches 0.55, and the Pearson correlation coefficient approaches 0.6. Analysis by county (Fig. 4B) reveals similar results: Kendall τ = 0.34, Spearman ρ = 0.49, and Pearson ρ = 0.49 for 34 counties across New Jersey and New York. All measures are statistically significant (P < 0.05), indicating a moderate positive correlation between damage and tweet activity. Spatial distributions confirm the relationship, with a pronounced concentration of both damage and normalized activity along the coastline of New Jersey.
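The window-level Spearman and Pearson coefficients can be sketched in plain Python as below. Messages are summed per ZCTA over a date window (for example, 31 October to 12 November) and divided by population, as is damage. Because both per-capita distributions are approximately log-normal, this sketch computes Pearson's coefficient on log10-transformed values and therefore keeps only ZCTAs with positive activity and damage; both choices are assumptions of the sketch, since the text does not state whether the reported Pearson coefficient uses raw or logarithmic values. Spearman's ρ is unaffected by the log transform because it depends only on ranks. Function names and the input layout are hypothetical.

```python
import math
from datetime import date


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def window_correlations(messages_by_day, damage, population, start, end):
    """Aggregate messages over [start, end]; correlate per-capita activity and damage."""
    totals = {}
    for day, counts in messages_by_day.items():
        if start <= day <= end:
            for zcta, n in counts.items():
                totals[zcta] = totals.get(zcta, 0) + n
    # The log transform needs strictly positive values on both axes.
    zctas = sorted(z for z, n in totals.items()
                   if n > 0 and damage.get(z, 0) > 0 and population.get(z, 0) > 0)
    activity = [totals[z] / population[z] for z in zctas]
    per_capita_damage = [damage[z] / population[z] for z in zctas]
    return {
        "n": len(zctas),
        "spearman": spearman(activity, per_capita_damage),
        "pearson_log10": pearson([math.log10(a) for a in activity],
                                 [math.log10(d) for d in per_capita_damage]),
    }


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from this study.
    msgs = {date(2012, 10, 31): {"z1": 4, "z2": 1, "z3": 7},
            date(2012, 11, 1): {"z1": 2, "z3": 5}}
    dmg = {"z1": 3e5, "z2": 1e4, "z3": 2e5}
    pop = {"z1": 8000, "z2": 12000, "z3": 5000}
    print(window_correlations(msgs, dmg, pop, date(2012, 10, 31), date(2012, 11, 12)))
```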
Alternative normalization (by Twitter user count instead of actual population) does not alter the strength of the correlation (see table S4). Using geo-enriched data instead of natively geo-coded data produces similar results: ZCTA-level analysis gives a slightly weaker correlation, whereas county-level analysis is unaffected (see fig. S5).

Fig. 4 Spatial distributions and mutual correlations between Hurricane Sandy damage, Twitter activity, and average sentiment of tweets. Correlations between per-capita Twitter activity and damage are illustrated at the ZCTA level for New Jersey (A) and at the county level for New Jersey and New York (B). The difference in geographic coverage is dictated by the quality of data: no insurance data are available for New York at the ZCTA level. Spatial distributions show that both variables reach their highest levels along the coast and in densely populated metropolitan areas around New York City. Normalized activity and damage both follow a quasi log-normal distribution [see the histograms along the axes of the scatter plot in (A)]. A moderately strong positive correlation between postlandfall activity and damage is observed, especially for fine-resolution analysis [see inset tables in the scatter plots in (A) and (B) for exact statistics and P values]. Sentiment-versus-damage (S-D) analysis is underpowered at the ZCTA level (τ = −0.031, P = 0.29), but county-level analysis shows that negative sentiment correlates with damage (τ = −0.28, P = 0.018).

Following Guan and Chen (50), we also analyze the relationship between Twitter activity and damage estimates produced by the FEMA Modeling Task Force (based on the Hazus-MH model of hurricane wind and storm surge damage to housing and infrastructure). This approach yields somewhat weaker correlations (Kendall τ = 0.28, Spearman ρ = 0.44, and Pearson ρ = 0.33), suggesting that online response reflects the actual damage (ex-post assessment) better than modeling predictions. Comparison of alternative damage estimates and their effects on the strength of the observed activity-damage correlation is summarized in tables S5 and S6.

Our previous study (44) suggested that negative average sentiment may indicate an emergency, based on the observation that sentiment drops for a sustained period before and after the landfall of Hurricane Sandy. Here, we reexamine the sentiment-damage relationship and find that daily rank correlation coefficients oscillate around zero for the entire observation period (see Fig. 3B). Within the most favorable prediction window (31 October to 12 November), Kendall τ = −0.031 (P = 0.294), suggesting either that the underlying distributions are independent or that analysis at ZCTA resolution is underpowered. Moving from ZCTAs to counties yields a more definitive relationship (τ = −0.28, P = 0.018), and normalization by Twitter user count yields a more significant result (τ = −0.34, P = 0.005), confirming our previous findings and making sentiment weakly predictive of damage (see table S7 for a summary of results).
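As a rough consistency check on the county-level sentiment results, the large-sample normal approximation for Kendall's τ under independence, Var(τ) = 2(2n + 5) / [9n(n − 1)], turns a single coefficient and sample size into a two-sided P value. The sketch below assumes n = 34, the number of counties in the activity-damage analysis; the text does not state the county count for the sentiment analysis, and the approximation ignores tie corrections, so it need not reproduce the reported values exactly. The function name is hypothetical.

```python
import math


def kendall_p_value_normal(tau, n):
    """Two-sided P value for Kendall's tau under the null hypothesis of independence.

    Uses the large-sample normal approximation Var(tau) = 2(2n + 5) / (9n(n - 1))
    with no correction for ties.
    """
    if n < 3:
        raise ValueError("need at least 3 observations")
    variance = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / math.sqrt(variance)
    # Two-sided standard-normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    N_COUNTIES = 34  # assumption: same county set as the activity-damage analysis
    for tau in (-0.28, -0.34):
        p = kendall_p_value_normal(tau, N_COUNTIES)
        print(f"tau = {tau:+.2f}, n = {N_COUNTIES}: P ~ {p:.3f}")
```

With these inputs the approximation gives P values of roughly 0.02 for τ = −0.28 and 0.005 for τ = −0.34, in line with the reported P = 0.018 and P = 0.005; small differences are expected if the original test applied tie corrections or an exact distribution.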