Our study showed that the association between heat stroke internet searches and health outcomes is much stronger than that between temperature and health outcomes. Heat stroke internet searches had better predictive ability for health risk than temperature during the heatwave. The temperature and health analysis shows a lag effect on mortality, which is in agreement with previous studies15,26. Our study also shows a significant correlation between heat stroke cases and heat stroke deaths. This indicates that morbidity surveillance data could be a good indicator of mortality during a heatwave. Hence, the identification and prevention of morbidity during a heatwave can be expected to decrease the mortality rate.
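As an illustration of how the association between a daily search index and a lagged health outcome can be examined, the following is a minimal sketch only. The two series are invented for the example and are not data from this study, the helper names `pearson` and `lagged_correlation` are hypothetical, and the choice of the Pearson coefficient is an assumption rather than a description of the analysis reported here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(searches, outcomes, lag):
    """Correlate searches on day t with outcomes on day t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(searches, outcomes)
    return pearson(searches[:-lag], outcomes[lag:])

# Invented illustrative series (NOT study data): the outcome series is
# constructed to follow the search index with a one-day delay.
searches = [100, 150, 300, 500, 600, 450, 350, 200, 150, 100]
cases = [1, 2, 3, 6, 10, 12, 9, 7, 4, 3]

# Evaluate lags of 0 to 3 days and keep the lag with the highest correlation.
correlations = {lag: lagged_correlation(searches, cases, lag) for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
```

In this constructed example the highest correlation occurs at a lag of one day, because the outcome series was built that way. With real surveillance data the lag structure would have to be estimated and tested rather than assumed.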

A new method of syndromic surveillance, known as web-based keyword searching, has emerged8,22,27,28. It has been shown to be a feasible surveillance method for influenza, kidney stone disease, dengue and other conditions. Heat stroke is a widely known effect of heatwaves and is associated with easily recognizable heatwave-related syndromic keywords in search engines. The consistency between internet searches and health outcomes in our study is much higher than in previous studies of other diseases and internet keyword searching8,28,29,30. Therefore, heat stroke internet searches are well suited for use as a new proxy for health outcomes in heatwave surveillance. In 2013, the internet search based Google Flu Trends predicted more than double the proportion of doctor visits for influenza-like illness reported by the Centers for Disease Control and Prevention (CDC)31. Lazer et al. explored the “big data hubris” and algorithm dynamics that contributed to Google Flu Trends’ mistakes31. The problem was mainly due to a mismatch between the number of searches, about 50 million, and the small volume of surveillance data, 1152 cases31. In our study, the number of searches is about 450 per day, and the numbers of heat stroke cases and deaths are about 8 and 1 per day, respectively. The search index is therefore at most several hundred times the number of health outcomes, a ratio much smaller than that of Google Flu Trends. Google Flu Trends also used an algorithm to capture the dynamics of the cases31. In our analysis, however, the search index value was compared directly with heat stroke health outcomes and captured the variation in cases and deaths very well. Therefore, heat stroke searches can be expected to largely avoid the two issues that led to the Google Flu Trends prediction errors. A recent study of Google Flu Trends showed that using the aggregate frequency of selected queries as the only predictor could lead to prediction errors32. We also aggregated all queries about “heat stroke” into a single predictor, hence our indicator may be prone to similar errors. However, the purpose of this study is not to build a predictive model for heatwave-related health outcomes. We identify this potential weakness for further studies to address.
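The scale comparison made in the paragraph above can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text. The variable names and the helper `searches_per_outcome` are introduced here for illustration and are not part of the original analysis.

```python
# Figures quoted in the text above.
gft_searches = 50_000_000   # Google Flu Trends: about 50 million searches
gft_cases = 1152            # Google Flu Trends: 1152 surveillance cases
daily_searches = 450        # this study: about 450 searches per day
daily_cases = 8             # this study: about 8 heat stroke cases per day
daily_deaths = 1            # this study: about 1 heat stroke death per day

def searches_per_outcome(searches, outcomes):
    """Number of searches per recorded health outcome."""
    return searches / outcomes

gft_ratio = searches_per_outcome(gft_searches, gft_cases)        # about 43,400
case_ratio = searches_per_outcome(daily_searches, daily_cases)   # 56.25
death_ratio = searches_per_outcome(daily_searches, daily_deaths) # 450.0

# How many times larger the Google Flu Trends ratio is than the largest
# ratio in this study (searches per death).
scale_gap = gft_ratio / death_ratio                              # about 96
```

On these figures, the search volume in this study is roughly 56 times the number of daily cases and 450 times the number of daily deaths, whereas the Google Flu Trends ratio is on the order of 43,000.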

Recently, the use of web-based searching for disease identification has been gaining interest, and mining the web is a valuable new direction for quickly identifying diseases and epidemics8,24,29,33. Our study is the first to show that web-based searching could be useful for predicting risks to health during heatwaves, and it could provide the basis for a new heatwave health warning system. This offers the prospect that stakeholders could recognize a heatwave health epidemic in a timely and cost-effective way, which could translate into a practical and rapid health protection response.

However, it is also important to note the limitations of this new syndromic surveillance tool. Demographic information on users is not available, so it is not possible to identify the most vulnerable population during the heatwave. In addition, the precise reason why users search for particular terms is not clear. The keyword “heat stroke” is not the only one that could be used, and the effectiveness of alternative keywords will be investigated in further research. There are also many confounding factors that could not be modeled. It is noteworthy that publicity due to a health awareness campaign or items from the media can affect internet searches23. This could be addressed by using a web-based database to monitor news items, which could then be used to adjust the model. In 2013, about 30% of people in Shanghai did not have access to the internet34, and it is possible that this group includes a disproportionate number of older people, who are especially vulnerable to the effects of heatwaves. This may be another source of uncertainty in using internet searches as heatwave surveillance data. As a result of data limitations, we have only been able to test internet searches during heatwaves in one location during one summer period. Further testing of the relationship between “heat stroke” internet searches and health outcomes in other locations and in different heatwave periods is anticipated in the future.

Internet search data are easily accessible and economical, which could benefit the early warning of health risks and the taking of preventive measures during a heatwave. A unique strength of internet searches is the immediacy of access to the data, which provides the basis for an alternative real-time health surveillance system. Compared with traditional surveillance data, internet searches could be used to recognize the onset of epidemics more quickly and to produce real-time health warnings during heatwaves. Heatwaves are a global public health concern and will become more frequent and severe in a changing climate. However, health surveillance during heatwaves and heatwave health warning systems are very rare around the world. Our study shows that heat stroke internet searching could form a new tool for confronting the challenge of heatwaves worldwide. With the rapid development of the internet, web searching is likely to become more accurate and more representative of the whole population. It could be used in different areas, especially in regions with no health surveillance records.