Internet search data may offer new possibilities to improve forecasts of collective behavior, if we can identify which parts of these gigantic search datasets are relevant. We introduce an automated method that uses data from Google and Wikipedia to identify relevant topics in search data before large events. Using stock market moves as a case study, our method successfully identifies historical links between searches related to business and politics and subsequent stock market moves. We find that the predictive value of these search terms has recently diminished, potentially reflecting increasing incorporation of Internet data into automated trading strategies. We suggest that extensions of these analyses could help draw links between search data and a range of other collective actions.

Abstract

Technology is becoming deeply interwoven into the fabric of society. The Internet has become a central source of information for many people when making day-to-day decisions. Here, we present a method to mine the vast data that Internet users create when searching for information online, to identify topics of interest before stock market moves. In an analysis of historical data from 2004 until 2012, we draw on records from the search engine Google and the online encyclopedia Wikipedia, as well as judgments from the service Amazon Mechanical Turk. We find evidence of links between Internet searches relating to politics or business and subsequent stock market moves. In particular, we find that an increase in search volume for these topics tends to precede stock market falls. We suggest that extensions of these analyses could offer insight into large-scale information flow before a range of real-world events.