We analyze the performance of a set of 98 search terms. We included terms related to the concept of stock markets, with some terms suggested by the Google Sets service, a tool which identifies semantically related keywords. The set of terms used was therefore not arbitrarily chosen, as we intentionally introduced some financial bias. We explain our strategy based on changes in search volume with reference to the term debt, a keyword with an obvious semantic connection to the most recent financial crisis and overall the term which performed best in our analyses.

To uncover the relationship between the volume of search queries for a specific term and the overall direction of trader decisions, we analyze closing prices p(t) of the Dow Jones Industrial Average (DJIA) on the first trading day of week t. We use Google Trends to determine how many searches n(t − 1) have been carried out for a specific search term such as debt in week t − 1, where Google defines weeks as ending on a Sunday, relative to the total number of searches carried out on Google during that time. We find that search volume data change slightly over time due to Google's extraction procedure. For each search term, we therefore average over three realizations of its search volume time series, based on three independent data requests in consecutive weeks. The variability of Google Trends data across different dates of access is irrelevant to our results, and it can be shown that the data are consistent with reported real world events (see Fig. S1 in the Supplementary Information).

To quantify changes in information gathering behavior, we use the relative change in search volume: Δn(t, Δt) = n(t) − N(t − 1, Δt) with N(t − 1, Δt) = (n(t − 1) + n(t − 2) + … + n(t − Δt))/Δt, where t is measured in units of weeks. In Fig. 1, we depict relative search volume changes for the term debt and their relationship to DJIA closing prices.
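The definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `relative_change` and the assumption that weekly search volumes are stored in a list `n` indexed by week are ours, not part of the original analysis.

```python
def relative_change(n, t, dt):
    """Return Δn(t, Δt) = n(t) − N(t − 1, Δt).

    n  : list of weekly search volumes, n[t] for week t (assumed layout)
    t  : index of the week of interest
    dt : number of preceding weeks Δt used for the moving average
    """
    if dt < 1 or t < dt:
        raise ValueError("need at least dt weeks of history before week t")
    # N(t − 1, Δt): mean of n(t − 1), n(t − 2), ..., n(t − Δt)
    baseline = sum(n[t - dt:t]) / dt
    return n[t] - baseline
```

For example, with n = [10, 20, 30, 50] and Δt = 3, the average over the three earlier weeks is 20, so the change for the last week is 30.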

Figure 1 Search volume data and stock market moves. Time series of closing prices p(t) of the Dow Jones Industrial Average (DJIA) on the first day of trading in each week t covering the period from 5 January 2004 until 22 February 2011. The color code corresponds to the relative search volume changes for the search term debt, with Δt = 3 weeks. Search volume data are restricted to requests of users localized in the United States of America.

To investigate whether changes in information gathering behavior as captured by Google Trends data were related to later changes in stock price in the period between 2004 and 2011, we implement a hypothetical investment strategy for a portfolio using search volume data, called the ‘Google Trends strategy’ in the following. Profit can only be made in a trading strategy if at least some future changes in the stock price are correctly anticipated, in particular around large market movements. We implement this strategy by selling the DJIA at the closing price p(t) on the first trading day of week t if Δn(t − 1, Δt) > 0, and buying the DJIA at price p(t + 1) at the end of the first trading day of the following week. Note that mechanisms exist which make it possible to sell assets in financial markets without first owning them. If instead Δn(t − 1, Δt) < 0, then we buy the DJIA at the closing price p(t) on the first trading day of week t and sell the DJIA at price p(t + 1) at the end of the first trading day of the coming week. At the beginning of trading, we set the value of all portfolios to an arbitrary value of 1. If we take a ‘short position’ (selling at the closing price p(t) and buying back at price p(t + 1)), then the cumulative return R changes by log(p(t)) − log(p(t + 1)). If we take a ‘long position’ (buying at the closing price p(t) and selling at price p(t + 1)), then the cumulative return R changes by log(p(t + 1)) − log(p(t)). In this way, buy and sell actions have symmetric impacts on the cumulative return R of a strategy's portfolio. In using this approach to analyze the relationship between Google search volume and stock market movements, we neglect transaction fees, since the maximum number of transactions per year when using our strategy is only 104, allowing a closing and an opening transaction per week. We of course do not dispute that such transaction fees would impact profit in a real world implementation.
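The trading rule can be made concrete with a short sketch. The function name `google_trends_return`, the list-based data layout, and the choice to take no position when Δn(t − 1, Δt) is exactly zero are assumptions made for illustration; the text above does not specify the zero case.

```python
import math


def google_trends_return(prices, volume, dt):
    """Cumulative log return R of the weekly long/short rule.

    prices[t] : DJIA closing price on the first trading day of week t
    volume[t] : relative search volume n(t) in week t
    dt        : moving-average window Δt in weeks
    """
    R = 0.0
    # Week t needs n(t − 1 − Δt) .. n(t − 1) and the next price p(t + 1).
    for t in range(dt + 1, len(prices) - 1):
        baseline = sum(volume[t - 1 - dt:t - 1]) / dt  # N(t − 2, Δt)
        dn = volume[t - 1] - baseline                  # Δn(t − 1, Δt)
        move = math.log(prices[t + 1]) - math.log(prices[t])
        if dn > 0:
            # Search volume rose: short position, R changes by log p(t) − log p(t + 1)
            R -= move
        elif dn < 0:
            # Search volume fell: long position, R changes by log p(t + 1) − log p(t)
            R += move
        # dn == 0: no position is taken (assumption, not stated in the text)
    return R
```

Because R is a sum of log price ratios, a short position ahead of a fall and a long position ahead of a rise of the same relative size contribute equally, which is the symmetry noted above.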

In Fig. 2, the performance of the Google Trends strategy based on the search term debt is depicted by a blue line, whereas dashed lines indicate the standard deviation of the cumulative return from a strategy in which we buy and sell the market index in an uncorrelated, random manner (‘random investment strategy’). The standard deviation is derived from simulations of 10,000 independent realizations of the random investment strategy. Fig. 2 shows that the use of the Google Trends strategy, based on the search term debt and Δt = 3 weeks, would have increased the value of a portfolio by 326%. The performance of Google Trends strategies based on all other search terms that we analyze is depicted in Figures S3-S100 in the Supplementary Information.
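A random investment strategy of this kind could be simulated as follows. This is a sketch under stated assumptions: we assume that each week a long or a short position is chosen independently with equal probability, which is one reading of "uncorrelated, random"; the function name, the seed handling and the use of Python's standard library are illustrative.

```python
import math
import random
import statistics


def random_strategy_returns(prices, n_sims=10_000, seed=0):
    """Cumulative log returns of n_sims independent random strategies.

    prices[t] : closing price on the first trading day of week t
    """
    rng = random.Random(seed)
    # Weekly log price changes log p(t + 1) − log p(t)
    moves = [math.log(prices[t + 1]) - math.log(prices[t])
             for t in range(len(prices) - 1)]
    returns = []
    for _ in range(n_sims):
        # Each week: long (+move) or short (−move) with probability 1/2 (assumption)
        returns.append(sum(m if rng.random() < 0.5 else -m for m in moves))
    return returns


def random_strategy_sigma(prices, n_sims=10_000, seed=0):
    """Standard deviation of the simulated cumulative returns."""
    return statistics.pstdev(random_strategy_returns(prices, n_sims, seed))
```

Under the equal-probability assumption the weekly contributions are independent with zero mean, so the simulated standard deviation should approach the square root of the sum of squared weekly log changes as the number of simulations grows.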

Figure 2 Cumulative performance of an investment strategy based on Google Trends data. Profit and loss for an investment strategy based on the volume of the search term debt, the best performing keyword in our analysis, with Δt = 3 weeks, plotted as a function of time (blue line). This is compared to the “buy and hold” strategy (red line) and the standard deviation of 10,000 simulations using a purely random investment strategy (dashed lines). The Google Trends strategy using the search volume of the term debt would have yielded a profit of 326%.

We rank the full list of the 98 investigated search terms by their trading performance when using search data for U.S. users only (Fig. 3A) and when using globally generated search volume (Fig. 3B). In order to ensure the robustness of our results, the overall performance of a strategy based on a given search term is determined as the mean value over the six returns obtained for Δt = 1, ..., 6 weeks. Returns of the strategies are calculated as the logarithm of relative portfolio changes, following the usual definition of returns. The distribution of final portfolio values resulting from the random investment strategies is close to log-normal. Cumulative returns from the random investment strategy, derived from the logarithm of these portfolio values, therefore follow a normal distribution, with a mean value of ⟨R⟩_RandomStrategy = 0. Here we report R, the cumulative returns of a strategy, in standard deviations of the cumulative returns of these uncorrelated random investment strategies.
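The normalization described here might be written as below. The function name is hypothetical, and the order of operations (averaging the six returns first, then dividing by the random-strategy standard deviation) is an assumption; the mean of the random strategies is taken to be 0, as stated above.

```python
import statistics


def performance_in_sigma(returns_by_dt, random_returns):
    """Mean strategy return over the Δt values, in random-strategy standard deviations.

    returns_by_dt  : cumulative returns R for Δt = 1, ..., 6 weeks
    random_returns : cumulative returns of simulated random strategies
    """
    sigma = statistics.pstdev(random_returns)
    if sigma == 0:
        raise ValueError("random returns have zero spread")
    # The random-strategy mean is taken as 0, so no offset is subtracted.
    return statistics.mean(returns_by_dt) / sigma
```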

Figure 3 Performances of investment strategies based on search volume data. (A) Cumulative returns of 98 investment strategies based on search volumes restricted to search requests of users located in the United States for different search terms, displayed for the entire time period of our study from 5 January 2004 until 22 February 2011, the time period for which Google Trends provides data. We use two shades of blue for positive returns and two shades of red for negative returns to improve the readability of the search terms. The cumulative performance for the “buy and hold strategy” is also shown, as is a “Dow Jones strategy”, which uses weekly closing prices of the Dow Jones Industrial Average (DJIA) rather than Google Trends data (see gray bars). Figures provided next to the bars indicate the returns of a strategy, R, in standard deviations from the mean return of uncorrelated random investment strategies, ⟨R⟩_RandomStrategy = 0. Dashed lines correspond to −3, −2, −1, 0, +1, +2 and +3 standard deviations of random strategies. We find that returns from the Google Trends strategies tested are significantly higher overall than returns from the random strategies (⟨R⟩_US = 0.60; t = 8.65, df = 97, p < 0.001, one sample t-test). (B) A parallel analysis shows that extending the range of the search volume analysis to global users reduces the overall return achieved by Google Trends trading strategies on the U.S. market (⟨R⟩_US = 0.60, ⟨R⟩_Global = 0.43; t = 2.69, df = 97, p < 0.01, two-sided paired t-test). However, returns are still significantly higher than the mean return of random investment strategies (⟨R⟩_Global = 0.43; t = 6.40, df = 97, p < 0.001, one sample t-test).

We find that returns from the Google Trends strategies we tested are significantly higher overall than returns from the random strategies (⟨R⟩_US = 0.60; t = 8.65, df = 97, p < 0.001, one sample t-test).
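For readers who want to reproduce this kind of test, the one sample t statistic can be computed as sketched below. The function name and input layout (one normalized return per search term) are illustrative assumptions; the p-value additionally requires the Student t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_1samp`) would normally be used.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one sample t-test of mean(values) against mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error of the mean, using the sample standard deviation
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1
```

With 98 search terms the test has 98 − 1 = 97 degrees of freedom, matching the df reported above.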

We compare the performance of these search terms with two benchmark strategies. The ‘buy and hold’ strategy is implemented by buying the index at the beginning and selling it at the end of the hold period. This strategy yields 16% profit, equal to the overall increase in value of the DJIA in the time period from January 2004 until February 2011. We further implement a ‘Dow Jones strategy’ by using changes in p(t) in place of changes in search volume data as the basis of buy and sell decisions. We find that this strategy also yields only 33% profit with Δt = 3 weeks, or when determined as the mean value over the six returns obtained for Δt = 1, ..., 6 weeks, 0.45 standard deviations of cumulative returns of uncorrelated random investment strategies (Figs. 3A and 3B; see also Fig. S101 in the Supplementary Information).
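Both benchmarks can be sketched in the same style. The function names are hypothetical, and for the Dow Jones strategy we assume that the price series is substituted directly for the search volume series, so that the signal for week t is p(t − 1) minus the mean of the Δt prices before it; the exact indexing used in the original analysis is not spelled out above.

```python
import math


def buy_and_hold_return(prices):
    """Log return from buying at the first price and selling at the last."""
    return math.log(prices[-1]) - math.log(prices[0])


def dow_jones_return(prices, dt):
    """Cumulative log return when price changes replace search volume changes."""
    R = 0.0
    for t in range(dt + 1, len(prices) - 1):
        baseline = sum(prices[t - 1 - dt:t - 1]) / dt  # mean of p(t − 2) .. p(t − 1 − Δt)
        dp = prices[t - 1] - baseline                  # price analogue of Δn(t − 1, Δt)
        move = math.log(prices[t + 1]) - math.log(prices[t])
        if dp > 0:
            R -= move   # short position after a price increase
        elif dp < 0:
            R += move   # long position after a price decrease
        # dp == 0: no position (assumption)
    return R
```

A log return R corresponds to a relative profit of exp(R) − 1, so a price series rising from 100 to 116 gives a buy and hold profit of 16%.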

Our results show that performance of the Google Trends strategy differs with the search term chosen. We investigate whether these differences in performance can be partially explained using an indicator of the extent to which different terms are of financial relevance. We quantify this concept by calculating the frequency of each search term in the online edition of the Financial Times from August 2004 to June 2011, normalized by the number of Google hits for each search term (see Fig. S2 in the Supplementary Information). We find that the return associated with a given search term is correlated with this indicator of financial relevance (Kendall's tau = 0.275, z = 4.01, N = 98, p < 0.001) using Kendall's tau rank correlation coefficient (ref. 37).
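A rank correlation of this type can be computed directly from pair counts. The sketch below implements the tau-b variant, which handles ties; which variant was used in the original analysis is not stated, so this choice is an assumption, as are the function names. The z score uses the standard normal approximation without tie correction.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant, discordant and tied pairs."""
    concordant = discordant = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both variables: ignored
            elif dx == 0:
                tied_x += 1         # tied in x only
            elif dy == 0:
                tied_y += 1         # tied in y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x) * (untied + tied_y))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom


def kendall_z(tau, n):
    """z score of tau under the null hypothesis (normal approximation, no ties)."""
    return tau / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
```

As a consistency check, `kendall_z(0.275, 98)` evaluates to approximately 4.01, in line with the z value reported above.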

It is widely recognized that investors prefer to trade on their domestic market, suggesting that search data for U.S. users only, as used in analyses so far, should better capture the information gathering behavior of U.S. stock market participants than data for Google users worldwide. Indeed, we find that strategies based on global search volume data are less successful than strategies based on U.S. search volume data in anticipating movements of the U.S. market (⟨R⟩_US = 0.60, ⟨R⟩_Global = 0.43; t = 2.69, df = 97, p < 0.01, two-sided paired t-test).
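The paired comparison treats each search term as one pair of returns, one from U.S. data and one from global data. A minimal sketch of the test statistic follows; the function name and argument layout are illustrative, and a library routine such as `scipy.stats.ttest_rel` would normally be used to obtain the p-value as well.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equally long samples."""
    if len(a) != len(b):
        raise ValueError("samples must be paired")
    # The test reduces to a one sample test on the pairwise differences.
    diffs = [ai - bi for ai, bi in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1
```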

Our empirical results so far are consistent with a two-part hypothesis: namely that key increases in the price of the DJIA were preceded by a decrease in search volume for certain financially related terms and conversely, that key decreases in the price of the DJIA were preceded by an increase in search volume for certain financially related terms. However, our trading strategy can be decomposed into two strategy components: one in which a decrease in search volume prompts us to buy (or take a long position) and one in which an increase in search volume prompts us to sell (or take a short position).

In order to verify that both strategy components play a significant role in our results, such that we have evidence for both parts of this hypothesis, we implement and test one strategy in which we take long positions following a decrease in search volume but never take short positions (Fig. 4A) and another strategy in which we take short positions following an increase in search volume but never take long positions (Fig. 4B). We find that returns from both Google Trends strategy components are significantly higher overall than returns from a random investment strategy (long position strategies: ⟨R⟩_USLong = 0.41; t = 11.42, df = 97, p < 0.001, one sample t-test; short position strategies: ⟨R⟩_USShort = 0.19; t = 5.28, df = 97, p < 0.001, one sample t-test).
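The two strategy components can be sketched by restricting the weekly rule to one side. The function name, the `side` parameter and the list-based data layout are hypothetical and introduced only for illustration.

```python
import math


def component_return(prices, volume, dt, side):
    """Cumulative log return of the long-only or short-only component.

    side : "long"  -> trade only after a decrease in search volume
           "short" -> trade only after an increase in search volume
    """
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    R = 0.0
    for t in range(dt + 1, len(prices) - 1):
        baseline = sum(volume[t - 1 - dt:t - 1]) / dt
        dn = volume[t - 1] - baseline                  # Δn(t − 1, Δt)
        move = math.log(prices[t + 1]) - math.log(prices[t])
        if side == "long" and dn < 0:
            R += move   # buy at p(t), sell at p(t + 1)
        elif side == "short" and dn > 0:
            R -= move   # sell at p(t), buy back at p(t + 1)
    return R
```

Since log returns add, the long-only and short-only returns of a given search term sum to the return of the combined strategy. The reported means, 0.41 and 0.19, do add up to the combined value of 0.60, which is what one would expect if all three are expressed in the same units.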