Massive logs of online human activity create new possibilities for studying complex socio-economic phenomena1,2,3. Among these, the dynamics of knowledge exchange networks, and in particular the emergent interactions between producers and consumers of information4, have not been explored as extensively as the flows of material goods. Yet they have a critical impact on our opinions, decisions and lives5.

An overwhelming number of information stimuli compete for our cognitive resources, giving rise to the economy of attention6, a notion first theorized by Simon7. At the aggregate level, this phenomenon is often referred to as collective attention. Work on collective attention has mainly focused on the consumption of information8,9,10. Characteristic signatures of information consumption have been shown to correlate with real-world events, such as the spread of influenza11, financial stock returns2, scientific performance12 and box office results13.

The production side of the equation (whether and how the creation of information is driven by demand) has been explored only to a limited extent in the literature, owing in part to the difficulty of quantifying information demand. Imitation of popular content14, for instance, is the simplest form of supply matching demand for information. However, while examples of imitation of online content abound15, they do not establish a quantitative relationship between the demand for and the production of information. Looking at the role of attention as a possible driver for the generation of novel content, Huberman et al. found a positive correlation between the productivity of YouTube contributors and the number of views of their previous videos16. This confirms that prestige is a powerful motivation for the creation of knowledge17.

Here we tackle the measurement of demand and supply of information goods and their relative ordering in time. Studies of attention toward a specific piece of information have so far found no link between traffic bursts and the number of edits to a Wikipedia article18. We instead focus on the creation of Wikipedia articles as a better proxy for the production of information, and on visits to topically related articles as a proxy for its demand. Analysis of Wikipedia traffic data thus allows us to study whether the generation of new knowledge about a topic precedes or follows its demand.

More specifically, we are interested in how attention toward topics changes around the time that new knowledge about them is created, and we want to make this comparison across a broad range of topics. Sudden changes of attention, or “bursts”, have traditionally been studied18,19,20 using the logarithmic derivative $\Delta N_t / N_t$, where $N_t$ is the number of visits or links accrued by a topic (e.g. a Wikipedia page or a YouTube video) during a fixed sampling interval $t$, and the numerator is customarily defined as $\Delta N_t = N_{t+1} - N_t$. However, the distribution of $\Delta N / N$ is known to be broad, with a heavy tail that follows a power law19. This lack of a characteristic scale makes it difficult to use $\Delta N / N$ to compare diverse topics. Here we instead propose a measure of traffic change based on a simple normalization of the traffic that accounts for this and for other confounding factors, such as traffic seasonality and circadian rhythms of activity21,22.
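As a minimal illustration of the conventional measure only (the normalized measure proposed here is not reproduced), the Python sketch below computes $\Delta N_t / N_t$ for a short series of daily visit counts. The function name `log_derivative` and the sample counts are hypothetical. Intervals with $N_t = 0$ are treated as undefined.

```python
def log_derivative(counts):
    """Return Delta N_t / N_t = (N_{t+1} - N_t) / N_t for each sampling interval t.

    The ratio is undefined when N_t == 0; those intervals are reported as None.
    """
    ratios = []
    for n_t, n_next in zip(counts, counts[1:]):
        ratios.append((n_next - n_t) / n_t if n_t > 0 else None)
    return ratios


# Hypothetical daily visit counts for a single page.
daily_visits = [100, 150, 150, 75, 0, 10]
print(log_derivative(daily_visits))  # [0.5, 0.0, -0.5, -1.0, None]
```

The ratio is strongly scale dependent: a page going from 1 to 10 visits yields `log_derivative([1, 10]) == [9.0]`, while a page growing from 10,000 to 12,000 visits yields only `[0.2]`, which hints at how low-traffic pages can populate the heavy tail.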

Wikipedia is currently the fifth most visited Internet website23 and includes 30 million articles in 287 languages. The English version alone consists of roughly 4.4 million articles and is consulted, on average, by about 300 million people every day. Each entry, or article, of Wikipedia corresponds to a separate web page. Wikipedia can thus be regarded as a large information network in which broad macroscopic topics can be identified. As an example, Fig. 1 depicts the traffic to two high-profile articles and to their neighbors; both articles were selected from the 2012 Google Zeitgeist24. We define a topic as a page together with all of its neighbors, that is, the articles linked by it or linking to it after its creation (see Methods). The networks formed by the two topics are shown in Fig. 1(b, d).
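The topic definition can be sketched in Python as the one-hop neighborhood of a page in the link graph, ignoring link direction. This is a hypothetical illustration: the `build_topic` function and the toy link list are not from this work, and the sketch operates on a single link snapshot rather than applying the creation-time condition described in Methods.

```python
from collections import defaultdict


def build_topic(focal, links):
    """Return the focal page plus all pages it links to or that link to it.

    `links` is an iterable of (source, target) pairs from a single snapshot.
    """
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for source, target in links:
        outgoing[source].add(target)
        incoming[target].add(source)
    # Union of out-neighbors and in-neighbors; self-links are dropped.
    neighbors = (outgoing[focal] | incoming[focal]) - {focal}
    return {focal} | neighbors


# Toy link snapshot (hypothetical).
links = [
    ("2012 Summer Olympics", "London"),
    ("Usain Bolt", "2012 Summer Olympics"),
    ("London", "River Thames"),
]
print(sorted(build_topic("2012 Summer Olympics", links)))
# ['2012 Summer Olympics', 'London', 'Usain Bolt']
```

Pages two hops away (here "River Thames") are excluded, so a topic stays a local neighborhood around its focal article.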

Figure 1 Synchronous traffic bursts are associated with an increased frequency of article creation in two high-profile topics. (a), Time series of traffic. The grey lines represent the daily traffic to articles that link to or are linked from the article “2012 Summer Olympics,” according to a recent snapshot of Wikipedia (see Methods). For visualization purposes, only a random sample of 100 neighbors is shown. The focal page is represented by the black solid line; red and gold lines represent the average and median traffic, respectively. The vertical black segments mark the times when new linked articles are created (see Methods). (b), Network of neighbors of “2012 Summer Olympics.” White nodes represent neighbor articles predating 2012; colored nodes correspond to neighbors created in 2012. Node size is proportional to yearly traffic volume; node positions were computed using the ARF layout32. (c and d), Same visualizations as (a) and (b) for the entry about Hurricane Sandy and its neighbors. New articles tend to be peripheral to these networks.

The volume of traffic to a page or a topic is measured by the number of daily browser requests for the corresponding pages. Weekly fluctuations are evident in the traffic patterns shown in Fig. 1(a, c). Synchronous bursts of activity, corresponding to increased attention toward the topic, are also visible. For the Olympics topic, this increase in attention takes the form of an anticipatory buildup leading to two peaks around the opening and closing ceremonies, followed by relaxation to a lower baseline. For Hurricane Sandy, a sudden spike occurs at the time the main article is created, driven by demand for information about the effects of the hurricane.
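Summaries like the average and median curves in Fig. 1(a, c), and the weekly fluctuations visible there, can be computed on toy data with a short Python sketch. The function names and the input layout (one equal-length list of daily counts per page, with the weekday of the first day known) are assumptions made for illustration.

```python
from statistics import mean, median


def topic_traffic_summary(series_by_page):
    """Per-day mean and median traffic across the pages of a topic.

    `series_by_page` maps page titles to equal-length lists of daily counts.
    """
    days = list(zip(*series_by_page.values()))  # one tuple of counts per day
    return [mean(day) for day in days], [median(day) for day in days]


def weekday_profile(daily_counts, first_weekday=0):
    """Average count for each weekday (0 = Monday, ..., 6 = Sunday).

    `first_weekday` is the weekday of daily_counts[0]; weekdays with no data get None.
    """
    buckets = [[] for _ in range(7)]
    for offset, count in enumerate(daily_counts):
        buckets[(first_weekday + offset) % 7].append(count)
    return [mean(b) if b else None for b in buckets]


# Hypothetical toy topic: three pages observed over three days.
topic = {"A": [10, 20, 30], "B": [0, 40, 30], "C": [2, 0, 90]}
avg, med = topic_traffic_summary(topic)
print(avg, med)  # means 4, 20, 50; medians 2, 20, 30
```

On the third day a single spiking page pulls the mean to 50 while the median stays at 30, which is why plotting both helps separate topic-wide bursts from isolated ones.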

Phenomena like these have already been observed in a wide range of information-rich environments1,14,19,20,25. During the period of increased attention, however, we also observe a previously unreported phenomenon: new articles about the Olympic Games are created at a higher frequency. A weaker version of this pattern is visible for Hurricane Sandy too. To quantify the temporal relation between the demand for and the production of information about a topic, we performed a systematic study over a large sample of articles. An increase of attention toward the topic of an article is revealed by an increase in requests for pages in that topic relative to other topics.
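One simple, hypothetical way to express attention "relative to other topics" is to divide a topic's daily requests by the total requests to all pages on that day, then rescale by the topic's average share. This is not the normalization used in this work (see Methods), and `relative_attention` is an illustrative name. When the topic follows the same weekly or daily cycles as the rest of the site, those cycles affect numerator and denominator alike and largely cancel in the share.

```python
def relative_attention(topic_counts, total_counts):
    """Hypothetical relative attention: the topic's share of all daily requests,
    divided by its average share over the period (1.0 = usual level)."""
    shares = [t / total for t, total in zip(topic_counts, total_counts)]
    baseline = sum(shares) / len(shares)
    return [share / baseline for share in shares]


totals = [1000, 2000, 1000]  # site-wide requests; day 2 is busy for every page
print(relative_attention([10, 20, 10], totals))  # each value close to 1.0
print(relative_attention([10, 40, 10], totals))  # roughly [0.75, 1.5, 0.75]
```

In the first case the raw count doubles on day 2 only because all traffic doubles, so relative attention stays flat; in the second the topic outgrows the site and shows a genuine rise.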

Let us consider a newly created article. A burst of attention for pages related to it occurring before its creation is consistent with a model in which demand drives the supply of information. Conversely, a burst that follows its creation suggests that demand follows supply. Finally, if traffic bursts concomitant with the creation of new articles are no different from those observed at any other time, then we would conclude that production and consumption of information are two unrelated processes.
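The three scenarios can be made concrete with a simple permutation-style comparison, sketched here in Python for a single article. This is a hypothetical illustration, not the statistical procedure of this work: the window length, the use of window means, and the names `timing_test` and `window_mean` are all assumptions. The mean traffic in the window just before (and just after) the creation day is compared with the same window placed at every other admissible day.

```python
def window_mean(traffic, start, stop):
    """Mean of traffic[start:stop] (the slice is assumed non-empty)."""
    chunk = traffic[start:stop]
    return sum(chunk) / len(chunk)


def timing_test(traffic, creation_day, window=7):
    """Compare traffic around an article's creation with the same windows elsewhere.

    Returns (p_before, p_after): the fraction of reference days whose pre-window
    (post-window) mean is at least as large as the one at creation_day.
    """
    days = range(window, len(traffic) - window)  # days with full windows on both sides
    if creation_day not in days or len(days) < 2:
        raise ValueError("need full windows around creation_day and other reference days")

    def pre(day):
        return window_mean(traffic, day - window, day)

    def post(day):
        return window_mean(traffic, day + 1, day + 1 + window)

    others = [d for d in days if d != creation_day]
    p_before = sum(pre(d) >= pre(creation_day) for d in others) / len(others)
    p_after = sum(post(d) >= post(creation_day) for d in others) / len(others)
    return p_before, p_after


# Hypothetical topic traffic with a burst just before the article appears on day 10.
traffic = [10] * 21
traffic[7:10] = [50, 60, 70]
print(timing_test(traffic, creation_day=10, window=3))  # (0.0, 1.0)
```

A small `p_before` points to demand preceding supply and a small `p_after` to demand following it; when neither is small, the traffic around creation looks like traffic at any other time. Per-article values like these would still need to be aggregated over many articles, and the reference windows here may overlap the burst itself.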

As the focus of public attention is constantly shifting, we also explore how long demand and supply for new information remain effectively associated with each other. In other words, is there an ideal period during which newly created information has a better chance of receiving more traffic relative to its baseline? We address this question by measuring the time lag between the most recent peak of traffic toward the pages in the topic of a new article and the article's time of creation. We then relate this lag to the traffic received by the new article.
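The lag measurement can be sketched as a backward scan from the creation day to the most recent traffic peak in the topic's aggregate daily traffic. The peak criterion used here (a local maximum exceeding a multiple of the series median), the `factor` default and the function name are hypothetical choices, since the text does not specify how peaks are detected.

```python
from statistics import median


def most_recent_peak_lag(traffic, creation_day, factor=3.0):
    """Days between the last traffic peak on or before creation_day and creation_day.

    Hypothetical peak definition: a local maximum (>= both neighbors) whose value
    exceeds `factor` times the median of the series. Returns None if no peak exists.
    """
    threshold = factor * median(traffic)
    for day in range(min(creation_day, len(traffic) - 1), -1, -1):
        left = traffic[day - 1] if day > 0 else float("-inf")
        right = traffic[day + 1] if day + 1 < len(traffic) else float("-inf")
        if traffic[day] > threshold and traffic[day] >= left and traffic[day] >= right:
            return creation_day - day
    return None


# Hypothetical topic traffic: a spike on day 2, new article created on day 6.
print(most_recent_peak_lag([5, 5, 40, 10, 5, 5, 5, 5], creation_day=6))  # 4
```

Computing this lag for many new articles and pairing each value with, for example, the article's traffic in its first weeks would yield the kind of lag-versus-traffic relation described above.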