Significance It is more important than ever to seek information adaptively. While it is optimal to acquire information based solely on its instrumental benefit, humans also often acquire useless information because of psychological motives, such as curiosity and the pleasure of anticipation. Here we show that instrumental and noninstrumental motives are multiplexed in subjective value of information (SVOI) signals in the human brain. Subjects’ information seeking in an economic decision-making task was captured by a model of SVOI, which reflects not only information’s instrumental benefit but also the utility of anticipation it provides. SVOI was represented in traditional value regions, sharing a common code with more basic reward value. This demonstrates that the valuation system combines multiple motives to drive information-seeking behavior.

Abstract Adaptive information seeking is critical for goal-directed behavior. Growing evidence suggests the importance of intrinsic motives such as curiosity or need for novelty, mediated through dopaminergic valuation systems, in driving information-seeking behavior. However, valuing information for its own sake can be highly suboptimal when agents need to evaluate the instrumental benefit of information in a forward-looking manner. Here we show that information-seeking behavior in humans is driven by a subjective value that is shaped by both instrumental and noninstrumental motives, and that this subjective value of information (SVOI) shares a common neural code with more basic reward value. Specifically, using a task where subjects could purchase information to reduce uncertainty about outcomes of a monetary lottery, we found that information purchase decisions could be captured by a computational model of SVOI incorporating utility of anticipation, a form of noninstrumental motive for information seeking, in addition to instrumental benefits. Neurally, trial-by-trial variation in SVOI was correlated with activity in the striatum and ventromedial prefrontal cortex. Furthermore, cross-categorical decoding revealed that, within these regions, SVOI and expected utility of lotteries were represented using a common code. These findings provide support for the common currency hypothesis and shed light on the neurocognitive mechanisms underlying information-seeking behavior.

Adaptive information seeking is critical for goal-directed behavior in humans. Collecting too little information, paying too much for information, failing to discriminate relevant from irrelevant information, or acting on unreliable or false information can all result in failure to achieve desired goals. Understanding the neurocognitive mechanisms of adaptive information seeking is not only important in neuroscience, psychology, and economics, but also has wide real-world applications, such as policymaking, public health, and artificial intelligence.

Information-seeking behavior is frequently viewed as reflecting agents’ curiosity, i.e., a motive to collect information for its own sake (1–3). This, however, poses a challenge for decision-making models such as reinforcement learning (RL) because information seeking by itself is not directly reinforced by explicit, tangible rewards. To incorporate curiosity-driven information seeking, decision-making models often postulate that information is intrinsically rewarding and, more specifically, that exploratory actions are encouraged by some form of bonus utility (4–6). Various forms of utility bonus have been proposed, such as surprise (7), novelty (8–10), perceived information gap (2), and anticipatory utility from savoring and dread (11–14). At the neural level, the dopaminergic reward system may multiplex utility bonus with signals of extrinsic reward (14–20). Multiplexing extrinsic reward signals and utility bonus would help otherwise myopic agents to achieve an appropriate balance of exploration (seeking more information) and exploitation (acting on available information).

Relying solely on curiosity, however, can be detrimental to adaptive goal-directed information seeking. Most importantly, motivation to acquire information should be sensitive to the instrumental benefits that can be gained from acquiring it. For instance, our interest in a weather forecast would likely be greater if we are trying to decide whether to go hiking or read indoors than if we have already decided to stay indoors. Such goal-driven information seeking is particularly challenging when agents need to acquire information that they have never acquired before (e.g., an unfamiliar morning TV show in a foreign country), where the bonus utility may not be adaptively formed based on the reward history.

Maximizing the instrumental benefits of information acquisition instead requires forward-looking simulation of agents’ own actions and outcomes under different possible informational states (“I’ll go hiking if it is sunny, but read indoors if it is rainy”). If agents are driven solely by curiosity and do not explicitly evaluate instrumental benefits, they may fail to discern relevant and useful pieces of information from irrelevant ones. At the neural level, the aforementioned curiosity-related dopaminergic activity may not be sufficient for maximization of instrumental benefits, and little is known regarding how the dopaminergic reward system represents and integrates information’s instrumental benefits and noninstrumental curiosity signals.

The importance of evaluating forward-looking instrumental benefit has long been recognized in economic and ethological studies of decision-making, owing to the abundance of information seeking in problems ranging from comparison shopping to job/mate search (21, 22). Normative economic accounts presume that agents acquire information only as a consequence of utility maximization. Specifically, the instrumental benefit of information is measured as value of information (VOI), i.e., how much it would improve choices and expected utility (EU); agents acquire the information only if its VOI outweighs its cost. Although normative VOI calculation may be computationally more complex than the valuation of basic rewards (e.g., food or money), subsequent processes of cost-benefit analysis and action selection can be similar to other types of value-driven choices.
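As a hedged formalization of the verbal definition above, normative VOI can be written as the gain in expected utility from choosing after, rather than before, the information arrives. The symbols are notation assumed here for illustration and are not taken from the text: $a$ ranges over available actions, $s$ over possible informational states with probability $P(s)$, and $u$ is the agent's utility.

```latex
\[
\mathrm{VOI} \;=\; \sum_{s} P(s)\,\max_{a}\,\mathbb{E}\!\left[u \mid a, s\right] \;-\; \max_{a}\,\mathbb{E}\!\left[u \mid a\right]
\]
```

Under this sketch, VOI is never negative, and it is zero whenever no informational state would change the agent's best action. When utility is linear in money, the acquisition rule reduces to comparing this quantity with the cost; with nonlinear utility the cost enters inside $u$, so the break-even cost has to be solved for rather than read off directly.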

That the motivation to acquire information may be indexed by a single value measure, such as VOI, opens up a number of interesting hypotheses. First, the dopaminergic reward system may drive information seeking by encoding not only noninstrumental utility bonus but also instrumental benefits. While it is yet to be established whether normative VOI alone is represented or is multiplexed with noninstrumental motives to constitute subjective value of information (SVOI), the reward system may represent that informational value in the same way as conventional reward signals, for example, monetary reward. Second, individual neurons may encode both informational value and conventional rewards in the same way (the neural common currency hypothesis), which is advantageous for computing trade-offs guiding choice (23, 24). In particular, common currency may provide an elegant solution to the exploration-exploitation dilemma by allowing agents to directly compare the action values of the respective options (4, 25). Although common currency across reward categories has been observed in humans and monkeys (24, 26–28), it has not been tested with instrumental information value.

To address these questions, we conducted an fMRI study where subjects made choices on costly, but directly actionable, information. Subjects were presented with a lottery with two monetary outcomes (a gain and a loss) and asked to choose whether to accept or reject it. The outcome probability was initially hidden and described as fair, but subjects could purchase the information to reveal the true probability. This information has direct instrumental benefit because subjects could change their choice flexibly based on the revealed probability. For instance, a subject may play a fair lottery with a large gain and a small loss, but reject it if the loss turns out to be more likely. Although there is a chance that the loss probability turns out to be smaller and she retains her original choice, she may purchase the information if the benefit of avoiding the loss is large enough to justify the cost.
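The purchase logic in this example can be sketched numerically. The following is a minimal illustration, not the authors' model: it finds the break-even cost at which an expected-utility maximizer is indifferent between buying and not buying the information, treating the cost as paid regardless of the later choice. The function names, the zero payoff for rejecting the lottery, the assumption that the gain is positive and the loss negative, and the caller-supplied `utility` function are all assumptions made here for illustration.

```python
def informed_utility(cost, gain, loss, signals, utility):
    """Expected utility after paying `cost`, then choosing optimally per signal.

    signals: list of (probability_of_signal, p_gain_given_signal) pairs.
    """
    total = 0.0
    for p_signal, p_gain in signals:
        play = p_gain * utility(gain - cost) + (1.0 - p_gain) * utility(loss - cost)
        reject = utility(-cost)  # the cost is already paid even if the lottery is rejected
        total += p_signal * max(play, reject)
    return total


def uninformed_utility(gain, loss, prior, utility):
    """Expected utility of the better uninformed action (play, or reject for 0)."""
    play = prior * utility(gain) + (1.0 - prior) * utility(loss)
    return max(play, utility(0.0))


def indifference_cost(gain, loss, signals, utility, prior=0.5, tol=1e-9):
    """Bisection for the cost at which informed and uninformed choices are equally good.

    Assumes gain > 0 > loss and a strictly increasing utility function, so that
    informed utility falls as cost rises and the bracket [0, gain - loss] is valid.
    """
    target = uninformed_utility(gain, loss, prior, utility)
    low, high = 0.0, gain - loss
    while high - low > tol:
        mid = (low + high) / 2.0
        if informed_utility(mid, gain, loss, signals, utility) > target:
            low = mid   # information is still worth more than it costs
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical lottery: gain 10, loss -4, two equally likely signals that
# move the gain probability to 5/6 or 1/6. Linear utility is an assumption.
example_cost = indifference_cost(10.0, -4.0, [(0.5, 5 / 6), (0.5, 1 / 6)], lambda x: x)
```

With linear utility and these hypothetical numbers, the break-even cost is 5/6 (about 0.83): the information is worth buying at any lower price because it lets the agent reject the lottery when the loss turns out to be likely. If neither signal would change the choice, the break-even cost falls to zero.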

We observed that subjects’ information-seeking behavior was indeed largely driven by instrumental benefit. Subjects’ information purchase choice was systematically sensitive to lotteries’ outcomes and possible probabilities, consistent with the normative VOI prediction. We further examined the contribution of additional noninstrumental motives. While we found no evidence for a simplistic utility bonus, information-seeking choices were better explained by an SVOI model that involves anticipatory utility in addition to instrumental benefit. Next, using support vector regression (SVR) on voxel-wise BOLD signals, we tested a key prediction of the common currency hypothesis: a common code between SVOI and reward values at the level of voxel-wise BOLD signals. We found that SVOI was represented in striatum and ventromedial prefrontal cortex (VMPFC), traditional valuation regions. Lastly, cross-categorical decoding revealed that these representations shared a common coding scheme with more basic values, consistent with the neural common currency hypothesis.

Discussion A substantial portion of our daily actions pertains to information seeking. Particularly in the digital age, where a tremendous amount of information is available at our fingertips, acquiring relevant information to an appropriate degree is as important as making use of acquired information. Going back at least to Berlyne (3), psychologists studying the functions, causes, and consequences of motivation and interest have hypothesized a relationship between exploratory and information-seeking behavior and the reward system. More recently, since Kakade and Dayan’s influential proposal (15), neuroscientists have provided evidence that putative noninstrumental motives are represented in the dopaminergic reward system in monkeys (16, 20) and humans (8–10, 14, 17–19), as if they shape a subjective value function that favors information seeking. However, because existing studies have largely focused on instances of noninstrumental information seeking, it remains unclear how subjective preference for forward-looking, instrumental information is formed, and to what extent the dopaminergic reward system is involved in that process.

Behaviorally, our study provides evidence that subjective value of information (SVOI) consists of (at least) two motives: forward-looking instrumental benefit, consistent with normative economic VOI theories, and anticipatory utility, an example of noninstrumental motives. Other models of noninstrumental motives that are independent of the reward value of outcomes, such as a constant utility bonus (4), were insufficient to explain the observed behavior. In particular, consistent with the notion of savoring, we found outcome-dependent over-purchase of information. Our results extend the findings of past studies on anticipatory utility, which focused mostly on noninstrumental information and did not quantitatively capture the concurrent contribution of instrumental and anticipatory value of information (14, 31, 32). That both motives we identified are strongly sensitive to possible future outcomes highlights the involvement of valuation systems in information-seeking behavior in general, which is sometimes overlooked in the curiosity literature.

The possibility that anticipatory utility is an important component of information seeking opens up several important questions. One issue concerns the effect of dread, or the utility of anticipating negative outcomes (34). The effect of dread may be large enough for some people to avoid potentially negative information even when its instrumental benefit is critical, as with medical conditions, but more studies are needed to empirically quantify its relative contribution in instrumental information settings. Our study could not measure its effect reliably because our subjects could reject unfavorable lotteries. A second issue is that anticipatory utility provides a possible explanation for the phenomenon of ambiguity aversion. Intuitively, the desire for information may be causally linked to aversion to the lack thereof (12). It may thus not be a coincidence that nonlinearity of the aggregator function that determines second-order utility, a critical part of recursive utility theory, is also central to some theories on ambiguity and compound lotteries (35). Future studies may be able to use our experimental paradigm to quantify anticipatory utility at the individual level and correlate it with ambiguity attitudes.

Neurally, if information seeking is driven by subjective value signals in the dopaminergic reward system, we should expect such responses to exhibit two features: first, they should be scaled according to subjective preference for information, which would reflect both instrumental and noninstrumental motives, and second, they should be on a common currency with extrinsic reward. Our finding that SVOI and EU share a common code in BOLD signals from striatum and VMPFC is highly consistent with these predictions, because these regions receive massive dopaminergic projections (36) and represent various kinds of values (37, 38), with some evidence for common currency (24, 26–28). In particular, our findings expand existing knowledge by showing that striatum also represents forward-looking instrumental benefits. Our decoding approach is well suited to testing for a common code because it characterizes localized fine-grained representation, while typical brain mapping studies only examine spatially smoothed signals and whole-brain approaches such as elastic net examine representations distributed across the brain. Moreover, our results yield an additional prediction that, when monkeys act on the forward-looking instrumental benefit of information, rather than merely receive noninstrumental information (16, 20, 39), this benefit may also be encoded by their midbrain dopamine neurons.

We found SVOI representation in other brain regions as well, but with limited evidence for a common code: cross-categorical decoding was observed only in the right middle frontal gyrus (MFG). As SVOI computation requires the simulation of agents’ own choices and outcomes under possible informational states, this may reflect a greater need for neurocognitive resources, particularly working memory and planning, than basic rewards require. Relatedly, although encoding of noninstrumental information value was reported in orbitofrontal cortex (OFC) in monkeys, it was distinct from reward encoding (39), in contrast to midbrain dopaminergic neurons (16). A recent human fMRI study corroborated this distinction, reporting that striatum and midbrain dopaminergic regions represent the subjective value of noninstrumental information, which is influenced by possible outcome valence, while OFC merely represents the availability of information regardless of valence (14). These findings suggest that, while OFC may encode signals relevant to information valuation, these signals do not seem to use a common code with other types of values (40, 41). Taken together, information valuation may be supported by multiple neurocognitive processes, and it may converge with other values in striatum and/or VMPFC.

Our evidence for a voxel-level common code is consistent with the neural common currency hypothesis. However, due to the nature of fMRI, it still leaves open the possibility that distinct neuron populations represent EU and SVOI but are sampled by overlapping voxels. More direct evidence for common currency at the neural level would come from electrophysiological recording while subjects acquire instrumental information. Our findings also raise an important question regarding the “common scale,” i.e., whether neural responses to SVOI and other reward values are scaled to be in the same range, thereby allowing direct comparison between information and rewards (23). To directly test the common scale, it would be ideal to use experimental paradigms in which subjects choose between information and noninformational goods and to examine whether cross-categorically decoded values predict such choices (27). Such an approach would also bridge the conceptual gap between one-shot information acquisition and the exploration-exploitation dilemma, in which agents choose between myopic reward and information.

Further investigations are also needed on how humans adopt different information-seeking strategies under various goals, from stable to dynamic environments, and from short to long temporal horizons (1, 25). Although we found little support for utility bonus accounts in our experimental paradigm, it is entirely possible that they are responsible for exploratory behavior in more dynamic settings with longer temporal horizons (4, 42). Moreover, other proposed motives we did not study here, such as novelty or surprise (1, 7–10), might be necessary or better suited to ensure an adequate degree of information seeking in certain circumstances, particularly outside value-based decision-making domains. Our results raise the interesting possibility that such differences in motives may be partly caused by whether reliable SVOI signals from the dopaminergic system are available, depending on factors such as the difficulty or cognitive load of model-based SVOI computation (43). Potential motives of information seeking have long been studied separately, and the current study marks an important step, both theoretically and empirically, toward an integrative understanding.

Methods All subjects provided informed consent; all protocols were approved by the UC Berkeley Committee for the Protection of Human Subjects and the Virginia Tech Institutional Review Board. Detailed method descriptions are available in SI Appendix, SI Methods.

Task Design. In each trial, a lottery with two outcomes $(x_1, x_2)$ was presented as a roulette wheel, and subjects chose whether to play it assuming no further information ($s_0$). Then the information was presented as a magenta partition on the wheel, which defined the two possible probability distributions, $P(x_1) = \pi$ ($s_+$) or $P(x_1) = 1 - \pi$ ($s_-$). The information’s diagnosticity, $\pi$, was determined by the orientation of the magenta partition: $\pi = 1$, $5/6$, or $2/3$ when the partition was vertical, slanted by 30°, or slanted by 60°, respectively. The cost of the information was presented after a delay, and subjects chose whether to purchase it. The purchased information was delivered after scanning. When the information was delivered, one side of the magenta partition was brightened, indicating the posterior probability ($s_+$ or $s_-$), and subjects could change their original lottery choice. Subjects were told that the brighter side would be chosen randomly.

Behavioral Modeling. The predictions of VOI and of SVOI with anticipatory utility were obtained as the sunk cost of the information at which agents that maximize EU (or expected second-order utility in the case of the SVOI model) would be indifferent between informed and uninformed choices. The aggregator function that maps first-order to second-order utility in the SVOI model was estimated by likelihood maximization of information purchase choices. Models were compared by cross-subject cross-validation.

fMRI Decoding Analysis. Voxel-wise activation from two epochs in each trial, lottery presentation (for EU decoding) and information presentation (for SVOI decoding), was used as features for SVR with leave-one-run-out cross-validation. Within-categorical decoding took a whole-brain searchlight approach. SVOI decoding accuracy was evaluated by the Pearson partial correlation between predicted and actual SVOI labels while controlling for $\pi$. Accuracy of cross-categorical decoding was evaluated within the ROIs defined by the within-categorical decoding. The null-hypothesis distribution was obtained by permuting labels across trials while maintaining the trial-wise pairing of SVOI and EU.
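The evaluation steps of the decoding analysis can be illustrated with a small standard-library sketch. It is not the authors' pipeline: the SVR itself and the searchlight are omitted, and all function names and the toy data are hypothetical. It shows three pieces described above under stated assumptions: leave-one-run-out fold construction, a first-order Pearson partial correlation (predicted versus actual labels, controlling for one covariate such as diagnosticity), and a label permutation that applies a single shuffle to both label sets so that each trial's SVOI and EU stay paired.

```python
import math
import random


def leave_one_run_out(run_ids):
    """Yield (train_indices, test_indices), holding out one scanning run at a time."""
    for held_out in sorted(set(run_ids)):
        train = [i for i, r in enumerate(run_ids) if r != held_out]
        test = [i for i, r in enumerate(run_ids) if r == held_out]
        yield train, test


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after controlling for a single covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


def permute_paired(svoi_labels, eu_labels, rng):
    """Shuffle trials once and apply the same order to both label sets,
    so each trial's SVOI and EU labels remain paired (an assumed reading
    of the permutation scheme)."""
    order = list(range(len(svoi_labels)))
    rng.shuffle(order)
    return [svoi_labels[i] for i in order], [eu_labels[i] for i in order]


# Toy data: the covariate is uncorrelated with both variables by construction.
predicted = [1.0, 2.0, 3.0, 4.0]
actual = [1.0, 3.0, 2.0, 4.0]
covariate = [1.0, -1.0, -1.0, 1.0]
accuracy = partial_correlation(predicted, actual, covariate)

folds = list(leave_one_run_out([1, 1, 2, 2, 3]))
shuffled_svoi, shuffled_eu = permute_paired([1, 2, 3, 4], [10, 20, 30, 40], random.Random(0))
```

In a full analysis, each fold would train a regression model on the training trials and predict labels for the held-out run, and repeating `permute_paired` many times would build the null distribution of the accuracy measure. Note that `partial_correlation` is undefined when the covariate is perfectly correlated with either variable.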

Acknowledgments We thank Amanda Savarese, Cassandra Carrin, and Duy Phan for assistance with data collection. This research was funded by National Institute of Mental Health Grant MH098023 and Collaborative Research in Computational Neuroscience/National Institute on Drug Abuse Grant DA043196 (to M.H.).

Footnotes Author contributions: K.K. and M.H. designed research; K.K. performed research; K.K. analyzed data; and K.K. and M.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1820145116/-/DCSupplemental.