Geoffrey Schoenbaum, Reviewing Editor; National Institute on Drug Abuse, National Institutes of Health, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Orbital frontal cortex updates motivational state-induced changes in value to control decision-making" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Geoffrey Schoenbaum (Reviewer #1), is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

In this study, the authors show that chemogenetically and optogenetically targeted inhibition of the OFC impairs the ability of mice to learn the new value of a reward for the purposes of modifying instrumental responding. This so-called "incentive learning" is assessed by inactivating the OFC during preexposure to a food reward after an up- or down-shift in general motivational state (hunger). Inactivation of the OFC during this preexposure had no effect on motivational changes in consumption but abolished later changes in a pre-trained instrumental response to obtain that food. The results illustrate a role for the OFC in learning generally and in the acquisition of model-based information more specifically, including information that is necessary for normal instrumental responding.

Essential revisions:

While the reviewers generally thought the results were novel and represented an impressive set of studies, there were some concerns over the framing and interpretation and also over some of the analyses. The points below represent the critical changes; we have also included the individual reviews, which should be addressed where possible.

1) The framing and interpretation of the study were difficult to parse. This issue is raised in two of the reviews, so we do not belabor it here. Basically, the framing and interpretation did not seem to cleanly capture the novel aspects of the research. The reviews provide some suggestions, but of course this is up to the authors. As it stands, I think that failing to make this clear would limit the paper's success with a broader audience.

2) There was some agreement that moving Figure 1J from the supplemental material into the main text would help readers unfamiliar with the incentive learning phenomenon understand its significance. This seems like an important control to have represented at the outset of the paper.

3) Procedurally, it was felt that collapsing the two strains of mice without providing more detail is problematic. There is a statement that there were "no differences between strains and approaches". This needs to be accompanied by statistical evidence that there are no main effects of strain or interactions of strain with the critical measures. Possibly the two groups could be plotted separately in the supplement. Depending on statistical power, this might be sufficient to satisfy this concern.

4) There is a similar concern over the pooling of CNO- and saline-treated mice in the control groups. Given recent evidence that CNO is converted to clozapine and thus may have its own effects, it is important to establish clearly that CNO alone is not responsible for the reported effects, since these might be general rather than OFC-specific. This could be done in a variety of ways. We leave it to the authors, but this needs to be addressed.

5) Groups were generally small. This led to some p values being borderline or even non-significant. In one place, p = 0.05 is stated as significant. Is the threshold p < 0.05 or p ≤ 0.05? More importantly, for the optogenetic experiment the key a priori comparison is not close to significance. We appreciate the comparisons of each group to baseline, but the comparison across groups is really the key question. In this case, we think the experiment needs to be strengthened by replication or perhaps removed.

For other points, please see the reviews below.

Reviewer #1:

In an impressive series of experiments, Gremel and colleagues demonstrate that the OFC is required for bidirectional changes in instrumental behavior after motivational changes for the predicted reward. Using chemogenetic and optogenetic approaches, they show that inactivating the OFC in mice during revaluation of a reward (so-called incentive learning) disrupts later changes in lever pressing for that reward. Overall, the behavioral work is extremely well done and rigorous in its design and analysis, and the results seem very clear and important to me.

They are important for several reasons, I think. First, they clearly show a role for the OFC in goal- or model-based instrumental behavior. While Gremel and Costa showed this previously, it is somewhat at odds with negative data (mostly from Ostlund and Balleine), so this is important by itself. Second, the results suggest that the discrepancy with prior reports could reflect a unique role for the OFC in updating the value of the sensory properties of the reward. I do not know for sure, but it seems to me that depending on procedures and the timing of your manipulations (and assuming no role for the OFC in executing the instrumental action later, something not shown here), you might get variable results. Third, I think this may be the first evidence that the OFC is generally necessary for changes in behavior, as opposed to only when things get worse; such deficits are often described as "disinhibition". I like the current result because it cannot be explained in this way. Finally, this result demonstrates a clear effect of the OFC on learning in the context of an inference-based behavior. This final novel point adds to prior evidence that the OFC is important for learning, but shows that the learning influenced by the OFC includes (though may not be limited to) model-based or what I think of as "real" associative learning (as opposed to model-free or cached value structures). This is important to me personally, since the learning effects of OFC output seem to influence dopaminergic error signals. Showing that OFC-dependent learning contributes to later model-based behavior fits well with the recent expansion of the types of learning that dopaminergic errors may support. For all these reasons, I think this is an excellent study.

That is my case for the paper. On the con side, I had some trouble fully understanding the authors' framing. I never felt that the question was clearly defined; indeed, I am not sure that the authors' question corresponds cleanly to any of the novel aspects of the results highlighted above. The authors seem to want to contrast "updating", or a role in learning, with a role in merely perceiving or perhaps using value. If the authors could sharpen their question (whether or not it corresponds to any of the answers I highlighted above), it would help me.

To assist, I've outlined a bit of what confused me:

For starters, I think there is already good evidence that the OFC cannot be necessary just for perceiving value. There is a long list of value-based behaviors, including sucrose consumption but also extending to discrimination learning, Pavlovian conditioning, and even economic choice, that are not dependent on the OFC. So I was confused that a major goal of the paper seemed to be to rule this out. I imagine the authors mean something much more specific, or have I misunderstood?

I was also confused by the emphasis on learning. There are a number of reports highlighting a role for the OFC in learning. These range from poorly controlled studies of reversal learning to the demonstration that inhibition of the OFC during over-expectation impairs "updating" of the associative meaning of the cues. I see the unique importance of what has been done here (outlined above), but it is not clear to me whether there is something especially unique about incentive learning that the authors mean to highlight, or whether it is simply another example consistent with these other reports.

Finally, I feel that the idea of updating is not mutually exclusive with a role for the OFC in later contributing to the deployment of the information. In my opinion, the OFC is clearly necessary for deploying information (distinct from any role in learning or updating) in Pavlovian behaviors. It is not clear whether this is also true for instrumental behavior (the authors would be more expert than me in that), but the current experiment does not address this. It was never clear to me whether the authors meant to cite their data as evidence against any role for the OFC in using information. Compounding my confusion, there are a couple of studies that have tried to get at the question of the OFC and updating, but they were not clearly discussed. So I am not sure how the authors think their current data add to, contradict, or differ from these approaches.

Importantly, these are mostly framing and interpretive issues. The experiments are beautiful, and I have no major arguments over the data. I generally think authors have a right to discuss their data however they think best, so my comments are mostly meant in that spirit. Here are a few citations that came to mind in support of the above comments:

West et al., 2011. "Transient inactivation of orbitofrontal cortex blocks reinforcer devaluation in macaques." Journal of Neuroscience 31: 15128-15135. Shows in monkeys that inactivation disrupts devaluation effects when applied during selective satiation; it also does so when applied afterward, which differs from the amygdala.

Murray et al., 2015. "Specialized areas for value updating and goal selection in the primate orbitofrontal cortex." eLife 4: e11695. doi: 10.7554/eLife.11695. Suggests a dissociation, with area 13 mediating value updating during satiation and area 11 seemingly mediating the later use of that information.

Gardner et al., 2017. "Lateral orbitofrontal inactivation dissociates devaluation-sensitive behavior and economic choice." Neuron 96: 1192-1203. Shows fairly clearly that the OFC is not generally necessary for perceiving value.

Takahashi et al., 2013. "Neural estimates of imagined outcomes in the orbitofrontal cortex drive behavior and learning." Neuron 80: 507-518. An example of updating that is impaired by temporally specific optogenetic inactivation of the OFC.

Reviewer #2:

This study adapts an incentive learning paradigm for mice in order to study the role of OFC in motivated responding. The paradigm uses different satiety states to change the value of a sucrose solution. The authors found that chemogenetic or optogenetic down-regulation of OFC projection neurons resulted in a selective impairment in incentive learning, as measured by the mouse's ability to update values that inform future behavior, when the update is based on sucrose exposure in a new motivational state.

Overall I think this study is very well conceived and executed, and the results are important. They build on earlier work showing that rodent OFC is required to update values by adding specificity in two domains: (1) they demonstrate learning-related effects of OFC manipulations with no effects on reward perception or palatability, and (2) they demonstrate a specific temporal requirement for OFC involvement during ongoing behavior. The manuscript is clearly written and overall I have no major concerns.

[Minor comments not shown.]

Reviewer #3:

This paper by Baltz and colleagues describes the application of an incentive learning procedure to mice and reports the effects of DREADD and optogenetic inactivation of the OFC during the incentive learning period, to test whether the OFC is necessary for the revaluation process. The authors find positive evidence for an OFC contribution during food intake after a motivational shift, as seen one day later in weak modulation of responding for that reinforcer during an extinction test. The question of the contributions of the OFC to incentive learning is an important one, and relates to overall hypotheses on its role in economic choice, value encoding, and state space construction. There is much to like in this paper, from the characterization of the behavioral model to the use of two different approaches to decrease activity in the OFC. I especially like the use of optogenetic activation of inhibitory interneurons to inhibit the OFC and to circumvent issues related to inhibitory opsins. This was clearly a lot of work, and there are some interesting findings here. However, I am not clear on the specific advance relative to other papers showing that the OFC is required during reinforcer exposure for devaluation via sensory-specific satiety (i.e., Murray et al., eLife 2015). In addition, with regard to experimental design, the number of mice in many groups is small, which likely contributes to some of the statistical "trends" and inconsistencies across experiments. Additional specific points are listed below.

1) The first sentence of the Discussion states "here we uncover a cortical area that controls experience-based value updating". Can the authors please expand on how the current findings move beyond prior knowledge? For example, the Introduction does not spell out what the authors consider to be the critical distinctions between devaluation and incentive learning, and the OFC's contribution to each. The Discussion is equally confusing in this regard. On a related point regarding placing this work within the framework of prior studies, can the authors mention in the Discussion how they see the current findings relating to the demonstrated role of the OFC in using information about changed value at test in multiple settings?

2) Many of the groups have n = 5-7. This is quite low for mouse behavior, especially given the inter-individual variation seen in the acquisition data provided in the supplement. To firmly establish a new variant of a behavioral model, it would be ideal to run sufficient control subjects to get a firm idea of the typical behavior for both positive and negative incentive learning. What does responding look like during training and test in a larger group?

3) In describing the analysis of licking behavior during incentive learning, the authors analyze all licks and then break out only those during actual consumption. Were there any effects on the "anticipatory" licks? Also, across the different experiments, it is not clear whether the amount of sucrose consumed differed between conditions. For example, did mice in the negative incentive contrast experiment drink less sucrose after a reduction in hunger than their controls?

4) There are problems with the statistics. The authors report p values of 0.09 and 0.07 as trends and a value of 0.05 as significant. To my mind, a comparison is either significant or it is not, and to be significant p should be less than 0.05. In addition, the p values just mentioned appear to differ from those in the corresponding Figure 1I and J. Later, when p values between 0.1 and 0.05 appear, the authors do appropriately consider them non-significant. In general, if the authors suspect there may be a group difference that they are not detecting, I would suggest it may be due to their relatively low sample sizes.

5) On a related point, in some experiments the relevant control groups fail to show the predicted effect: in the initial DREADD experiment, the increase from baseline responding in Control 2-16 mice was not significant, and in the final optogenetics experiment there was no significant group difference. Thus the authors cannot draw strong conclusions from either of these experiments.

6) There are multiple aspects of the data that should be shown but currently are not. First, the main test data are shown as percentage of baseline responding. While this is appropriate, the actual responding should also be made available. Second, in multiple cases the authors combine mice across groups, for example, combining two kinds of transgenic mice, or combining saline and CNO treatments in control groups. While this may also be fine if supported statistically, a table showing the actual data and the number of cases for each group is required. The combination of mouse lines is potentially inappropriate given their different backgrounds (B6/129 vs B6), as these lines typically show very different instrumental behavior. In addition, given the current controversies over off-target actions of CNO, these data must be shown independently and ideally compared with saline.

7) Because the authors conducted their OFC manipulations in the same context as the later test, they cannot claim at this point that OFC inactivation is required for instances of incentive learning that occur outside that context. While this is very likely the case, the authors did not show that here.

8) Finally, could the authors clarify their text in the Discussion on prediction error and the paragraph on retrieval of sucrose representations? The logic is not clear, and I find many of the sentences a bit inscrutable.