This article is part of our special issue “Nudge Turns 10,” which explores the intersection of behavioral science and public policy.

In 2013, the influence of Nudge and the success of groups like the Behavioural Insights Team (BIT) demonstrated that behavioral science was a powerful tool in the developed world. It led us at the World Bank to wonder: What impact could behavioral science have when applied across dozens of developing countries with different governments, capacities, and needs?

Inspired by the applied behavioral science work at the time, we devoted our 2015 World Development Report to the issue and formed an internal team—the Mind, Behavior, and Development Unit, or “eMBeD”—to operationalize behavioral science across the Bank. Since then, we’ve worked in 65 countries on over 85 projects. While we’re still learning a lot as we go, we want to share a few lessons for anyone aiming to integrate behavioral science in complex development settings.

Lesson #1: Quest, test, and be prepared for the unexpected.

Even tried and true behavioral nudges that may have worked in high-income contexts aren’t always effective in developing countries. For example, social norms have long been held as a gold-standard way to increase tax revenue, as the Behavioural Insights Team and others have demonstrated. But in Poland, a middle-income country, a World Bank trial reaching 150,000 individuals found that punitive language increased tax compliance more than peer comparisons did—“hard tones” increased tax compliance by 20.8 percent. If the best-performing communication had been sent to all taxpayers covered by the trial, the Polish Tax Authority would have generated 56 percent more in revenue. This shows that even robust behavioral effects may not translate across contexts.
Sometimes the reasons an intervention fails to replicate, even in nearly identical regional contexts, are more complex. In Nigeria, for instance, eMBeD worked on a project to address inaccurate and incomplete health-care record keeping, which limits policy makers’ ability to direct funds where they are needed. Providing appropriate financial or social incentives to public sector workers is crucial in Nigeria, where 38 percent of public sector projects never start, and only 31 percent finish. As part of a larger World Bank-sponsored Public Expenditure Tracking Survey, which recorded information on resource flows in real time, we tested two behavioral incentives—social recognition and lottery tickets.

In a pilot intervention in Ekiti State in western Nigeria, eMBeD found that incentivizing accurate administrative work through social-recognition programs and ceremonies increased record-keeping accuracy by 13 percent. But the social-recognition intervention made no difference in Niger State; similarly, the lottery incentive showed no impact in Niger and was inconclusive in Ekiti. Why the discrepancy in impact?

Ekiti outperforms Niger in several social indicators, including the quality of vital statistics, the share of live births attended by skilled personnel, adult literacy, and immunization rates. Its health personnel are also better educated. The divergent results may mean that the social-recognition incentive requires higher levels of training and organization among health officials in order to be effective; there is only so much a behavioral intervention can do without adequate training. These are the kinds of contextual issues that can make the difference between an effective and an ineffective intervention. Ultimately, for anyone testing interventions in settings with varied economic, social, and administrative contexts, RCTs can reveal nuances you might not have anticipated, and they can help inform scaling and implementation for greater effectiveness.
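The Ekiti–Niger contrast is, at bottom, a question of heterogeneous treatment effects: the same arm-versus-control comparison can clear statistical significance in one site and not in another. As a minimal sketch of how such a site-by-site comparison might be analyzed, here is a two-proportion z-test in Python. All numbers and site names below are invented for illustration; they are not the eMBeD data.

```python
# Hypothetical sketch: comparing a treatment arm to a control arm in two
# sites, to illustrate how one intervention can show different effects by
# context. All figures are made up for illustration.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled SE)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, p_value

# Counts of accurately kept records: (treatment hits, treatment n,
# control hits, control n). Site A shows a lift; Site B does not.
sites = {
    "Site A": (460, 800, 380, 800),
    "Site B": (395, 800, 388, 800),
}
for site, (treat, n_t, ctrl, n_c) in sites.items():
    diff, p = two_proportion_z(treat, n_t, ctrl, n_c)
    print(f"{site}: effect = {diff:+.3f}, p = {p:.3f}")
```

In practice a trial like this would also adjust for covariates and pre-register the site-level comparison, but even this bare-bones test shows why pooling two sites with opposite results can mask what is really going on.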

Lesson #2: Deploying interventions in different countries means thinking critically about scale.

In Peru, we partnered with the Ministry of Education on an intervention designed to help middle-school students reframe their beliefs about effort and success through a 90-minute lesson on growth mindset. The intervention led to a 0.14 standard deviation increase in math test scores, equivalent to four extra months of schooling, at a cost of less than $0.20 per student. eMBeD reached 50,000 students in an initial phase and an additional 250,000 subsequently. With such an exciting result, people are eager to scale the intervention within the country and bring it to others. But we know we need to look more deeply at the implications of expansion. How would this intervention fare with students in Indonesia or South Africa? Could it be adapted into an app or computer-based intervention, or delivered through an after-school program instead? For which populations does it dramatically improve outcomes, and for whom does it barely move the needle?

In many cases, the policy makers and beneficiaries we’re trying to serve have limited resources and bandwidth, so it’s essential to target and scale these interventions efficiently. Human behavior may not differ so widely from individual to individual, but as our Nigeria example demonstrates, complex contexts require specificity and understanding, a challenge that forces us to think outside the box in terms of available tools, delivery methods, and more.

Lesson #3: Look upstream, to policy makers and practitioners themselves, to address behavioral bottlenecks early.

In one experiment, we asked 600 public school teachers in Lima, Peru, to evaluate the scholastic aptitude, behavior, and educational potential of a fictional student named Diego. The experiment explored whether socioeconomic markers unconsciously biased these teachers. Some teachers saw a video in which Diego is first shown walking around a middle-class neighborhood; other teachers saw Diego walking through a markedly low-income neighborhood. Then, teachers in both groups were presented with one of two variants of a video in which Diego takes an exam. In one, Diego’s performance is ambiguous; he answers some questions incorrectly and is sometimes distracted. In the second version, Diego correctly answers most questions and behaves like a model student.

Regardless of which exam video they saw, teachers had lower expectations for Diego’s final educational attainment when they were primed to think that he was poor. In the ambiguous variant, teachers’ expectation that Diego would continue past high school dropped from 60 percent to 40 percent when they were primed to think that he was poor. Even when “poor” Diego unambiguously performed well, teachers evaluated his behavior significantly more harshly.
That we’re all susceptible to bias isn’t news: an earlier eMBeD trial with World Bank and DFID staff found similar susceptibility to confirmation bias, sunk-cost bias, and gain- and loss-framing. The Diego study has reinforced for us that the power of behavioral science lies not only in its ability to improve program efficiency and uptake. By going upstream, to the policy makers and practitioners implementing these programs, we can have an enormous impact that reverberates down the beneficiary chain.

Perhaps the most critical takeaway from our experience is that it’s possible to embrace complexity when dealing with myriad contexts and settings—complexity of implementation, complexity of people, and complexity of results. Complexity, we’ve found, is a tool for progress, for looking beyond what’s been done before to what can have surprising impacts in unexpected places.