Google’s DeepMind research group has a lot of flashy AI accomplishments to its name.

Among the biggest are AlphaGo, its game-playing algorithm that stunned many by beating top Go players in a 2016 match; its successor AlphaZero, which plays Go even better and can also be taught other games; and AlphaStar, which plays the real-time strategy game StarCraft at an expert level.

But DeepMind, of course, wants its AI to be used for more than game-playing eventually. Alongside these highly publicized accomplishments, they’ve been working on a lot more. Two research papers published in Nature on Wednesday highlight some of that work: a paper on how reinforcement learning can teach us more about how the brain works, and a paper on how AI can be used to predict the folding of proteins.

As AI matures as a field (and runs out of video games to conquer), more of its achievements will probably look like these: solid improvements in important research domains.

The brain does reinforcement learning sort of like top-performing AI algorithms

The first paper explores how recent advances in “reinforcement learning” (more on this below) might teach us something about how the brain works. Based on which kinds of reinforcement learning algorithms perform best, researchers theorized that neurons might pass on more complicated information about future rewards than we previously imagined. A study in mice suggests there might be something to that.

For a long time, researchers have argued that similarities exist between the way deep learning AI systems process information and the way the human brain works. One way you can train an AI system is with reinforcement learning, where an AI agent, as it takes actions in the world, finds that some of them are “rewarded,” and over time adjusts its behavior to maximize the reward it can earn.

The idea was quickly applied to neuroscience, with some researchers theorizing that neurotransmitters like dopamine function as reward signals much like the ones in AI systems.

But lots of the things that we might want to “reward” in a human — or, for that matter, a sophisticated AI agent — are things that will happen in the distant future. If someone wants ice cream, for example, and there isn’t any in their house, they have to be motivated to put on their shoes, go to the store, buy the ice cream, and return home before they can eat it. How does that work? How does the brain predict the distant reward of ice cream and reward the steps that lead to it?

We now have pretty good models of how the brain might do that. At each step, it simply has to estimate how much reward it expects from the next step onward (plus any reward for getting to the next step). That will motivate it toward steps that increase its predicted reward. This simple motivation system might be what underlies complex human actions like (for example) studying for a test: you want good grades because you want a good job because you want a lot of money.
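To make the step-by-step prediction idea concrete, here is a minimal sketch of tabular temporal difference learning, TD(0), applied to the ice cream errand. The step names, the single reward of 1.0 at the end, the learning rate, and the discount factor are all illustrative assumptions, not values taken from the paper.

```python
# Hypothetical chain of steps; only finishing the last one pays a reward.
STEPS = ["put on shoes", "go to store", "buy ice cream", "return home", "eat ice cream"]
REWARDS = [0.0, 0.0, 0.0, 0.0, 1.0]  # reward received on completing each step (assumed)
ALPHA = 0.1  # learning rate (assumed)
GAMMA = 0.9  # discount factor: how much a reward one step later is worth now (assumed)


def train_td0(episodes=2000):
    """Learn a predicted future reward for each step with tabular TD(0)."""
    values = [0.0] * len(STEPS)
    for _ in range(episodes):
        for i in range(len(STEPS)):
            # Prediction at the following step; zero once the errand is over.
            next_value = values[i + 1] if i + 1 < len(STEPS) else 0.0
            # Prediction error: (reward + discounted next prediction) - current prediction.
            td_error = REWARDS[i] + GAMMA * next_value - values[i]
            # Nudge the current prediction toward the better-informed target.
            values[i] += ALPHA * td_error
    return values


values = train_td0()
for step, value in zip(STEPS, values):
    print(f"{step:>14}: {value:.3f}")
```

Under these assumptions the learned values rise steadily from the first step to the last, so even putting on shoes carries some predicted reward although the ice cream is several steps away.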

But what the new paper in Nature argues is that sophisticated AI systems like the ones in use today are actually doing something slightly more complicated than what’s presented above. At each step, they don’t just calculate the average expected reward of their course of action. Instead, they keep a sophisticated probability distribution in their heads.

Systems that do that, called distributional temporal difference learners, score better than systems that only calculate an average on tasks like platformer games (where the player moves around a level to score points).
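The difference between tracking an average and tracking a distribution can be sketched with a toy example. The code below is not DeepMind's implementation. It uses quantile regression, one known way to learn a distribution from samples, on a single-step reward with no sequence of states. The lottery-like reward (10 with probability 0.2, otherwise 0), the quantile levels, and the learning rate are all assumptions chosen for illustration.

```python
import random


def sample_reward(rng):
    # Hypothetical lottery-like reward: usually nothing, occasionally a big win.
    return 10.0 if rng.random() < 0.2 else 0.0


def learn(num_samples=50000, lr=0.01, seed=0):
    """Learn a running mean and a set of quantile estimates from the same samples."""
    rng = random.Random(seed)
    mean_estimate = 0.0
    taus = [0.1, 0.3, 0.5, 0.7, 0.9]  # quantile levels to track (assumed)
    quantiles = [0.0] * len(taus)
    for _ in range(num_samples):
        reward = sample_reward(rng)
        # Mean learner: move a little toward every sample.
        mean_estimate += lr * (reward - mean_estimate)
        for k, tau in enumerate(taus):
            # Quantile learner: step up with weight tau when the sample is above
            # the estimate, step down with weight (1 - tau) otherwise.
            if reward > quantiles[k]:
                quantiles[k] += lr * tau
            else:
                quantiles[k] -= lr * (1 - tau)
    return mean_estimate, quantiles


mean_estimate, quantiles = learn()
print("mean estimate:", round(mean_estimate, 2))
print("quantile estimates:", [round(q, 2) for q in quantiles])
```

With these assumed numbers the mean learner settles near 2, a reward that never actually occurs, while the quantile estimates split into a group near 0 and one near 10, which is closer to the "win big or win nothing" picture.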

That got the researchers wondering if humans keep such sophisticated mathematical models in our heads too.

“For the last three decades our best models of reinforcement learning in AI and neuroscience have focused almost entirely on learning to predict the average future reward. But this doesn’t reflect real life — when playing the lottery, for example, people expect to either win big, or win nothing — no one is thinking about getting the average outcome,” Will Dabney, a DeepMind research scientist who contributed to the paper, wrote.

So DeepMind looked at evidence for this in a Harvard study on mice. They set up an environment for the mice that was highly uncertain — with some chance of a big reward and some chance of a very small reward. They measured neuron activity. They found patterns that suggested that, like the AI systems, the mice’s neurons were encoding a complicated probability distribution instead of just an expected average result.

“The brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel,” the paper argues.

That’s pretty cool, and it’s exciting to see our work in AI teach us more about how our own minds work.

Why is it a big deal that AI can fold proteins?

A second Nature paper summarizes DeepMind’s work on using AI to predict protein folding, a crucial issue for developing new drugs. In 2018, DeepMind produced state-of-the-art results at the task, and in the paper they describe how those results were achieved.

Let’s say you have a sequence of amino acids — the building blocks of proteins — written down on a piece of paper. When built in a cell, they’ll cinch and clump together, making a distinctively shaped globule. Can we predict that shape?

The answer is that we usually can’t. Interactions among the amino acids can produce surprising shapes, and there are few hard-and-fast rules we can rely on to predict how a protein will look. And the number of possible configurations is astronomical: Checking every shape that a simple, average-sized protein could take would require longer than the lifetime of the universe, even if you ran through billions per second.
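A back-of-the-envelope calculation shows how quickly the numbers get out of hand. Every figure here is an assumption for illustration: a chain of 100 amino acids, just 3 possible arrangements per amino acid, a checking rate of one billion shapes per second, and a universe roughly 13.8 billion years old.

```python
RESIDUES = 100                     # amino acids in the chain (assumed)
CONFORMATIONS_PER_RESIDUE = 3      # arrangements per amino acid (assumed, deliberately small)
SHAPES_PER_SECOND = 1e9            # "billions per second", taken as one billion
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10    # roughly 13.8 billion years

# Each amino acid multiplies the number of possible shapes.
total_shapes = CONFORMATIONS_PER_RESIDUE ** RESIDUES
years_needed = total_shapes / SHAPES_PER_SECOND / SECONDS_PER_YEAR
ratio = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"possible shapes: {total_shapes:.2e}")
print(f"years to check them all: {years_needed:.2e}")
print(f"multiples of the age of the universe: {ratio:.2e}")
```

Even with these conservative assumptions, the brute-force search would take vastly longer than the universe has existed, which is why prediction methods have to be smarter than enumeration.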

Of course, this hasn’t stopped scientists from trying. The shape that proteins take predicts which other substances they’ll interact with, so understanding protein folding is crucial for drug discovery and could even be used for developing new manufacturing processes.

Every two years, at a conference called the Critical Assessment of Structure Prediction (CASP), researchers from around the world submit programs that take a shot at estimating protein structure. In the 2018 round, DeepMind was the winner.

The Nature paper explains their winning approach and explores the potential for advanced AI systems to make more progress on problems like this one. “AlphaFold represents a considerable advance in protein-structure prediction,” DeepMind researchers write in the paper. “We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined.”

There’s still a long way to go. AlphaFold was the strongest performer at CASP, but its absolute success rate wasn’t very good, and there are lots of important protein-folding challenges that remain unsolved. But 2018 was DeepMind’s first year submitting an AI solution to the challenge; hopefully, with additional research, AlphaFold and other similar approaches will make even more progress.
