Rewards don’t just reinforce a specific action; they quickly reshape the whole pattern of how we behave.
Imagine you’re teaching a dog to play fetch. You throw a ball, and your dog sprints after it, picks it up, and runs back. You then reward your panting pup with a treat. But now comes the real trick for your dog: figuring out which part of that sequence earned the treat. Scientists call this the ‘credit assignment problem,’ and it poses a fundamental question for any learning brain: which actions are responsible for the positive outcomes we experience?
Dopamine, a key chemical messenger in the brain, is known to play a crucial role in this process. But exactly how the brain links specific actions to dopamine’s release has remained unclear.
New Insights From a Comprehensive Study
A study published on December 13 in Nature by scientists at the Allen Institute, Columbia University’s Zuckerman Mind Brain Behavior Institute, the Champalimaud Centre for the Unknown, and Seattle Children’s Research Institute sheds new light on this mystery. It reveals how dopamine not only signals a reward but also guides animals to home in on the specific behaviors that lead to these rewards through trial and error.
Intriguingly, the research also shows that the brain’s reward system can swiftly and dynamically alter the full range of an animal’s movements and behaviors. This highlights a sophisticated learning strategy in which behaviors are not just reinforced but actively shaped and fine-tuned through experience, said Rui Costa, D.V.M., Ph.D., the study’s senior author.
“When you reinforce behavior, we often think it’s just that action,” said Costa, the president and CEO of the Allen Institute. “But no: you’re changing the entire behavioral structure. And what was really surprising was how rapid it was.”
Decoding How Dopamine Shapes Learning
To uncover those insights, the team collaborated with engineers and neuroscientists at the Champalimaud Centre for the Unknown to develop a novel “closed-loop” system that could link specific actions by mice to dopamine release in real time. The researchers outfitted mice with wireless sensors to track their movements within a simple, controlled space and fed this data into a machine learning algorithm that sorted the movements into distinct categories of action. Using optogenetics, a method for controlling neurons with light, they then stimulated dopamine neurons whenever a mouse performed a predefined “target action.”
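For readers who want a concrete picture of what “closed loop” means here, the following is a minimal Python sketch, not the study’s actual pipeline. It assumes each short time bin of sensor data has already been reduced to a feature vector. It uses a simple nearest-centroid rule as a hypothetical stand-in for the machine learning classifier. The action names, centroid values, and the `stimulate` callback are all illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def nearest_centroid(features: Vector, centroids: Dict[str, Vector]) -> str:
    """Assign a feature vector to the action whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def run_closed_loop(
    stream: List[Vector],
    centroids: Dict[str, Vector],
    target_action: str,
    stimulate: Callable[[int], None],
) -> List[str]:
    """Classify each incoming time bin and call `stimulate` whenever the target action is detected."""
    labels = []
    for t, features in enumerate(stream):
        label = nearest_centroid(features, centroids)
        labels.append(label)
        if label == target_action:
            stimulate(t)  # hypothetical stand-in for triggering the optogenetic light pulse
    return labels


# Hypothetical action categories and a short stream of movement features.
centroids = {"rear": (1.0, 0.0), "groom": (0.0, 1.0), "walk": (0.5, 0.5)}
stream = [(0.9, 0.1), (0.1, 0.9), (0.95, 0.05), (0.5, 0.45)]
pulses: List[int] = []
labels = run_closed_loop(stream, centroids, "rear", pulses.append)
print(labels)  # ['rear', 'groom', 'rear', 'walk']
print(pulses)  # [0, 2]
```

The key design point is that classification and stimulation happen inside the same loop. The reward signal therefore arrives moments after the chosen action rather than being scheduled independently of behavior.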
They found that mice swiftly changed their behavior in response to dopamine release. Initially, they increased the frequency not only of the target action but also of similar actions and of actions that occurred a few seconds before the dopamine release. Meanwhile, actions dissimilar to the target rapidly decreased. Over time, this refinement became more precise, with the mice increasingly focusing on the exact action that led to dopamine release.
The study also examined how mice learn a sequence of actions, revealing a key process akin to rewinding time to work out what led to a reward. When the actions in a dopamine-triggering sequence were spaced farther apart in time, the mice learned more slowly, suggesting that longer gaps between actions make it harder for mice to connect the sequence with the reward. In essence, actions performed right before the reward are quickly picked up and refined, while earlier actions are refined more gradually. This ‘rewinding’ process strengthens the behavior and helps the mice progressively identify which precise actions and sequences yield the reward.
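The ‘rewinding’ intuition resembles what reinforcement learning researchers call an eligibility trace. The toy model below is a swapped-in illustration, not the model used in the study. It assumes the credit an action receives decays exponentially with the time between that action and the dopamine signal, using an arbitrary time constant `tau`. It also assumes a simple hypothetical learning rule in which each rewarded trial adds `lr * credit` to an action’s preference. Under these assumptions, spacing two actions farther apart leaves the earlier one with less credit per trial, so it takes more trials to learn.

```python
import math
from typing import Dict, List, Tuple


def credit_from_traces(
    events: List[Tuple[float, str]],
    reward_time: float,
    tau: float = 2.0,  # assumed decay time constant in seconds (illustrative only)
) -> Dict[str, float]:
    """Credit each action with an exponentially decaying eligibility trace.

    An action performed dt seconds before the reward receives credit exp(-dt / tau),
    so actions closer to the reward get more credit than earlier ones.
    """
    credit: Dict[str, float] = {}
    for t, action in events:
        dt = reward_time - t
        if dt < 0:
            continue  # actions after the reward cannot have caused it
        credit[action] = credit.get(action, 0.0) + math.exp(-dt / tau)
    return credit


def trials_to_learn(credit_per_trial: float, lr: float = 0.1, threshold: float = 1.0) -> int:
    """Count rewarded trials until an action's accumulated preference reaches `threshold`."""
    if credit_per_trial <= 0:
        raise ValueError("an action with no credit is never learned")
    preference, trials = 0.0, 0
    while preference < threshold:
        preference += lr * credit_per_trial  # hypothetical incremental update
        trials += 1
    return trials


# Sequence "A then B", with dopamine arriving right after B.
close = credit_from_traces([(0.0, "A"), (1.0, "B")], reward_time=1.0)
far = credit_from_traces([(0.0, "A"), (4.0, "B")], reward_time=4.0)
print(round(close["A"], 3), round(far["A"], 3))  # 0.607 0.135
print(trials_to_learn(close["A"]) < trials_to_learn(far["A"]))  # True
```

In both cases the last action, B, receives full credit. Only the earlier action, A, is affected by the spacing, which mirrors the pattern described above.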
Broader Implications for Education and AI
The findings could have implications for fields as diverse as education and artificial intelligence (AI), said lead author Jonathan Tang, Ph.D., an assistant professor at University of Washington Medicine – Pediatrics, Seattle Children’s Research Institute. In the classroom, for example, allowing room for exploration, mistakes, and gradual refinement may align more closely with the brain’s innate learning processes.
In AI, the insights could lead to more sophisticated and efficient learning systems. AI that more closely replicates biological learning processes might adapt more readily to new data and situations.
The study offers deeper insight into how brains learn and adapt through trial and error, whether they belong to a scientist or a pup.
“We take a lot of stuff for granted about how things work, including credit assignment,” said Tang, who started the research with Costa while at Columbia University. “But it’s when you really start diving in that you realize the complexity. This is why people do science: to home in on the truth of the matter.”
Reference: “Dynamic behaviour restructuring mediates dopamine-dependent credit assignment” by Jonathan C. Y. Tang, Vitor Paixao, Filipe Carvalho, Artur Silva, Andreas Klaus, Joaquim Alves da Silva and Rui M. Costa, 13 December 2023, Nature.