Multi-timescale reinforcement learning in the brain

Masset, Paul; Tano, Pablo; Kim, HyungGoo R.; Malik, Athar N.; Pouget, Alexandre; Uchida, Naoshige

Affiliations:
Paul Masset: Harvard University
Pablo Tano: Université de Genève
HyungGoo R. Kim: Harvard University
Athar N. Malik: Harvard University
Alexandre Pouget: Université de Genève
Naoshige Uchida: Harvard University

Nature, 2025, vol. 642, issue 8068, 682-690

Abstract: To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behaviour can be learned through reinforcement learning [1], a class of algorithms that has been successful at training artificial agents [2–5] and at characterizing the firing of dopaminergic neurons in the midbrain [6–8]. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, known as the discount factor. Here we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopaminergic neurons in mice performing two behavioural tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopaminergic neurons and a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations [9–12], and open new avenues for the design of more-efficient reinforcement learning algorithms.
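
The abstract's two technical ideas can be illustrated with a small toy sketch: value can be learned in parallel at several discount factors, each with its own reward prediction error, and pooling several exponential discounts yields a discount curve that is no longer exponential. This is not the authors' model. It assumes tabular TD(0) on a hypothetical five-state chain, equal-weight averaging across timescales, and a uniform spread of discount factors over [0, 1]; the function names `td_values_multi_gamma` and `population_discount` are illustrative only. Under the uniform assumption the pooled curve follows from the identity that the integral of gamma**t over [0, 1] equals 1 / (t + 1), which is hyperbolic.

```python
import numpy as np


def td_values_multi_gamma(rewards, gammas, alpha=0.1, n_sweeps=500):
    """Tabular TD(0) on a deterministic chain, run in parallel for several
    discount factors (a hypothetical toy task, not the paper's).

    rewards[s] is the reward received when leaving state s; the chain ends in
    a terminal state whose value stays 0.
    Returns V with shape (len(gammas), len(rewards)).
    """
    gammas = np.asarray(gammas, dtype=float)
    n_states = len(rewards)
    # One row of values per timescale; the extra column is the terminal state.
    V = np.zeros((len(gammas), n_states + 1))
    for _ in range(n_sweeps):
        for s in range(n_states):
            # Reward prediction error, computed separately for each discount factor.
            delta = rewards[s] + gammas * V[:, s + 1] - V[:, s]
            V[:, s] += alpha * delta
    return V[:, :n_states]


def population_discount(t, gammas):
    """Effective discount applied to a reward t steps ahead when the value
    estimates of all timescales are averaged with equal weights (assumption)."""
    gammas = np.asarray(gammas, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.mean(gammas[:, None] ** t[None, :], axis=0)


# Toy task: a reward of 1 only on leaving the last of 5 states.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
gammas = [0.5, 0.9, 0.99]
V = td_values_multi_gamma(rewards, gammas)
# Each row approaches gamma ** (4 - s): steeper decay for smaller gamma.
print(np.round(V, 3))

# Uniformly spread discount factors (midpoints of [0, 1]): the average of the
# exponentials gamma ** t approximates 1 / (t + 1), a hyperbolic curve.
n = 100_000
uniform_gammas = (np.arange(n) + 0.5) / n
t = np.arange(11)
print(np.round(population_discount(t, uniform_gammas), 4))
print(np.round(1.0 / (t + 1), 4))
```

With a single discount factor, `population_discount` reduces to ordinary exponential discounting (the ratio between successive steps is constant); with many, the ratio changes with delay, which is one simple way a population of exponential learners can produce non-exponential behaviour.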

Date: 2025

Downloads: https://www.nature.com/articles/s41586-025-08929-9 (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:642:y:2025:i:8068:d:10.1038_s41586-025-08929-9

Ordering information: This journal article can be ordered from https://www.nature.com/

DOI: 10.1038/s41586-025-08929-9

Nature is currently edited by Magdalena Skipper

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.