4.7 Article

Dynamic resource allocation during reinforcement learning accounts for ramping and phasic dopamine activity

Journal

NEURAL NETWORKS
Volume 126, Issue -, Pages 95-107

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2020.03.005

Keywords

Prediction error; Salience; Temporal-difference learning model; Pearce-Hall model; Habit; Striatum

Funding

  1. Institute for Information & Communications Technology Promotion (IITP) - Korea government [2017-0-00451]
  2. National Research Foundation of Korea (NRF) - Korea government (MSIT) [NRF-2019M3E5D2A01066267]
  3. Institute of Information & Communications Technology Planning & Evaluation (IITP) - Korea government (MSIT) [2019-0-01371]
  4. Samsung Research Funding Center of Samsung Electronics [SRFC-TC1603-06]
  5. National Research Foundation of Korea [2019M3E5D2A01066267] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

Abstract

For an animal to learn about its environment with limited motor and cognitive resources, it should focus its resources on potentially important stimuli. However, too narrow a focus is disadvantageous for adaptation to environmental changes. Midbrain dopamine neurons are excited by potentially important stimuli, such as reward-predicting or novel stimuli, and allocate resources to these stimuli by modulating how an animal approaches, exploits, explores, and attends. The current study examined the theoretical possibility that dopamine activity reflects the dynamic allocation of resources for learning. Dopamine activity may transition between two patterns: (1) phasic responses to cues and rewards, and (2) ramping activity arising as the agent approaches the reward. Phasic excitation has been explained by prediction errors generated by experimentally inserted cues. However, when and why dopamine activity transitions between the two patterns remains unknown. By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine activity transitions from ramping to phasic patterns as the agent focuses its resources on a small number of reward-predicting stimuli, thus leading to task dimensionality reduction. The opposite occurs when the agent re-distributes its resources to adapt to environmental changes, resulting in task dimensionality expansion. This research elucidates the role of dopamine in a broader context, providing a potential explanation for the diverse repertoire of dopamine activity that cannot be explained solely by prediction error. © 2020 Elsevier Ltd. All rights reserved.
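
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a standard TD(0) model of a cue-reward trial. It is not the authors' modified model: it assumes a tapped-delay-line stimulus representation (one weight per time step after cue onset), and the function name `simulate_td` and all parameter values are illustrative choices, not taken from the paper. It only shows how the prediction error, the quantity usually compared with phasic dopamine activity, moves from the time of reward to the time of the cue over training.

```python
def simulate_td(n_trials=500, n_steps=20, cue_step=5, reward_step=15,
                alpha=0.1, gamma=0.98):
    """Standard TD(0) on a cue-reward trial (illustrative baseline only).

    Assumptions (not from the paper): one value weight per time step,
    states before cue onset have value 0 because the cue is unpredictable,
    and a reward of 1.0 is delivered on arrival at `reward_step`.
    Returns deltas[trial][t], the prediction error for the step t -> t+1.
    """
    w = [0.0] * n_steps  # learned value of each post-cue time step
    deltas = [[0.0] * n_steps for _ in range(n_trials)]
    for trial in range(n_trials):
        for t in range(n_steps - 1):
            # Value is only defined once the cue has appeared.
            v_t = w[t] if t >= cue_step else 0.0
            v_next = w[t + 1] if t + 1 >= cue_step else 0.0
            r = 1.0 if t + 1 == reward_step else 0.0
            # TD error: reward plus discounted next value minus current value.
            delta = r + gamma * v_next - v_t
            if t >= cue_step:
                w[t] += alpha * delta
            deltas[trial][t] = delta
    return deltas


if __name__ == "__main__":
    d = simulate_td()
    # Index cue_step - 1 is the step into the cue; reward_step - 1 is the step into reward.
    print("first trial: cue error", d[0][4], "reward error", d[0][14])
    print("last trial:  cue error", d[-1][4], "reward error", d[-1][14])
```

In this sketch the error at reward delivery is large on the first trial and shrinks toward zero with training, while an error at cue onset emerges. Ramping activity does not appear here; according to the abstract, accounting for it requires the authors' modification for mixed experimental and environmental stimuli, which is not reproduced in this example.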
