Article

A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems

NEURAL PROCESSING LETTERS (2011)

Journal

NEURAL PROCESSING LETTERS

Volume 33, Issue 2, Pages 187-200

Publisher

SPRINGER

DOI: 10.1007/s11063-011-9172-2

Keywords

Memory-based reinforcement learning; Markov decision processes; Partially observable Markov decision processes; Reinforcement learning

Category

Computer Science, Artificial Intelligence

Abstract

Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic, partially observable environments. The classic Bayesian optimal solution is obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief-state space is continuous and multi-dimensional, solving the problem this way is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always practical. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experience with minimal prior knowledge. It describes an enhancement of the original U-Tree's state generation process that makes the generated model more compact, and it proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against some traditional model-based algorithms on a set of well-known POMDP problems.
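To make the belief-state transformation mentioned in the abstract concrete, here is a minimal Python sketch of a single Bayesian belief update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). It is not taken from the article: the function name `belief_update`, the table layout, and the two-state example (modelled on the classic Tiger problem, where listening reports the correct side with probability 0.85) are assumptions chosen purely for illustration. It shows the model-based route that requires known transition and observation tables, not the modified U-Tree algorithm itself.

```python
from typing import List

Matrix = List[List[float]]


def belief_update(belief: List[float], T_a: Matrix, O_a: Matrix, obs: int) -> List[float]:
    """One Bayes-filter step for a fixed action a (hypothetical helper).

    T_a[s][s2] = P(next state s2 | state s, action a)
    O_a[s2][o] = P(observation o | next state s2, action a)
    """
    n = len(belief)
    # Prediction: push the current belief through the transition model.
    predicted = [sum(T_a[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    # Correction: weight each successor state by the observation likelihood.
    unnormalized = [O_a[s2][obs] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Illustrative two-state model (assumed values, Tiger-style "listen" action):
# listening never changes the hidden state and is correct 85% of the time.
T_LISTEN = [[1.0, 0.0],
            [0.0, 1.0]]
O_LISTEN = [[0.85, 0.15],
            [0.15, 0.85]]

b0 = [0.5, 0.5]                                     # uniform initial belief
b1 = belief_update(b0, T_LISTEN, O_LISTEN, obs=0)   # hear "left" once
b2 = belief_update(b1, T_LISTEN, O_LISTEN, obs=0)   # hear "left" again
print(b1, b2)
```

Each belief is a point in a continuous probability simplex, which is why planning over beliefs becomes intractable as the number of states grows, and why the update needs the full tables `T_a` and `O_a`. A memory-based learner such as the one the abstract describes instead works from raw observation histories, without being given those tables.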
