Article

Sampling diversity driven exploration with state difference guidance

EXPERT SYSTEMS WITH APPLICATIONS (2022)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 203

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2022.117418

Keywords

Reinforcement learning; Exploration; Intrinsic rewards; Off-policy; Actor-critic algorithm

Funding

Natural Science Research Foundation of Jilin Province of China [20180101053JC]
National Key R&D Program of China [2017YFB1003103]
National Natural Science Foundation of China [61300049]

Automated Summary

This paper proposes a novel exploration method for deep reinforcement learning that can handle environments with sparse or deceptive rewards. It also introduces a double-actors-double-critics framework for combining intrinsic and extrinsic rewards, intended to avoid the inappropriate reward combination used in previous methods.

Abstract

Exploration is one of the key issues in deep reinforcement learning, especially in environments with sparse or deceptive rewards. Exploration based on intrinsic rewards can handle these environments. However, existing methods of this kind cannot take both global interaction dynamics and local environment changes into account simultaneously. In this paper, we propose a novel intrinsic reward for off-policy learning, which not only encourages the agent to take actions that have not been fully learned from a global perspective, but also guides the agent to trigger remarkable changes in the environment from a local perspective. In addition, we propose the double-actors-double-critics framework to combine intrinsic rewards with extrinsic rewards, avoiding the inappropriate combination of intrinsic and extrinsic rewards in previous methods. This framework can be applied to off-policy learning algorithms based on the actor-critic method. We provide a comprehensive evaluation of our approach on the MuJoCo benchmark environments. The results demonstrate that our method can perform effective exploration in environments with dense, deceptive, and sparse rewards. We also conduct ablation and quantitative analyses of the intrinsic rewards. Furthermore, we verify the superiority and rationality of our double-actors-double-critics framework through comparative experiments.
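The abstract describes an intrinsic reward with a global part (favoring what has not been fully learned) and a local part (favoring large changes in the environment), but it gives no formulas. The sketch below only illustrates that two-part idea and is not the paper's method. The count-based global term, the Euclidean state-difference term, the weights `w_global` and `w_local`, and all function names are assumptions chosen for this example.

```python
import math


def state_difference_bonus(state, next_state):
    """Local term (assumed form): Euclidean distance between consecutive states."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(state, next_state)))


def sampling_diversity_bonus(sample_count):
    """Global term (assumed stand-in): larger for transitions that have been
    sampled for learning fewer times. The paper's actual definition may differ."""
    return 1.0 / math.sqrt(1.0 + sample_count)


def intrinsic_reward(state, next_state, sample_count, w_global=1.0, w_local=1.0):
    """Weighted sum of the two hypothetical terms above."""
    return (w_global * sampling_diversity_bonus(sample_count)
            + w_local * state_difference_bonus(state, next_state))


if __name__ == "__main__":
    # A transition that moved the state a lot and has been sampled 3 times.
    print(intrinsic_reward([0.0, 0.0], [3.0, 4.0], sample_count=3))
```

In a setup like the double-actors-double-critics framework named in the abstract, a value such as this would be kept separate from the extrinsic reward rather than summed into it. How the two actors and two critics use the separate signals is not specified here.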
