4.7 Article

Sampling diversity driven exploration with state difference guidance

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 203, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2022.117418

Keywords

Reinforcement learning; Exploration; Intrinsic rewards; Off-policy; Actor-critic algorithm

Funding

  1. Natural Science Research Foundation of Jilin Province of China [20180101053JC]
  2. National Key R&D Program of China [2017YFB1003103]
  3. National Natural Science Foundation of China [61300049]

This paper proposes a novel exploration method for deep reinforcement learning that can effectively handle environments with sparse or deceptive rewards. The double-actors-double-critics framework combines intrinsic rewards with extrinsic rewards to avoid the inappropriate combination of rewards in previous methods.
Exploration is one of the key issues in deep reinforcement learning, especially in environments with sparse or deceptive rewards. Exploration based on intrinsic rewards can handle these environments. However, existing methods of this kind cannot take both global interaction dynamics and local environment changes into account simultaneously. In this paper, we propose a novel intrinsic reward for off-policy learning, which not only encourages the agent to take actions that have not been fully learned from a global perspective, but also guides the agent to trigger remarkable changes in the environment from a local perspective. Meanwhile, we propose the double-actors-double-critics framework to combine intrinsic rewards with extrinsic rewards, avoiding the inappropriate combination of the two in previous methods. This framework can be applied to off-policy learning algorithms based on the actor-critic method. We provide a comprehensive evaluation of our approach on the MuJoCo benchmark environments. The results demonstrate that our method explores effectively in environments with dense, deceptive, and sparse rewards. In addition, we conduct ablation and quantitative analyses of the intrinsic rewards. Furthermore, we verify the superiority and rationality of the double-actors-double-critics framework through comparative experiments.
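
The abstract does not give the reward formula, so the following is only a rough sketch of the two ideas it names, not the authors' method. It assumes a forward-model prediction error as a stand-in for the "global" term, the norm of the state change as a stand-in for the "local" term, and a hypothetical mixing weight `beta`. The function names `intrinsic_reward` and `separate_td_targets` are likewise invented for illustration. The second function shows one plausible reading of keeping intrinsic and extrinsic rewards apart: each critic gets its own TD target instead of the two rewards being summed.

```python
import numpy as np


def intrinsic_reward(state, next_state, predicted_next_state, beta=0.5):
    """Hypothetical intrinsic reward mixing a global and a local term.

    Assumption: the global term is a forward-model prediction error,
    and the local term is the magnitude of the state change. Neither
    is taken from the paper; both are placeholders for illustration.
    """
    state = np.asarray(state, dtype=float)
    next_state = np.asarray(next_state, dtype=float)
    predicted_next_state = np.asarray(predicted_next_state, dtype=float)

    # Global term (assumed): large when the transition is poorly predicted.
    global_term = float(np.mean((predicted_next_state - next_state) ** 2))
    # Local term (assumed): large when the action changes the state a lot.
    local_term = float(np.linalg.norm(next_state - state))
    return beta * global_term + (1.0 - beta) * local_term


def separate_td_targets(r_ext, r_int, q_ext_next, q_int_next, gamma=0.99):
    """One-step TD targets for two critics kept apart (assumed design).

    The extrinsic critic only sees the extrinsic reward and the
    intrinsic critic only sees the intrinsic reward.
    """
    target_ext = r_ext + gamma * q_ext_next
    target_int = r_int + gamma * q_int_next
    return target_ext, target_int
```

In this sketch, a transition that a perfect forward model predicts exactly still earns an intrinsic reward if the state moved, which is the kind of local signal the abstract describes.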
