Article
Anesthesiology
Angelos-Miltiadis Krypotos, Geert Crombez, Maryna Alves, Nathalie Claes, Johan W. S. Vlaeyen
Summary: This study investigates how individuals solve the exploration-exploitation dilemma when facing pain and finds that participants tend to choose the safest option, prioritize rewards over pain, and are more inclined to explore after experiencing pain.
Article
Robotics
Andrew Silva, Nina Moorman, William Silva, Zulfiqar Zaidi, Nakul Gopalan, Matthew Gombolay
Summary: Researchers have developed a language-conditioned multi-task learning method called LanCon-Learn, which helps robots understand the relationship between tasks and objectives for better application in manipulation domains. Experimental results show that LanCon-Learn achieves significant improvement in task success rate and skill transfer compared to non-language baselines.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2022)
Article
Computer Science, Artificial Intelligence
Qihang Chen, Qiwei Zhang, Yunlong Liu
Summary: A major challenge in reinforcement learning is the sparse, delayed reward signal of episodic tasks. Existing techniques either struggle to assign credit to explored transitions or are misled by behavioral policies, leading to slow learning. To address this, the authors propose EMR, an approach that combines the intrinsic rewards of exploration mechanisms with reward redistribution to balance exploration and exploitation in such tasks.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
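The two EMR ingredients named above, intrinsic exploration rewards and redistribution of a sparse terminal reward, can be illustrated with a minimal Python sketch. The count-based bonus, the uniform redistribution rule, and the toy chain environment (`chain_step`, `run_episode`) are illustrative assumptions, not the paper's actual design:

```python
from collections import defaultdict

def chain_step(state, action):
    """Toy 5-state chain: action 1 moves right, anything else moves left.
    A sparse reward of 1.0 arrives only when state 4 is reached."""
    next_state = state + 1 if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done

def run_episode(env_step, policy, visit_counts, beta=0.1, max_steps=50):
    """Roll out one episode, augmenting each step with a count-based
    intrinsic bonus and redistributing the sparse terminal reward
    uniformly over the visited transitions."""
    trajectory, state = [], 0
    for _ in range(max_steps):
        action = policy(state)
        next_state, extrinsic, done = env_step(state, action)
        visit_counts[(state, action)] += 1
        # Intrinsic bonus decays with visitation, rewarding novel transitions.
        intrinsic = beta / visit_counts[(state, action)] ** 0.5
        trajectory.append([state, action, intrinsic])
        state = next_state
        if done:
            # Spread the delayed terminal reward evenly over all steps taken.
            share = extrinsic / len(trajectory)
            for step in trajectory:
                step[2] += share
            break
    return trajectory

counts = defaultdict(int)
# Always move right: 4 steps to the goal, each carrying
# a 0.1 intrinsic bonus plus a 0.25 redistributed share.
traj = run_episode(chain_step, lambda s: 1, counts)
```

The redistribution turns one delayed reward into a dense per-step signal, while the decaying bonus keeps early exploration alive.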
Review
Computer Science, Artificial Intelligence
Anthony Triche, Anthony S. Maida, Ashok Kumar
Summary: Recent works have connected Hebbian plasticity with reinforcement learning, resulting in a class of trial-and-error learning called neo-Hebbian plasticity. Inspired by the role of dopamine in synaptic modification, neo-Hebbian RL methods selectively reinforce associations to enable learning exploitative behaviors. This review focuses on the exploration-exploitation balance under the neo-Hebbian RL framework and suggests potential improvements through stronger incorporation of intrinsic motivators.
Article
Computer Science, Artificial Intelligence
Guoyu Zuo, Zhipeng Tian, Gao Huang
Summary: Learning from visual observations is a challenging RL problem that requires solving both representation learning and task learning. Existing data-augmentation methods can improve RL generalization but often cause instability and divergence. The authors propose DAR-EEE, a unified method that incorporates bootstrap ensembles to stabilize and accelerate task learning, and their experimental evaluation demonstrates improved sample efficiency and state-of-the-art performance on difficult image-based control tasks.
APPLIED INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Min Li, Tianyi Huang, William Zhu
Summary: This research proposes an adaptive exploration policy to address the exploration-exploitation tradeoff by adjusting the exploration noise based on training stability. The effectiveness of this policy is demonstrated through theoretical analysis and experiments.
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS
(2021)
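The adaptive policy described above can be sketched as a simple noise scheduler. The stability signal used here (standard deviation of recent episode returns) and the multiplicative update rule in `AdaptiveNoise` are hypothetical stand-ins for the paper's actual criterion:

```python
import statistics

class AdaptiveNoise:
    """Scale Gaussian exploration noise by a crude stability signal:
    the spread of recent episode returns. Volatile returns (unstable
    training) shrink the noise; flat returns let it grow, inviting
    more exploration."""

    def __init__(self, sigma=0.2, lo=0.01, hi=0.5, window=10):
        self.sigma, self.lo, self.hi = sigma, lo, hi
        self.window = window
        self.returns = []

    def update(self, episode_return):
        """Record an episode return and return the adjusted noise scale."""
        self.returns.append(episode_return)
        recent = self.returns[-self.window:]
        if len(recent) >= 2:
            spread = statistics.pstdev(recent)
            # Shrink noise when returns fluctuate, grow it when they are stable.
            self.sigma *= 0.9 if spread > 1.0 else 1.05
            self.sigma = min(max(self.sigma, self.lo), self.hi)
        return self.sigma
```

The clamp to `[lo, hi]` keeps the agent from ever becoming fully deterministic or fully random, whatever the training trace looks like.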
Article
Robotics
Jingxi Xu, Shuran Song, Matei Ciocarlie
Summary: Inspired by human abilities, the robotic manipulation field aims to develop new methods for tactile-based object interaction. TANDEM, an architecture for learning efficient exploration strategies and decision making, is proposed in this study. The results show that TANDEM achieves higher accuracy with fewer actions in a tactile object recognition task and is more robust to sensor noise.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2022)
Article
Management
H. Henry Cao, Liye Ma, Z. Eddie Ning, Baohong Sun
Summary: In this paper, the authors use a continuous time bandit model to analyze the effectiveness of recommendation algorithms in a monopoly and duopoly market. They find that in a competitive market, firms focus more on exploitation rather than exploration. Additionally, competition decreases the return from developing a forward-looking algorithm for impatient users. However, the development of a forward-looking algorithm always benefits users in a competitive market. The decision of competing firms to invest in a forward-looking algorithm can create a prisoner's dilemma, highlighting the implications for artificial intelligence adoption and policy makers.
MANAGEMENT SCIENCE
(2023)
Article
Computer Science, Artificial Intelligence
Igor Q. Lordeiro, Diego B. Haddad, Douglas O. Cardoso
Summary: The research assessed the feasibility of using reinforcement learning and multi-armed bandit algorithms to tackle Minesweeper, with successful results particularly on smaller game boards such as the beginner level.
IEEE TRANSACTIONS ON GAMES
(2022)
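A minimal sketch of the UCB1 rule that underlies many multi-armed bandit approaches like the one above; the arm/value bookkeeping here is generic, not the paper's Minesweeper-specific formulation:

```python
import math

def ucb1_select(counts, values, total, c=2.0):
    """UCB1: pick the arm maximizing its empirical mean reward plus an
    exploration bonus that shrinks as the arm accumulates pulls.

    counts: pulls per arm; values: summed reward per arm; total: total pulls.
    """
    best_arm, best_score = None, float("-inf")
    for arm in counts:
        if counts[arm] == 0:
            return arm  # pull every arm at least once before comparing
        score = values[arm] / counts[arm] + math.sqrt(
            c * math.log(total) / counts[arm]
        )
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm
```

The bonus term lets a rarely-tried arm with a mediocre mean beat a well-sampled arm with a slightly better one, which is exactly the exploration-exploitation tradeoff the entries above revolve around.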
Article
Biology
Jose Segovia-Martin, Felix Creutzig, James Winters
Summary: Higher levels of economic activity lead to higher energy use and consumption of natural resources, and the use of fossil fuels remains a significant contributor to greenhouse gas emissions and climate change. The Jevons Paradox suggests that increasing resource efficiency can actually lead to increased resource consumption. This study develops a mathematical model and computer simulator to analyze the effects of exploration-exploitation strategies on efficiency, consumption, and sustainability, and highlights the importance of demand reduction measures in achieving sustainable development goals.
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES
(2023)
Article
Automation & Control Systems
Chengbin Xuan, Feng Zhang, Hak-Keung Lam
Summary: This paper presents a method to improve the safety of agents during the exploration stage of Q-learning. By introducing a safety indicator function and a safe exploration mask, the algorithm reduces the likelihood of unsafe actions and improves its applicability in industrial settings.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2022)
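The masked exploration idea can be sketched as a small action-selection routine; `safe_epsilon_greedy`, its boolean mask interface, and the all-blocked fallback are hypothetical, standing in for the paper's safety indicator function:

```python
import random

def safe_epsilon_greedy(q_row, safe_mask, epsilon=0.1):
    """Epsilon-greedy action selection restricted to actions that a
    safety indicator has flagged as safe.

    q_row: Q-values for one state; safe_mask: per-action booleans.
    """
    allowed = [a for a, ok in enumerate(safe_mask) if ok]
    if not allowed:
        # If the mask blocks every action, fall back to the full action set.
        allowed = list(range(len(q_row)))
    if random.random() < epsilon:
        return random.choice(allowed)  # explore, but only among safe actions
    return max(allowed, key=lambda a: q_row[a])  # exploit the best safe action
```

Both the explore and exploit branches draw from the masked set, so random exploration never wanders into actions the indicator has ruled out.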
Article
Neurosciences
Jean-Paul Noel, Baptiste Caziot, Stefania Bruni, Nora E. Fitzgerald, Eric Avila, Dora E. Angelaki
Summary: The study emphasizes the importance of closed loops between action and perception in understanding complex behaviors, introducing the framework of reinforcement learning and control. It highlights active sensing, dynamical planning, and leveraging structural regularities as key operations for intelligent behavior. The approach allows for flexible and generalizable behaviors, while also exploring the neural underpinnings of intelligence properties such as flexibility, prediction, and generalization.
PROGRESS IN NEUROBIOLOGY
(2021)
Article
Computer Science, Artificial Intelligence
Antoine Theberge, Christian Desrosiers, Maxime Descoteaux, Pierre-Marc Jodoin
Summary: Diffusion MRI tractography is the only non-invasive tool to assess the white-matter structural connectivity of a brain. Using deep reinforcement learning to address tractography issues has shown competitive results and stable performance when generalizing to new data.
MEDICAL IMAGE ANALYSIS
(2021)
Article
Computer Science, Artificial Intelligence
Wen-Hua Chen
Summary: This paper discusses the relationship between Reinforcement Learning (RL) and the recently developed Dual Control for Exploitation and Exploration (DCEE), highlighting DCEE's potential to solve problems similar to those RL addresses in unknown environments, along with its advantages in coping with uncertainty, its learning efficiency, and its potential to establish formal properties. The paper also explores the links between DCEE and other relevant methods, offering insights for cross-fertilisation between control, machine learning, and neuroscience in developing autonomous control under uncertain environments.
Article
Computer Science, Information Systems
Ryusei Maeda, Mamoru Mimura
Summary: This paper proposes a method of automating post-exploitation by combining deep reinforcement learning and PowerShell Empire, with A2C showing the most efficient learning progress and the ability for trained agents to gain administrator privileges in a test domain network.
COMPUTERS & SECURITY
(2021)