Floris den Hengst

Vrije Universiteit Amsterdam, Netherlands

Published in 2022
Reinforcement Learning with Option Machines
Deep Reinforcement Learning, Curriculum Learning, Hierarchical Reinforcement Learning, Planning with Incomplete Information
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn and Frank van Harmelen
Journal: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Description:
Reinforcement learning (RL) is a powerful framework for learning complex behaviors, but lacks adoption in many settings due to its sample size requirements. We introduce a framework for increasing the sample efficiency of RL algorithms. Our approach focuses on optimizing environment rewards with high-level instructions. These are modeled as a high-level controller over temporally extended actions known as options. These options can be looped, interleaved and partially ordered with a rich language for high-level instructions. Crucially, the instructions may be underspecified in the sense that following them does not guarantee high reward in the environment. We present an algorithm for control with these so-called option machines (OMs), discuss option selection for the partially ordered case and describe an algorithm for learning with OMs. We compare our approach in zero-shot, single-task and multi-task settings in an environment with fully specified and underspecified instructions. We find that OMs perform significantly better than or comparably to the state of the art in all environments and learning settings.
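To make the controller-over-options idea concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not the paper's implementation: the OptionMachine class, the event-driven transition table and the toy "fetch the key, then open either door" instruction are all hypothetical. The sketch reads an option machine as a finite-state controller whose states admit sets of options; high-level events advance the machine, loops and partial orders live in the transition structure, and partially ordered instructions surface as multiple admissible options to select among.

# Illustrative sketch only: names and structure are assumptions, not the
# paper's code. An option machine (OM) is read here as a finite-state
# controller over options (temporally extended actions).
class OptionMachine:
    def __init__(self, transitions, admissible, initial, accepting):
        self.transitions = transitions   # (om_state, event) -> next om_state
        self.admissible = admissible     # om_state -> set of admissible options
        self.state = initial
        self.accepting = accepting

    def options(self):
        # With partially ordered instructions, several options may be
        # admissible at once; option selection chooses among them.
        return self.admissible[self.state]

    def step(self, event):
        # Advance on a high-level event; unmatched events leave the state as-is.
        self.state = self.transitions.get((self.state, event), self.state)

    def done(self):
        return self.state in self.accepting

# Toy instruction: "fetch the key, then open door A or door B" (the two
# doors are unordered relative to each other).
om = OptionMachine(
    transitions={("start", "got_key"): "have_key",
                 ("have_key", "opened_a"): "goal",
                 ("have_key", "opened_b"): "goal"},
    admissible={"start": {"fetch_key"},
                "have_key": {"open_door_a", "open_door_b"},
                "goal": set()},
    initial="start",
    accepting={"goal"},
)

for event in ["got_key", "opened_b"]:
    print(om.options(), "->", event)
    om.step(event)
print("accepted:", om.done())

Note that reaching an accepting state only means the instruction was followed; with underspecified instructions this need not coincide with high environment reward.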
Published in 2022
Planning for potential: efficient safe reinforcement learning
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn and Frank van Harmelen
Journal: Machine Learning
Published in 2020
Collecting High-Quality Dialogue User Satisfaction Ratings with Third-Party Annotators
natural language interfaces, information retrieval, dialogue systems, evaluation
Authors: Mickey van Zeelt, Floris den Hengst and Seyyed Hadi Hashemi
Journal: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval
Description:
The design, evaluation and adaptation of conversational information systems are typically guided by ratings from third-party, i.e. non-user, annotators. Interfaces used in gathering such ratings are designed in an ad hoc fashion, as it has not yet been investigated which design yields high-quality ratings. This work describes how to design user interfaces for gathering high-quality ratings with third-party annotators. In a user study, we compare a base interface that consolidates best practices from the literature, an interface with clear definitions and an interface in which tasks are separated visually. We find that all of these interfaces yield high-quality annotations, with no significant differences in quality between the UIs. This work can serve as a starting point for researchers and practitioners interested in collecting high-quality dialogue user satisfaction ratings using third-party annotators.
Published in 2020
Reinforcement learning for personalization: A systematic literature review
Authors: Floris den Hengst, Eoin Martino Grua, Ali el Hassouni and Mark Hoogendoorn
Journal: Data Science
Published in 2019
Reinforcement Learning for Personalized Dialogue Management
Reinforcement Learning, Dialogue Management, Personalization, Recommendation
Authors: Floris den Hengst, Mark Hoogendoorn, Frank van Harmelen and Joost Bosman
Journal: IEEE/WIC/ACM International Conference on Web Intelligence - WI '19
Description:
Language systems have been of great interest to the research community and have recently reached the mass market through various assistant platforms on the web. Reinforcement Learning methods that optimize dialogue policies have seen successes in past years and have recently been extended into methods that personalize the dialogue, e.g. by taking the personal context of users into account. These works, however, are limited to personalization to a single user, with whom they require multiple interactions, and do not generalize the usage of context across users. This work introduces a problem where a generalized usage of context is relevant and proposes two Reinforcement Learning (RL)-based approaches to this problem. The first approach uses a single learner and extends the traditional POMDP formulation of dialogue state with features that describe the user context. The second approach segments users by context and then employs a learner per context. We compare these approaches against a benchmark of existing non-RL and RL-based methods in three established application domains and one novel application domain of financial product recommendation. We compare the influence of context and of training experiences on performance and find that the learning approaches generally outperform a handcrafted gold standard.
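The contrast between the two approaches can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: a tabular Q-learner stands in for the paper's dialogue policy optimizer (the paper works with a POMDP formulation, not a tabular MDP), and all state and context feature names are hypothetical.

# Rough sketch only; the QLearner interface and all feature names are
# assumptions for illustration, not the paper's implementation.
import numpy as np

class QLearner:
    """Minimal tabular Q-learner over hashable states."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.q = {}
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma

    def values(self, state):
        return self.q.setdefault(state, np.zeros(self.n_actions))

    def act(self, state, eps=0.1):
        # Epsilon-greedy action selection.
        if np.random.rand() < eps:
            return int(np.random.randint(self.n_actions))
        return int(np.argmax(self.values(state)))

    def update(self, s, a, r, s_next):
        target = r + self.gamma * self.values(s_next).max()
        self.values(s)[a] += self.alpha * (target - self.values(s)[a])

# Approach 1: a single learner whose state is extended with user-context
# features, so experience generalizes across users that share context.
single = QLearner(n_actions=3)
s = ("slot:budget_known", "context:student")
a = single.act(s)
single.update(s, a, r=1.0, s_next=("slot:product_offered", "context:student"))

# Approach 2: segment users by context and keep one learner per segment;
# context routes to a learner instead of entering the state.
per_context = {"student": QLearner(3), "retiree": QLearner(3)}
s2 = ("slot:budget_known",)
a2 = per_context["student"].act(s2)
per_context["student"].update(s2, a2, r=1.0, s_next=("slot:product_offered",))

The design trade-off the sketch exposes: the first approach pools experience from all users but must learn how context modulates good behavior, while the second specializes per segment at the cost of splitting the training data.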