Floris den Hengst

Vrije Universiteit Amsterdam, Netherlands

Published in 2022
Reinforcement Learning with Option Machines
Keywords: Deep Reinforcement Learning, Curriculum Learning, Hierarchical Reinforcement Learning, Planning with Incomplete Information
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn and Frank van Harmelen
Venue: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Abstract:
Reinforcement learning (RL) is a powerful framework for learning complex behaviors, but lacks adoption in many settings due to its sample size requirements. We introduce a framework for increasing the sample efficiency of RL algorithms. Our approach focuses on optimizing environment rewards with high-level instructions. These are modeled as a high-level controller over temporally extended actions known as options. These options can be looped, interleaved and partially ordered in a rich language for high-level instructions. Crucially, the instructions may be underspecified in the sense that following them does not guarantee high reward in the environment. We present an algorithm for control with these so-called option machines (OMs), discuss option selection for the partially ordered case, and describe an algorithm for learning with OMs. We compare our approach in zero-shot, single- and multi-task settings in an environment with fully specified and underspecified instructions. We find that OMs perform significantly better than or comparably to the state of the art in all environments and learning settings.
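As a rough illustration of the options framework this abstract builds on, the sketch below executes a fully specified instruction given as a sequence of options. All names (Option, run_option, follow_instruction) and the gym-style env.step API are assumptions for illustration, not the paper's implementation; the paper's option machines additionally handle loops, interleaving and partial orders over options.

```python
# Minimal sketch of control via a high-level controller over options
# (temporally extended actions). Illustrative only; not the paper's
# option-machine algorithm.

class Option:
    """An option: an intra-option policy plus a termination condition."""
    def __init__(self, policy, terminates):
        self.policy = policy          # maps state -> primitive action
        self.terminates = terminates  # maps state -> bool

def run_option(env, state, option):
    """Execute one option until it terminates, accumulating env reward."""
    total_reward, done = 0.0, False
    while not done and not option.terminates(state):
        state, reward, done, _ = env.step(option.policy(state))
        total_reward += reward
    return state, total_reward, done

def follow_instruction(env, state, options):
    """Follow a fully specified instruction: a sequence of options.
    Since instructions may be underspecified, completing them does not
    guarantee high environment reward."""
    total = 0.0
    for option in options:
        state, reward, done = run_option(env, state, option)
        total += reward
        if done:
            break
    return state, total
```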
Published in 2022
Planning for potential: efficient safe reinforcement learning
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn, Frank van Harmelen
Venue: Machine Learning
Published in 2020
Collecting High-Quality Dialogue User Satisfaction Ratings with Third-Party Annotators
Keywords: natural language interfaces, information retrieval, dialogue systems, evaluation
Authors: Mickey van Zeelt, Floris den Hengst and Seyyed Hadi Hashemi
Venue: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval
Abstract:
The design, evaluation and adaptation of conversational information systems are typically guided by ratings from third-party, i.e. non-user, annotators. Interfaces used in gathering such ratings are designed in an ad-hoc fashion, as it has not yet been investigated which design yields high-quality ratings. This work describes how to design user interfaces for gathering high-quality ratings with third-party annotators. In a user study, we compare a base interface that consolidates best practices from the literature, an interface with clear definitions, and an interface in which tasks are separated visually. We find that these interfaces yield high-quality annotations, with no significant differences in quality between the UIs. This work can serve as a starting point for researchers and practitioners interested in collecting high-quality dialogue user satisfaction ratings using third-party annotators.
Published in 2020
Reinforcement learning for personalization: A systematic literature review
Authors: -
Venue: Data Science
Published in 2019
Reinforcement Learning for Personalized Dialogue Management
Keywords: Reinforcement Learning, Dialogue Management, Personalization, Recommendation
Authors: Floris den Hengst, Mark Hoogendoorn, Frank van Harmelen and Joost Bosman
Venue: IEEE/WIC/ACM International Conference on Web Intelligence - WI '19
Abstract:
Language systems have been of great interest to the research community and have recently reached the mass market through various assistant platforms on the web. Reinforcement Learning methods that optimize dialogue policies have seen successes in past years and have recently been extended into methods that personalize the dialogue, e.g. take the personal context of users into account. These works, however, are limited to personalization for a single user, with whom they require multiple interactions, and do not generalize the usage of context across users. This work introduces a problem where a generalized usage of context is relevant and proposes two Reinforcement Learning (RL)-based approaches to this problem. The first approach uses a single learner and extends the traditional POMDP formulation of dialogue state with features that describe the user context. The second approach segments users by context and then employs a learner per context. We compare these approaches in a benchmark of existing non-RL and RL-based methods in three established and one novel application domain of financial product recommendation. We compare the influence of context and training experiences on performance and find that learning approaches generally outperform a handcrafted gold standard.
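The contrast between the two approaches can be made concrete with a short sketch: a single learner over a context-extended state versus one learner per context segment. The function and class names below are hypothetical illustrations, not taken from the paper.

```python
# Illustrative sketch of the two RL personalization strategies described
# above; the feature layout and learner interface are assumptions made
# for illustration only.
import numpy as np

def contextual_state(dialogue_state: np.ndarray,
                     user_context: np.ndarray) -> np.ndarray:
    """Approach 1: a single learner whose (PO)MDP state is the dialogue
    state extended with features describing the user context."""
    return np.concatenate([dialogue_state, user_context])

class SegmentedLearners:
    """Approach 2: segment users by context and keep one learner per
    segment; each learner sees only the plain dialogue state."""
    def __init__(self, make_learner):
        self.make_learner = make_learner  # factory for a fresh RL learner
        self.learners = {}

    def learner_for(self, context_key):
        # Lazily create a dedicated learner for each user segment.
        if context_key not in self.learners:
            self.learners[context_key] = self.make_learner()
        return self.learners[context_key]
```

In the first approach one policy must generalize across contexts; in the second, each segment's policy is simpler but sees only that segment's interactions, which is the trade-off the paper's benchmark examines.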