Floris den Hengst

Vrije Universiteit Amsterdam, Netherlands

Published in 2022
Reinforcement Learning with Option Machines
Keywords: Deep Reinforcement Learning, Curriculum Learning, Hierarchical Reinforcement Learning, Planning with Incomplete Information
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn and Frank van Harmelen
Venue: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Abstract:
Reinforcement learning (RL) is a powerful framework for learning complex behaviors, but lacks adoption in many settings due to its sample size requirements. We introduce a framework for increasing the sample efficiency of RL algorithms. Our approach focuses on optimizing environment rewards with high-level instructions. These are modeled as a high-level controller over temporally extended actions known as options. These options can be looped, interleaved and partially ordered in a rich language for high-level instructions. Crucially, the instructions may be underspecified in the sense that following them does not guarantee high reward in the environment. We present an algorithm for control with these so-called option machines (OMs), discuss option selection for the partially ordered case, and describe an algorithm for learning with OMs. We compare our approach in zero-shot, single- and multi-task settings in an environment with fully specified and underspecified instructions. We find that OMs perform significantly better than or comparably to the state of the art in all environments and learning settings.
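As a rough illustration of the options framework this abstract builds on, the sketch below executes a fully specified instruction given as a sequence of options. All names (Option, run_option, follow_instruction) and the gym-style env.step API are assumptions for illustration, not the paper's implementation; the paper's option machines additionally handle loops, interleaving and partial orders over options.

```python
# Minimal sketch of control via a high-level controller over options
# (temporally extended actions). Illustrative only; not the paper's
# option-machine algorithm.

class Option:
    """An option: an intra-option policy plus a termination condition."""
    def __init__(self, policy, terminates):
        self.policy = policy          # maps state -> primitive action
        self.terminates = terminates  # maps state -> bool

def run_option(env, state, option):
    """Execute one option until it terminates, accumulating env reward."""
    total_reward, done = 0.0, False
    while not done and not option.terminates(state):
        state, reward, done, _ = env.step(option.policy(state))
        total_reward += reward
    return state, total_reward, done

def follow_instruction(env, state, options):
    """Follow a fully specified instruction: a sequence of options.
    Since instructions may be underspecified, completing them does not
    guarantee high environment reward."""
    total = 0.0
    for option in options:
        state, reward, done = run_option(env, state, option)
        total += reward
        if done:
            break
    return state, total
```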
Published in 2022
Planning for potential: efficient safe reinforcement learning
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn, Frank van Harmelen
Venue: Machine Learning
Published in 2020
Collecting High-Quality Dialogue User Satisfaction Ratings with Third-Party Annotators
Keywords: natural language interfaces, information retrieval, dialogue systems, evaluation
Authors: Mickey van Zeelt, Floris den Hengst and Seyyed Hadi Hashemi
Venue: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval
Abstract:
The design, evaluation and adaptation of conversational information systems are typically guided by ratings from third-party, i.e. non-user, annotators. Interfaces used in gathering such ratings are designed in an ad-hoc fashion, as it has not yet been investigated which design yields high-quality ratings. This work describes how to design user interfaces for gathering high-quality ratings with third-party annotators. In a user study, we compare a base interface that consolidates best practices from the literature, an interface with clear definitions, and an interface in which tasks are separated visually. We find that these interfaces yield high-quality annotations, with no significant differences in quality between the UIs. This work can serve as a starting point for researchers and practitioners interested in collecting high-quality dialogue user satisfaction ratings using third-party annotators.
Published in 2020
Reinforcement learning for personalization: A systematic literature review
Authors: -
Venue: Data Science
Published in 2019
Reinforcement Learning for Personalized Dialogue Management
Keywords: Reinforcement Learning, Dialogue Management, Personalization, Recommendation
Authors: Floris den Hengst, Mark Hoogendoorn, Frank van Harmelen and Joost Bosman
Venue: IEEE/WIC/ACM International Conference on Web Intelligence - WI '19
Abstract:
Language systems have been of great interest to the research community and have recently reached the mass market through various assistant platforms on the web. Reinforcement Learning methods that optimize dialogue policies have seen successes in past years and have recently been extended into methods that personalize the dialogue, e.g. take the personal context of users into account. These works, however, are limited to personalization for a single user, with whom they require multiple interactions, and do not generalize the usage of context across users. This work introduces a problem where a generalized usage of context is relevant and proposes two Reinforcement Learning (RL)-based approaches to this problem. The first approach uses a single learner and extends the traditional POMDP formulation of dialogue state with features that describe the user context. The second approach segments users by context and then employs a learner per context. We compare these approaches in a benchmark of existing non-RL and RL-based methods in three established and one novel application domain of financial product recommendation. We compare the influence of context and training experiences on performance and find that learning approaches generally outperform a handcrafted gold standard.
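The contrast between the two approaches can be made concrete with a short sketch: a single learner over a context-extended state versus one learner per context segment. The function and class names below are hypothetical illustrations, not taken from the paper.

```python
# Illustrative sketch of the two RL personalization strategies described
# above; the feature layout and learner interface are assumptions made
# for illustration only.
import numpy as np

def contextual_state(dialogue_state: np.ndarray,
                     user_context: np.ndarray) -> np.ndarray:
    """Approach 1: a single learner whose (PO)MDP state is the dialogue
    state extended with features describing the user context."""
    return np.concatenate([dialogue_state, user_context])

class SegmentedLearners:
    """Approach 2: segment users by context and keep one learner per
    segment; each learner sees only the plain dialogue state."""
    def __init__(self, make_learner):
        self.make_learner = make_learner  # factory for a fresh RL learner
        self.learners = {}

    def learner_for(self, context_key):
        # Lazily create a dedicated learner for each user segment.
        if context_key not in self.learners:
            self.learners[context_key] = self.make_learner()
        return self.learners[context_key]
```

In the first approach one policy must generalize across contexts; in the second, each segment's policy is simpler but sees only that segment's interactions, which is the trade-off the paper's benchmark examines.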