Article

Evaluation of a hierarchical reinforcement learning spoken dialogue system

Journal

COMPUTER SPEECH AND LANGUAGE
Volume 24, Issue 2, Pages 395-429

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2009.07.001

Keywords

Spoken dialogue systems; Hierarchical reinforcement learning; Human-machine dialogue simulation; Dialogue strategies; System evaluation

We describe an evaluation of spoken dialogue strategies designed using hierarchical reinforcement learning agents. The dialogue strategies were learnt in a simulated environment and tested in a laboratory setting with 32 users. These dialogues were used to evaluate three types of machine dialogue behaviour: hand-coded, fully-learnt and semi-learnt. These experiments also served to evaluate the realism of simulated dialogues using two proposed metrics contrasted with 'Precision-Recall'. The learnt dialogue behaviours used the Semi-Markov Decision Process (SMDP) model, and we report the first evaluation of this model in a realistic conversational environment. Experimental results in the travel planning domain provide evidence to support the following claims: (a) hierarchical semi-learnt dialogue agents are a better alternative (with higher overall performance) than deterministic or fully-learnt behaviour; (b) spoken dialogue strategies learnt with highly coherent user behaviour and conservative recognition error rates (keyword error rate of 20%) can outperform a reasonable hand-coded strategy; and (c) hierarchical reinforcement learning dialogue agents are feasible and promising for the (semi) automatic design of optimized dialogue behaviours in larger-scale systems. (C) 2009 Elsevier Ltd. All rights reserved.
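
The abstract says the learnt behaviours use the Semi-Markov Decision Process (SMDP) model. In an SMDP, a single action (for example, invoking a subdialogue) can last several turns. The sketch below shows the textbook tabular SMDP Q-learning update. It discounts the value of the next state by gamma to the power tau, where tau is the number of steps the action took. This is an illustration under stated assumptions, not the authors' implementation: the learning algorithm, reward function and state representation used in the paper are not given here. The state and action names and the -1 per-turn reward are hypothetical.

```python
from collections import defaultdict


def smdp_q_update(Q, state, action, rewards, next_state, next_actions,
                  alpha=0.1, gamma=0.99, terminal=False):
    """Tabular SMDP Q-learning update for an action that ran len(rewards) steps.

    Illustrative sketch only; not the paper's exact learning procedure.
    """
    tau = len(rewards)
    # Discounted reward accumulated while the temporally extended action ran.
    cumulative = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Value of the best action in the state reached when the action finished.
    if terminal or not next_actions:
        future = 0.0
    else:
        future = max(Q[(next_state, a)] for a in next_actions)
    target = cumulative + (gamma ** tau) * future
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)
# Hypothetical example: a root agent invokes a "collect_origin" subdialogue that
# takes three turns, each penalised with -1 (an assumed reward), before
# returning control in a new root state.
value = smdp_q_update(Q, "root_start", "collect_origin", [-1, -1, -1],
                      "root_origin_filled", ["collect_destination", "confirm"],
                      alpha=0.5, gamma=0.9)
print(value)  # approximately -1.355
```

In a hierarchical setup, each subtask would keep its own table and apply the same update to its own child actions. Only the parent sees the subtask as one multi-step action.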

Recommended

Article Computer Science, Artificial Intelligence

A data-efficient deep learning approach for deployable multimodal social robots

Heriberto Cuayahuitl

NEUROCOMPUTING (2020)

Article Computer Science, Artificial Intelligence

Conversational AI for multi-agent communication in Natural Language: Research directions at the Interaction Lab

Oliver Lemon

Summary: Research at the Interaction Lab focuses on human-agent communication using conversational Natural Language. The goal is to create systems where humans and AI agents can form teams and coordinate tasks through Natural Language conversation. This paper introduces machine learning approaches to conversational AI and covers practical systems developed in the lab, including communication between multiple agents. It also discusses future directions for conversational, collaborative multi-agent systems.

AI COMMUNICATIONS (2022)

Article Acoustics

Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform

Erfan Loweimi, Zhengjun Yue, Peter Bell, Steve Renals, Zoran Cvetkovic

Summary: In this paper, the authors investigate multi-stream acoustic modelling using the raw real and imaginary parts of the Fourier transform of speech signals. They discuss the importance of such information and propose a framework where the real and imaginary parts are treated as separate streams and combined at an optimal level of abstraction. The proposed systems achieved competitive performance in various tasks, including phone recognition, noise robustness, and speech intelligibility.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2023)

Proceedings Paper Engineering, Electrical & Electronic

Multi-party Interaction with a Robot Receptionist

Meriam Moujahid, Helen Hastie, Oliver Lemon

Summary: This study introduces a multi-user engagement strategy that utilizes the robot's gaze, head pose, and verbal communication to coordinate turn-taking and analyzes the participants' perception of the robot. The results confirm that the robot is perceived as more intelligent and conscious when it reacts using eye gaze or head pose when a new user enters the scene. Furthermore, it is found that robots need to use a combination of verbal and non-verbal cues to coordinate turn-taking in order to be perceived as polite and aware of human social norms.

PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '22) (2022)

Proceedings Paper Engineering, Electrical & Electronic

Demonstration of a Robot Receptionist with Multi-party Situated Interaction

Meriam Moujahid, Bruce Wilson, Helen Hastie, Oliver Lemon

Summary: The demonstration showcases a Robot Receptionist that can handle multi-party engagement and turn-taking in dynamic environments. Utilizing a highly expressive Furhat robot, the system consists of several modules including scene analysis, engagement policies, and a dialogue manager.

PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '22) (2022)

Proceedings Paper Computer Science, Cybernetics

Developing a Social Conversational Robot for the Hospital waiting room

Nancie Gunson, Daniel Hernandez Garcia, Weronika Sieinska, Christian Dondrup, Oliver Lemon

Summary: This paper describes the potential applications of social robots in healthcare settings, such as robot receptionists, to assist patients and visitors and alleviate staff workload. It presents the development of a multimodal conversational AI system integrated in a social conversational robot (ARI robot) and reports on an initial experimental validation study conducted with the ARI robot in laboratory conditions.

2022 31ST IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (IEEE RO-MAN 2022) (2022)

Article Acoustics

Towards Robust Waveform-Based Acoustic Models

Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu

Summary: This study focuses on the problem of learning robust acoustic models in adverse environments. The authors propose using data augmentation as a way to improve risk estimates during training and demonstrate its effectiveness through theoretical analysis and empirical results.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2022)

Proceedings Paper Computer Science, Artificial Intelligence

An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games

Alessandro Suglia, Yonatan Bisk, Ioannis Konstas, Antonio Vergari, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon

Summary: Guessing games serve as a prototypical example of the learning by interacting paradigm, and this research investigates how artificial agents can benefit from playing such games in the context of NLP tasks. The study proposes two methods, supervised learning and self-play via SPIEL, to exploit guessing games, and evaluates their generalization ability to improve performance in downstream NLP tasks. The results show increased accuracy in both in-domain and transfer evaluations, with SPIEL leading to more fine-grained object representations for improved performance in VQA.

16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Towards Visual Dialogue for Human-Robot Interaction

Jose L. Part, Daniel Hernandez Garcia, Yanchao Yu, Nancie Gunson, Christian Dondrup, Oliver Lemon

Summary: The goal of the SPRING project is to develop a socially pertinent robot for tasks in gerontological healthcare. The robot must be able to perceive its environment and have coherent conversations about the surrounding world. The described work has applications beyond healthcare and can be used on any robot that needs to interact with its visual and spatial environment.

HRI '21: COMPANION OF THE 2021 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (2021)

Proceedings Paper Computer Science, Artificial Intelligence

It's Good to Chat? Evaluation and Design Guidelines for Combining Open-Domain Social Conversation with Task-Based Dialogue in Intelligent Buildings

Nancie Gunson, Weronika Sieinska, Christopher Walsh, Christian Dondrup, Oliver Lemon

PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS (ACM IVA 2020) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Conversational Agents for Intelligent Buildings

Weronika Sieinska, Nancie Gunson, Christopher Walsh, Christian Dondrup, Oliver Lemon

SIGDIAL 2020: 21ST ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2020) (2020)

Proceedings Paper Computer Science, Interdisciplinary Applications

Hierarchical Multi-Task Natural Language Understanding for Cross-domain Conversational AI: HERMIT NLU

Andrea Vanzo, Emanuele Bastianelli, Oliver Lemon

20TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2019) (2019)

Proceedings Paper Computer Science, Interdisciplinary Applications

Few-Shot Dialogue Generation Without Annotated Data: A Transfer Learning Approach

Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon

20TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2019) (2019)

Proceedings Paper Computer Science, Artificial Intelligence

Towards a Robot Architecture for Situated Lifelong Object Learning

Jose L. Part, Oliver Lemon

2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) (2019)

Proceedings Paper Acoustics

WINDOWED ATTENTION MECHANISMS FOR SPEECH RECOGNITION

Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2019)

Article Computer Science, Artificial Intelligence

M2A: A model-agnostic and metadata-free adversarial framework for unsupervised opinion summarization

Yanyue Zhang, Deyu Zhou, Zhenglin Wang, Yilong Lai

Summary: This paper proposes an unsupervised opinion summarization method that addresses the problem of generating inaccurate content through adversarial learning, without requiring specific model structures or domain metadata. By appending natural language inference as the discriminator to the generation model and retraining the discriminator for unsupervised contrastive learning, the model achieves model-agnostic and metadata-free performance. Experimental results demonstrate that the proposed method generates comparable results on ROUGE scores and outperforms state-of-the-art baselines in category accuracy and sentiment accuracy for summarization faithfulness evaluation.

COMPUTER SPEECH AND LANGUAGE (2024)

Article Computer Science, Artificial Intelligence

Though this be hesitant, yet there is method in 't: Effects of disfluency patterns in neural speech synthesis for cultural heritage presentations

Loredana Schettino, Antonio Origlia, Francesco Cutugno

Summary: This study presents the results of two perception experiments that evaluate the impact of specific patterns of disfluencies on listeners of synthetic speech. Focusing on Cultural Heritage presentations, the study proposes a linguistic model for positioning disfluencies in Italian language utterances. Utilizing a state-of-the-art speech synthesizer based on Deep Neural Networks, the study prepares experimental stimuli and conducts subjective evaluations and behavioral assessments. The results indicate that synthetic utterances with predicted disfluencies are perceived as more natural and improve the listeners' recall of the provided information.

COMPUTER SPEECH AND LANGUAGE (2024)

Article Computer Science, Artificial Intelligence

A lightweight approach based on prompt for few-shot relation extraction

Ying Zhang, Wencheng Huang, Depeng Dang

Summary: This paper introduces a lightweight approach to address the problem of few-shot relation extraction, using prompt-learning to assist in fine-tuning the model and designing an enhanced fusion module to fuse relation information and original prototype. Experimental results show that the proposed method achieves state-of-the-art performance on common datasets.

COMPUTER SPEECH AND LANGUAGE (2024)

Article Computer Science, Artificial Intelligence

The limits of the Mean Opinion Score for speech synthesis evaluation

Sebastien Le Maguer, Simon King, Naomi Harte

Summary: The release of WaveNet and Tacotron has greatly impacted the speech synthesis field by significantly improving the quality of synthetic speech. However, the current evaluation protocol, Absolute Category Rating (ACR) and Mean Opinion Score (MOS), used to measure this quality, has sparked controversy. To determine the reliability of MOS, a series of experiments were conducted, examining the stability of MOS over time, the influence of lower quality systems on MOS, the influence of modern technologies on past system scores, and the evolution of MOS for modern technologies in isolation. The results suggest the need for new evaluation protocols better suited for analyzing modern speech synthesis technologies.

COMPUTER SPEECH AND LANGUAGE (2024)

Article Computer Science, Artificial Intelligence

Dual Knowledge Distillation for neural machine translation

Yuxian Wan, Wenlin Zhang, Zhen Li, Hao Zhang, Yanxia Li

Summary: In this paper, a new knowledge distillation method called Dual Knowledge Distillation (DKD) is proposed to better utilize monolingual and limited bilingual data. By combining self-distillation and consistency regularization strategies, significant improvements are achieved in extracting consistent monolingual representation and forcing the decoder to produce consistent output.

COMPUTER SPEECH AND LANGUAGE (2024)

Article Computer Science, Artificial Intelligence

Predicting children's perceived reading proficiency with prosody modeling

Kamini Sabu, Preeti Rao

Summary: Reading is a foundational skill that is given great importance in education systems across countries. The assessment of linguistic competence through oral reading has been the focus of scientific studies, connecting the reader's comprehension to various measures of oral reading fluency. As this assessment requires significant time and resources, there is interest in automating the prediction of reading fluency using the same pedagogical rubrics. This study discusses new approaches to modeling prosody for automatic assessment, highlighting the importance of prosodic features informed by speech rate and speaking style in system performance.

COMPUTER SPEECH AND LANGUAGE (2024)

Article Computer Science, Artificial Intelligence

Training RNN language models on uncertain ASR hypotheses in limited data scenarios

Imran Sheikh, Emmanuel Vincent, Irina Illina

Summary: This article studies the training and adaptation of recurrent neural network (RNN) language models (LM) on a limited amount of in-domain speech data. It proposes training loss methods based on Kullback-Leibler (KL) divergence, hidden Markov model (HMM), and sampled paths from ASR confusion networks. Experimental results on telephone and meeting conversation datasets show that the sampling method for training RNN LMs on ASR confusion networks performs the best and leads to a relative reduction in perplexity.

COMPUTER SPEECH AND LANGUAGE (2024)