Journal
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Volume 27, Issue 3, Pages 572-582Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASLP.2018.2888814
Keywords
Speech recognition; RNNLM; language model adaptation; multi-domain ASR
Categories
Funding
- EPSRC [EP/I031022/1]
- EPSRC [EP/I031022/1] Funding Source: UKRI
Ask authors/readers for more resources
Recurrent neural network language models (RNNLMs) generally outperform n-gram language models when used in automatic speech recognition (ASR). Adapting RNNLMs to new domains is an open problem and current approaches can be categorised as either feature-based or model based. In feature-based adaptation, the input to the RNNLM is augmented with auxiliary features whilst model-based adaptation includes model fine-tuning and the introduction of adaptation layer(s) in the network. In this paper, the properties of both types of adaptation are investigated on multi-genre broadcast speech recognition. Existing techniques for both types of adaptation are reviewed and the proposed techniques for model-based adaptation, namely the linear hidden network adaptation layer and the K-component adaptive the RNNLM, are investigated. Moreover, new features derived from the acoustic domain are investigated for the RNNLM adaptation. The contributions of this paper include two hybrid adaptation techniques: the fine-tuning of feature-based RNNLMs and a feature-based adaptation layer. Moreover, the semi-supervised adaptation of RNNLMs using genre information is also proposed. The ASR systems were trained using 700 h of multi-genre broadcast speech. The gains obtained when using the RNNLM adaptation techniques proposed in this paper are consistent when using RNNLMs trained on an in-domain set of 10M words and on a combination of in-domain and out-of-domain sets of 660 M words, with approx. 10% perplexity and 2% relative word error rate improvements on a 28.3 h. test set. The best RNNLM adaptation techniques for ASR are also evaluated on a lightly supervised alignment of subtitles task for the same data, where the use of RNNLM adaptation leads to an absolute increase in the F-measure of 0.5%.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available