☆ 4.2 Article

Wise teachers train better DNN acoustic models

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING (2016)

Journal

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING

Volume -, Issue -, Pages 1-19

Publisher

SPRINGEROPEN

DOI: 10.1186/s13636-016-0088-7

Keywords

Soft targets; Deep neural networks; Online speech recognition; Speaker-adaptive features; Model compression

Categories

Acoustics Engineering, Electrical & Electronic

Funding

Grants-in-Aid for Scientific Research [16H02845, 25280058] Funding Source: KAKEN

Abstract

Automatic speech recognition is becoming more ubiquitous as recognition performance improves, capable devices increase in number, and new areas of application open up. Neural network acoustic models that use speaker-adaptive features, have deep and wide layers, or rely on more computationally expensive architectures often obtain the best recognition accuracy, but they may not fit the computational and storage budget or the latency requirements of the deployed system. We explore a straightforward training approach that takes advantage of highly accurate but expensive-to-evaluate neural network acoustic models by using their outputs to relabel training examples for easier-to-deploy models. Experiments on a large-vocabulary continuous speech recognition task show relative reductions in word error rate of up to 16.7% over training with hard aligned labels, achieved by making effective use of large amounts of additional untranscribed data. Somewhat remarkably, the approach works well even when only two output classes are present. Experiments on a voice activity detection task give relative reductions in equal error rate of up to 11.5% when a convolutional neural network is used to relabel training examples for a feedforward neural network. An investigation into the hidden-layer weight matrices finds that soft target-trained networks tend to produce weight matrices with fuller rank and slower decay in singular values than their hard target-trained counterparts, suggesting that more of the network's capacity is used to learn additional information, which yields better accuracy.
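The relabeling idea in the abstract can be illustrated with a minimal sketch. It assumes the teacher's per-frame outputs are turned into probability distributions (soft targets) and the student is trained with cross-entropy against them; the abstract does not give the exact loss or recipe, so this is an illustration rather than the authors' implementation. All function names and the toy logits are hypothetical, and the two-class setup mirrors the voice activity detection case only loosely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) between two distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def one_hot(index, num_classes):
    """Hard target: all probability mass on the aligned class."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def relabel_with_teacher(teacher_logits_per_frame):
    """Soft targets: one teacher posterior distribution per frame.

    No transcription is needed here, which is why untranscribed
    data can be used once a trained teacher is available.
    """
    return [softmax(logits) for logits in teacher_logits_per_frame]


def logit_gradient(student_probs, target):
    """Gradient of cross-entropy w.r.t. the student's logits (softmax output)."""
    return [p - t for p, t in zip(student_probs, target)]


# Hypothetical toy data: three frames, two classes (speech / non-speech).
teacher_logits = [[2.0, -1.0], [0.2, 0.1], [-3.0, 1.5]]
student_logits = [[1.0, 0.0], [0.0, 0.0], [-1.0, 1.0]]
hard_labels = [0, 0, 1]  # forced-alignment style labels (assumed)

soft_targets = relabel_with_teacher(teacher_logits)
student_probs = [softmax(z) for z in student_logits]

# Mean per-frame loss against soft (teacher) and hard (aligned) targets.
soft_loss = sum(
    cross_entropy(t, p) for t, p in zip(soft_targets, student_probs)
) / len(soft_targets)
hard_loss = sum(
    cross_entropy(one_hot(y, 2), p) for y, p in zip(hard_labels, student_probs)
) / len(hard_labels)

# Per-frame gradients the student would receive under soft targets.
soft_grads = [logit_gradient(p, t) for p, t in zip(student_probs, soft_targets)]

print("soft-target loss:", soft_loss)
print("hard-target loss:", hard_loss)
```

In this sketch the second frame has a nearly uniform teacher posterior, so the soft target passes the teacher's uncertainty on to the student, whereas the hard label would force full confidence in one class.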
