Article

Toward growing modular deep neural networks for continuous speech recognition

NEURAL COMPUTING & APPLICATIONS (2017)

Journal

NEURAL COMPUTING & APPLICATIONS

Volume 28, Issue -, Pages S1177-S1196

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s00521-016-2438-x

Keywords

Deep neural networks; Modular neural networks; Pre-training; Nonlinear filtering; Double spatiotemporal; Speaker adaptation; Continuous speech recognition

Category

Computer Science, Artificial Intelligence

Funding

Iranian Cognitive Sciences and Technologies Council [209]

Abstract

The performance drop of typical automatic speech recognition systems in real applications is related to their improperly designed structure and training procedure. In this article, a growing modular deep neural network (MDNN) for speech recognition is introduced. The network is pre-trained in a special manner suited to its structure. Its ability to grow enables the MDNN to capture the spatiotemporal information of the frame sequences at the input layer and of their labels at the output layer at the same time. Trained with such a double spatiotemporal (DST) structure, the network learns the subspace of valid phonetic sequences and can therefore filter out invalid output sequences within its own structure. To improve the robustness of the proposed network to speaker variations, two speaker adaptation methods are also presented. In these methods, the network learns either to move distorted input representations nonlinearly to their optimal positions or to adapt itself based on the input information. To evaluate the proposed MDNN structure and its modified versions, two Persian speech datasets are used: FARSDAT and Large FARSDAT. Because frame-level transcriptions are not available for large-vocabulary speech datasets, a semi-supervised learning algorithm is explored to train the MDNN on Large FARSDAT. Experimental results on FARSDAT show that combining the DST structure with the speaker adaptation methods yields absolute phone accuracy rate improvements of up to 7.3% and 10.6% over the baseline MDNN and a typical hidden Markov model (HMM), respectively. Likewise, semi-supervised training of the grown MDNN on Large FARSDAT improves its recognition performance by up to 5%.
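The abstract describes the double spatiotemporal (DST) structure only at a high level. As a rough illustration of what modeling spatiotemporal information "at the input and at the output layer at the same time" could look like, the sketch below pairs a window of neighboring acoustic frames (the input) with a window of neighboring frame-level phone labels (the output target). The window sizes, the edge padding, and the names `context_window` and `make_dst_pairs` are hypothetical assumptions for illustration only; this is not the authors' implementation, and the MDNN growth and pre-training procedures are not reproduced here.

```python
from typing import List, Sequence, Tuple


def context_window(seq: Sequence, t: int, left: int, right: int) -> list:
    """Return seq[t-left .. t+right], repeating the first/last element
    at the sequence boundaries (edge padding, an assumed choice)."""
    n = len(seq)
    return [seq[min(max(i, 0), n - 1)] for i in range(t - left, t + right + 1)]


def make_dst_pairs(
    frames: Sequence[Sequence[float]],
    labels: Sequence[str],
    in_ctx: int = 2,
    out_ctx: int = 1,
) -> List[Tuple[List[float], List[str]]]:
    """Hypothetical DST-style training pairs.

    Input:  the feature vectors of frames t-in_ctx .. t+in_ctx, concatenated
            (temporal context on the input side).
    Output: the phone labels of frames t-out_ctx .. t+out_ctx
            (temporal context on the output side).
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    pairs = []
    for t in range(len(frames)):
        # Flatten the stacked neighboring feature vectors into one input vector.
        x = [v for frame in context_window(frames, t, in_ctx, in_ctx) for v in frame]
        # The target is a short label sequence, not a single label.
        y = context_window(labels, t, out_ctx, out_ctx)
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy 2-dim features
    labels = ["sil", "a", "b"]                       # toy frame-level labels
    for x, y in make_dst_pairs(frames, labels, in_ctx=1, out_ctx=1):
        print(x, y)
```

Because each target is a short label sequence rather than a single label, a network trained on such pairs is exposed to which neighboring phone labels co-occur, which is one plausible way it could come to favor valid phonetic sequences.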
