Article

Deep Elman recurrent neural networks for statistical parametric speech synthesis

Journal

SPEECH COMMUNICATION
Volume 93, Pages 31-42

Publisher

ELSEVIER
DOI: 10.1016/j.specom.2017.08.003

Keywords

Speech synthesis; Recurrent neural networks; Deep neural networks; Hidden state

Funding

  1. TCS

Owing to the success of deep learning techniques in automatic speech recognition, deep neural networks (DNNs) have been used as acoustic models for statistical parametric speech synthesis (SPSS). DNNs do not inherently model the temporal structure in speech and text, and hence are not well suited to direct application to SPSS. Recurrent neural networks (RNNs), on the other hand, are capable of modeling time series. RNNs with long short-term memory (LSTM) cells have been shown to outperform DNN-based SPSS. However, LSTM cells and their variants, such as gated recurrent units (GRUs) and simplified LSTMs (SLSTMs), have complicated structures and are computationally expensive compared to simple recurrent architectures such as the Elman RNN. In this paper, we explore deep Elman RNNs for SPSS and compare their effectiveness against deep gated RNNs. Specifically, we perform experiments to show that (1) deep Elman RNNs are better suited for acoustic modeling in SPSS than DNNs and perform competitively with deep SLSTMs, GRUs and LSTMs, (2) context representation learning using Elman RNNs improves neural network acoustic models for SPSS, and (3) an Elman RNN-based duration model is better than its DNN-based counterpart. Experiments were performed on the Blizzard Challenge 2015 dataset, which covers three Indian languages (Telugu, Hindi and Tamil). Through subjective and objective evaluations, we show that our proposed systems outperform the baseline systems across different speakers and languages. © 2017 Elsevier B.V. All rights reserved.
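To make the contrast with gated cells concrete, the following is a minimal sketch of a stacked (deep) Elman RNN forward pass in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names (`elman_layer`, `deep_elman_forward`, `init_layer`), the layer sizes, the `tanh` nonlinearity, the random weights, and the toy input frames are all hypothetical. Each layer applies the standard Elman recurrence, in which the hidden state at frame t depends on the current input and the hidden state at frame t-1, and a linear output layer maps the top hidden state to an output vector per frame.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def elman_layer(inputs, W_xh, W_hh, b_h):
    """Run one Elman layer over a sequence of input vectors.

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), with h_0 = 0.
    Returns the list of hidden states, one per frame.
    """
    h = [0.0] * len(b_h)
    states = []
    for x in inputs:
        from_input = matvec(W_xh, x)
        from_prev = matvec(W_hh, h)
        h = [math.tanh(a + r + b) for a, r, b in zip(from_input, from_prev, b_h)]
        states.append(h)
    return states


def deep_elman_forward(inputs, layers, W_hy, b_y):
    """Stack Elman layers, then apply a linear output layer per frame."""
    seq = inputs
    for W_xh, W_hh, b_h in layers:
        seq = elman_layer(seq, W_xh, W_hh, b_h)
    return [[o + b for o, b in zip(matvec(W_hy, h), b_y)] for h in seq]


def init_layer(in_size, hidden_size, rng):
    """Small random weights; sizes and ranges are illustrative assumptions."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return mat(hidden_size, in_size), mat(hidden_size, hidden_size), [0.0] * hidden_size


# Hypothetical toy setup: 4 frames of 5-dimensional input features,
# two hidden layers of 8 units, and 3 output parameters per frame.
rng = random.Random(0)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(4)]
layers = [init_layer(5, 8, rng), init_layer(8, 8, rng)]
W_hy = [[rng.uniform(-0.1, 0.1) for _ in range(8)] for _ in range(3)]
b_y = [0.0, 0.0, 0.0]
outputs = deep_elman_forward(frames, layers, W_hy, b_y)
```

Each Elman layer here holds a single pair of weight matrices (`W_xh`, `W_hh`) and one bias. A standard LSTM layer carries four such sets (three gates plus the cell candidate) and a GRU carries three, which is the source of the extra computation the abstract refers to.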
