4.6 Article

From Feedforward to Recurrent LSTM Neural Networks for Language Modeling

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASLP.2015.2400218

Keywords

Feedforward neural network; Kneser-Ney smoothing; language modeling; long short-term memory (LSTM); recurrent neural network (RNN)

Funding

  1. OSEO, French State agency for innovation
  2. Quaero programme
  3. European Union [287658, 287755]
  4. DIGITEO, a French research cluster in Ile-de-France
  5. JARA-HPC from RWTH Aachen University [jara0085]

Abstract

Language models have traditionally been estimated based on relative frequencies, using count statistics that can be extracted from huge amounts of text data. More recently, it has been found that neural networks are particularly powerful at estimating probability distributions over word sequences, giving substantial improvements over state-of-the-art count models. However, the performance of neural network language models strongly depends on their architectural structure. This paper compares count models to feedforward, recurrent, and long short-term memory (LSTM) neural network variants on two large-vocabulary speech recognition tasks. We evaluate the models in terms of perplexity and word error rate, experimentally validating the strong correlation of the two quantities, which we find to hold regardless of the underlying type of the language model. Furthermore, neural networks incur an increased computational complexity compared to count models, and they model context dependences differently, often taking more words into account than count-based approaches. These differences require efficient search methods for neural networks, and we analyze the potential improvements that can be obtained when applying advanced algorithms to the rescoring of word lattices on large-scale setups.
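
The abstract's first evaluation measure, perplexity, has a compact definition worth making explicit: it is the exponentiated average negative log-probability that the model assigns to the held-out words. The following minimal Python sketch (illustrative only, not code from the paper) computes it from per-word log-probabilities produced by any language model:

    import math

    def perplexity(log_probs):
        # Corpus perplexity from natural-log word probabilities:
        #   PPL = exp(-(1/N) * sum_i log p(w_i | history_i))
        # Lower perplexity means the model assigns higher probability
        # to the held-out word sequence.
        return math.exp(-sum(log_probs) / len(log_probs))

    # A model assigning probability 0.1 to each of 5 words has
    # perplexity 10: it is as uncertain as a uniform choice among
    # 10 words at every position.
    print(perplexity([math.log(0.1)] * 5))  # -> 10.0 (up to rounding)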

Authors

Martin Sundermeyer, Hermann Ney, Ralf Schlueter


Recommendations

Proceedings Paper Acoustics

CONFORMER-BASED HYBRID ASR SYSTEM FOR SWITCHBOARD DATASET

Mohammad Zeineldeen, Jingjing Xu, Christoph Luescher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlueter, Hermann Ney

Summary: The paper investigates a Conformer-based hybrid model for ASR, exploring different training aspects and methods to improve performance. The results are competitive and show significant improvements over other architectures.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Proceedings Paper Acoustics

ON LANGUAGE MODEL INTEGRATION FOR RNN TRANSDUCER BASED SPEECH RECOGNITION

Wei Zhou, Zuoyun Zheng, Ralf Schlueter, Hermann Ney

Summary: This paper studies LM integration methods based on internal language model (ILM) correction in the RNN-T framework. A decoding interpretation is provided, and two reasons for the performance improvement with ILM correction are verified experimentally. An exact-ILM training framework is also proposed that theoretically justifies other ILM approaches.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Proceedings Paper Acoustics

IMPROVING FACTORED HYBRID HMM ACOUSTIC MODELING WITHOUT STATE TYING

Tina Raissi, Eugen Beck, Ralf Schlueter, Hermann Ney

Summary: This work introduces a factored hybrid hidden Markov model (FH-HMM) that outperforms a state-of-the-art hybrid HMM while dispensing with phonetic state-tying. The FH-HMM links to transducer models in how it models phonetic context, yet keeps the acoustic and language model components separate; it can be trained from scratch and exploits triphone context without any state-tying.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Proceedings Paper Acoustics

EFFICIENT SEQUENCE TRAINING OF ATTENTION MODELS USING APPROXIMATIVE RECOMBINATION

Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlueter, Hermann Ney

Summary: Sequence discriminative training is an important tool for improving the performance of automatic speech recognition systems. However, computing the sum over all possible word sequences is impractical. Current state-of-the-art systems overcome this problem by limiting the summation to a selected number of relevant hypotheses obtained from beam search. This study proposes an approximate method of hypothesis recombination during beam search, which allows for a significant increase in the effective beam size without increasing computational requirements (a generic sketch of the recombination idea follows this entry).

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)
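
As a rough illustration of the recombination idea described in the entry above, the sketch below merges beam-search hypotheses whose recent label histories agree, summing their probabilities with log-sum-exp. This is a generic sketch under the assumption of a limited label context, not the paper's exact algorithm; `hyps`, `context_len`, and the tuple representation are illustrative choices:

    import math
    from collections import defaultdict

    def logsumexp(a, b):
        # Numerically stable log(exp(a) + exp(b)).
        if a == float("-inf"):
            return b
        if b == float("-inf"):
            return a
        hi, lo = max(a, b), min(a, b)
        return hi + math.log1p(math.exp(lo - hi))

    def recombine(hyps, context_len=3):
        # `hyps` maps label sequences (tuples) to log scores. Hypotheses
        # whose last `context_len` labels agree are merged: their
        # probabilities are summed and the best-scoring prefix is kept
        # as the representative, freeing beam slots for genuinely
        # distinct candidates.
        groups = defaultdict(list)
        for labels, score in hyps.items():
            groups[labels[-context_len:]].append((labels, score))
        merged = {}
        for group in groups.values():
            best_labels, _ = max(group, key=lambda g: g[1])
            total = float("-inf")
            for _, score in group:
                total = logsumexp(total, score)
            merged[best_labels] = total
        return merged

    # Both hypotheses end in ("cat", "sat"), so they collapse into one:
    beam = {("the", "cat", "sat"): -3.2, ("a", "cat", "sat"): -3.9}
    print(recombine(beam, context_len=2))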

Proceedings Paper Engineering, Biomedical

Discrete Steps towards Approximate Computing

Michael Gansen, Jie Lou, Florian Freye, Tobias Gemmeke, Farhad Merchant, Albert Zeyer, Mohammad Zeineldeen, Ralf Schlueter, Xin Fan

Summary: This paper presents recent studies on digital approximate computing, exploring discrete approximation using floating-point number representations and addressing time-domain computing. The proposed approximate arithmetic and nonlinear activation functions achieve competitive Quality-of-Service compared to full-precision computing in various artificial neural networks.

PROCEEDINGS OF THE TWENTY THIRD INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2022) (2022)

Proceedings Paper Audiology & Speech-Language Pathology

Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlueter, Hermann Ney

Summary: This study focuses on the implicit internal language model (ILM) within attention-based encoder-decoder models, proposing several methods to estimate the ILM directly and surpassing previous approaches. Additionally, other methods to suppress the ILM are explored, such as reducing model capacity, limiting label context, and training the model jointly with an existing LM.

INTERSPEECH 2021 (2021)

Proceedings Paper Audiology & Speech-Language Pathology

Librispeech Transducer Model with Internal Language Model Prior Correction

Albert Zeyer, Andre Merboldt, Wilfried Michel, Ralf Schlueter, Hermann Ney

Summary: Our study on the transducer model suggests that subtracting the estimated internal LM can lead to over 14% relative improvement over normal shallow fusion. The model has a separate probability distribution for non-blank labels, making it easier to combine with an external LM and to estimate the internal LM. Additionally, including the end-of-sentence (EOS) probability of the external LM in the last blank probability further improves performance (a schematic of the fused scoring follows this entry).

INTERSPEECH 2021 (2021)
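
The internal-LM subtraction described in the entry above amounts to a simple per-label score combination during decoding. The sketch below is schematic; the function name and the scale values are illustrative assumptions, not the paper's tuned settings:

    def fused_score(log_p_model, log_p_ext_lm, log_p_ilm,
                    lm_scale=0.6, ilm_scale=0.3):
        # Per-label decoding score for shallow fusion with internal-LM
        # correction:
        #   score = log p_model(y|x)            (transducer output)
        #         + lm_scale  * log p_extLM(y)  (external LM, shallow fusion)
        #         - ilm_scale * log p_ILM(y)    (subtract the model's own
        #                                        implicit label prior)
        # The scale values here are illustrative, not the paper's settings.
        return log_p_model + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm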

Proceedings Paper Audiology & Speech-Language Pathology

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlueter, Hermann Ney

Summary: This paper introduces an acoustic data-driven subword modeling approach that produces labels suitable for various ASR models. Experimental results demonstrate that this approach outperforms traditional BPE and PASM methods in terms of performance.

INTERSPEECH 2021 (2021)

Proceedings Paper Audiology & Speech-Language Pathology

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

Wei Zhou, Albert Zeyer, Andre Merboldt, Ralf Schlueter, Hermann Ney

Summary: With the introduction of direct models in automatic speech recognition, the traditional frame-wise acoustic modeling based on hidden Markov models has diversified into various architectures. This work proves the equivalence of RNN-Transducer models and segmental models, providing initial experiments on decoding and beam-pruning using the same underlying model.

INTERSPEECH 2021 (2021)

Proceedings Paper Audiology & Speech-Language Pathology

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech

Yu Qiao, Wei Zhou, Elma Kerz, Ralf Schlueter

Summary: This study focuses on the impact of using a state-of-the-art ASR system on the subsequent automatic analysis of linguistic complexity in spontaneously produced L2 speech. Through correlation analysis and controlling for task type effects, a more differentiated effect of ASR performance on specific types of complexity measures is presented.

INTERSPEECH 2021 (2021)

Proceedings Paper Audiology & Speech-Language Pathology

On Sampling-Based Training Criteria for Neural Language Modeling

Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlueter, Hermann Ney

Summary: With the increasing vocabulary size of language models, many sampling-based training criteria have been proposed and investigated, replacing the softmax traversal over the entire vocabulary to gain speedups. Contrary to common belief, experimental results show that all these sampling methods can perform equally well when corrected for the intended class posterior probabilities (a generic sampled-softmax sketch follows this entry).

INTERSPEECH 2021 (2021)
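
To make the posterior correction mentioned in the entry above concrete, the sketch below shows a generic sampled softmax in which the proposal log-probability log q(w) is subtracted from every candidate logit, keeping the sampled criterion consistent with the full softmax up to a constant. It is an illustration, not the paper's training code, and all names (`score`, `proposal`, `k`) are assumptions:

    import math
    import random

    def sampled_softmax_nll(score, target, vocab, proposal, k=64):
        # Negative log-likelihood of `target` under a sampled softmax.
        # `score(w)` is the model logit for word `w`; `proposal[w]` is the
        # probability q(w) under which negatives are drawn. Subtracting
        # log q(w) from every candidate logit is the correction that keeps
        # the sampled criterion consistent with the full-softmax class
        # posteriors (constant shifts cancel inside the softmax).
        negatives = random.choices(vocab,
                                   weights=[proposal[w] for w in vocab], k=k)
        candidates = [target] + [w for w in negatives if w != target]
        corrected = [score(w) - math.log(proposal[w]) for w in candidates]
        m = max(corrected)
        log_z = m + math.log(sum(math.exp(c - m) for c in corrected))
        return -(corrected[0] - log_z)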

Proceedings Paper Computer Science, Artificial Intelligence

COMPARING THE BENEFIT OF SYNTHETIC TRAINING DATA FOR VARIOUS AUTOMATIC SPEECH RECOGNITION ARCHITECTURES

Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlueter, Hermann Ney

Summary: Recent publications in automatic speech recognition focus on attention encoder-decoder (AED) architectures and the use of synthetic training data generated by TTS systems. The effectiveness of synthesized data for AED ASR training is influenced by factors such as pre-processing, speaker embedding, and internal language model subtraction. Hybrid ASR systems outperform AED systems on LibriSpeech-100h, achieving lower word error rates on the clean/noisy test sets.

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ON ARCHITECTURES AND TRAINING FOR RAW WAVEFORM FEATURE EXTRACTION IN ASR

Peter Vieting, Christoph Luescher, Wilfried Michel, Ralf Schlueter, Hermann Ney

Summary: This study investigates acoustic modeling with learned feature extractors operating on the raw waveform, explores the usefulness of unsupervised pre-training of feature extractors in ASR systems, compares the performance of different feature sets, and discusses how to further improve system performance.

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) (2021)

Proceedings Paper Acoustics

PHONEME BASED NEURAL TRANSDUCER FOR LARGE VOCABULARY SPEECH RECOGNITION

Wei Zhou, Simon Berger, Ralf Schlueter, Hermann Ney

Summary: This study presents a phoneme-based neural transducer modeling approach that combines the advantages of classical and end-to-end methods by improving alignment label topologies, enhancing phoneme labels, and utilizing local phonetic dependencies along with external language models for sequence-to-sequence modeling consistency. The training procedure using frame-wise cross-entropy loss and a phonetic context size of one are shown to be efficient for achieving optimal performance.

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

TIGHT INTEGRATED END-TO-END TRAINING FOR CASCADED SPEECH TRANSLATION

Parnia Bahar, Tobias Bieschke, Ralf Schlueter, Hermann Ney

Summary: This study explores the feasibility of collapsing cascaded speech translation models into a single end-to-end trainable model by jointly optimizing all parameters of the ASR and MT models. Experimental results show that the model outperforms cascade and direct models, achieving better BLEU and TER scores.

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) (2021)
