Article

Compressive Sensing for Missing Data Imputation in Noise Robust Speech Recognition

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING (2010)

Journal

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING

Volume 4, Issue 2, Pages 272-287

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/JSTSP.2009.2039171

Keywords

Automatic speech recognition (ASR); compressive sensing (CS); missing data techniques; noise robustness

Categories

Engineering, Electrical & Electronic

Abstract

An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing), and to replace (impute) the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low signal-to-noise ratios (SNRs), these techniques fail, because too many time frames may contain few, if any, reliable features. In this paper, we introduce a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing. The method, dubbed sparse imputation, can impute missing features using larger time windows such as entire words. Using an overcomplete dictionary of clean speech exemplars, the method finds the sparsest combination of exemplars that jointly approximates the reliable features of a noisy utterance. That linear combination of clean speech exemplars is used to replace the missing features. Recognition experiments on noisy isolated digits show that sparse imputation outperforms conventional imputation techniques at SNR = - dB when using an ideal 'oracle' mask. With error-prone estimated masks, sparse imputation performs slightly worse than the best conventional technique.
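
The imputation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`omp`, `sparse_impute`), the greedy orthogonal matching pursuit solver, and the `n_nonzero` sparsity budget are assumptions chosen for brevity, and the paper's actual sparse solver, feature representation, and window length are not reproduced here. The sketch only shows the idea of fitting a sparse combination of clean exemplars to the reliable features and reading the missing features off that same combination.

```python
import numpy as np

def omp(A, y, n_nonzero):
    """Greedy orthogonal matching pursuit (an assumed stand-in solver).

    Approximates y with at most n_nonzero columns of A and returns the
    full-length coefficient vector.
    """
    coef = np.zeros(A.shape[1])
    residual = y.astype(float).copy()
    support = []
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for empty columns
    for _ in range(n_nonzero):
        # Pick the exemplar most correlated with the current residual.
        corr = np.abs(A.T @ residual) / norms
        if support:
            corr[support] = -np.inf  # never select the same exemplar twice
        support.append(int(np.argmax(corr)))
        # Refit all selected exemplars jointly by least squares.
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = y - A[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    return coef

def sparse_impute(observed, reliable, dictionary, n_nonzero=5):
    """Replace unreliable features with a sparse exemplar-based estimate.

    observed:   (D,) noisy feature vector (e.g. a stacked time window)
    reliable:   (D,) boolean mask, True where the feature is trusted
    dictionary: (D, N) matrix whose columns are clean speech exemplars
    """
    # Fit the sparse combination using the reliable rows only.
    x = omp(dictionary[reliable], observed[reliable], n_nonzero)
    # Apply the same combination to the full exemplars.
    estimate = dictionary @ x
    # Keep reliable features, impute the missing ones.
    return np.where(reliable, observed, estimate)
```

In this sketch the mask plays the role of the 'oracle' or estimated mask mentioned in the abstract: only rows marked reliable influence the choice of exemplars, so the quality of the imputed features depends directly on the quality of the mask.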

Recommended

Article Engineering, Electrical & Electronic

Far-Field Automatic Speech Recognition

Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani

Summary: Far-field automatic speech recognition (ASR) has gained significant attention and application in science and industry, with consumer market adoption for digital home assistants. Signal enhancement and a robust ASR engine are key to improving recognition accuracy, with a combination of deep learning and traditional signal processing proving to be an effective solution.

PROCEEDINGS OF THE IEEE (2021)

Review Computer Science, Information Systems

Automatic Speech Recognition: Systematic Literature Review

Sadeen Alharbi, Muna Alrazgan, Alanoud Alrashed, Turkiayh Alnomasi, Raghad Almojel, Rimah Alharbi, Saja Alharbi, Sahar Alturki, Fatimah Alshehri, Maha Almojil

Summary: Recent years have witnessed a significant amount of research in the field of speech signal processing, with a particular focus on automatic speech recognition (ASR) technology. This systematic review aims to summarize the most significant topics published in the past six years and identify major ASR challenges in real-world environments. Additionally, the review discusses current research gaps in ASR and suggests new research directions.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

Automatic speech recognition: a survey

Mishaim Malik, Muhammad Kamran Malik, Khawar Mehmood, Imran Makhdoom

Summary: This study provides a thorough comparison of cutting-edge techniques in automatic speech recognition (ASR), focusing on various deep learning methods and their impact on ASR performance. It also delves into different speech datasets and online toolkits that can aid in ASR research, serving as a valuable starting point for academics interested in this area.

MULTIMEDIA TOOLS AND APPLICATIONS (2021)

Review Computer Science, Interdisciplinary Applications

Automatic Speech Recognition System for Tonal Languages: State-of-the-Art Survey

Jaspreet Kaur, Amitoj Singh, Virender Kadyan

Summary: This paper conducts a systematic survey on Automatic Speech Recognition (ASR) for tonal languages spoken globally, focusing mainly on Asian, Indo-European, and African tonal languages. It is found that while there has been extensive research on Asian tonal languages, there is limited research reported on Indo-European and African tonal languages.

ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING (2021)

Article Engineering, Electrical & Electronic

Compressed Sensing-Based Robust Phase Retrieval via Deep Generative Priors

Fahad Shamshad, Ali Ahmed

Summary: This article introduces an alternative method for recovering the phase of optical images using algorithmic phase retrieval, showing effectiveness in optical imaging setup and improvement over conventional hand-engineered priors. By modifying the algorithm to explore solutions outside the range of the generative model, the proposed approach achieves improved performance. The effectiveness of the proposed approach is verified through a real application of multiple scattering media imaging.

IEEE SENSORS JOURNAL (2021)

Review Acoustics

Unsupervised Automatic Speech Recognition: A review

Hanan Aldarmaki, Asad Ullah, Sreepratha Ram, Nazar Zaki

Summary: This paper reviews the research literature to examine the challenges and potential solutions for achieving fully unsupervised ASR, with the aim of optimizing ASR development for low-resource languages.

SPEECH COMMUNICATION (2022)

Article Computer Science, Artificial Intelligence

E2E-DASR: End-to-end deep learning-based dysarthric automatic speech recognition

Ahmad Almadhor, Rizwana Irfan, Jiechao Gao, Nasir Saleem, Hafiz Tayyab Rauf, Seifedine Kadry

Summary: Dysarthria is a speech disability caused by weak muscles and organs involved in articulation, affecting speech intelligibility. This paper proposes a visual dysarthric ASR system using SCNN and MHAT to overcome speech challenges. The DASR system outperformed other systems, improving recognition accuracy for the UA-Speech database by 20.72%, with the largest improvements seen in very-low (25.75%) and low intelligibility (33.67%) cases.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

Discovering phonetic inventories with crosslingual automatic speech recognition

Piotr Zelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

Summary: This paper investigates how to build the phone inventory of an unseen language in an unsupervised way and analyzes the influences of different factors. It also discovers some universal phone tokens and identifies the challenges in phonetic inventory discovery.

COMPUTER SPEECH AND LANGUAGE (2022)

Article Engineering, Electrical & Electronic

Noise-Separated Adaptive Feature Distillation for Robust Speech Recognition

Honglin Qu, Xiangdong Su, Yonghe Wang, Xiang Hao, Guanglai Gao

Summary: This letter presents an improvement on feature-based knowledge distillation for robust speech recognition. By using distillation techniques, the robustness of the system has been enhanced. The proposed method includes an adaptive distillation position selection strategy and a noise separation mechanism, assuming a common network structure between the student and teacher. Two improvements are introduced in this method: adaptive selection of distillation positions based on loss values and explicit separation of speech and noise information using a noise separation module. The proposed method outperforms the standardized feature-based knowledge distillation method in terms of recognition performance.

IEEE SIGNAL PROCESSING LETTERS (2023)

Article Chemistry, Multidisciplinary

Orthogonalization of the Sensing Matrix Through Dominant Columns in Compressive Sensing for Speech Enhancement

Vasundhara Shukla, Preety D. D. Swami

Summary: This paper presents a novel speech enhancement approach called DCGOSM in compressive sensing (CS) using particle swarm optimization (PSO) to optimize the sensing matrix for separate basis vectors of speech and noise signals. It achieves lower noise in the reconstructed signal by avoiding noise components through orthogonal matching pursuit (OMP)-based CS signal reconstruction with the optimized matrix. DCGOSM outperforms other OMP-based CS algorithms and DNN-based speech enhancement techniques, demonstrating significant improvements in SNR, SSNR, PESQ, and STOI, as well as reducing recovery time.

APPLIED SCIENCES-BASEL (2023)

Article Computer Science, Information Systems

Automatic speech recognition systems: A survey of discriminative techniques

Amrit Preet Kaur, Amitoj Singh, Rohit Sachdeva, Vinay Kukreja

Summary: This study provides a detailed assessment of voice recognition strategies for different languages, highlighting the lack of standard speech corpus for minority languages. It also explores hybrid acoustic modeling methods to improve the efficiency of voice recognition.

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Article Computer Science, Information Systems

Amalgamation of noise elimination and TDNN acoustic modelling techniques for the advancements in continuous Kannada ASR system

G. Thimmaraja Yadava, B. G. Nagaraja, H. S. Jayanna

Summary: This work demonstrates enhancements made to a previous continuous Kannada automatic speech recognition (ASR) spoken query system (SQS) in a real-time environment. The proposed background noise suppression block improves the audibility and intelligibility of speech sound compared to conventional methods. By combining noise reduction and time delay neural network (TDNN) acoustic modeling techniques, there is a 1.87% improvement in word error rate (WER) compared to the previous SQS. The online testing statistics of the newly developed continuous Kannada ASR system are also presented in this study.

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Article Acoustics

A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition

Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai

Summary: This paper proposes a novel self-supervised pre-training framework that incorporates speech enhancement to improve automatic speech recognition (ASR) performance in noisy environments. In the pre-training phase, the original noisy waveform or the waveform obtained by speech enhancement is fed into the self-supervised model to learn the contextual representation, with quantized clean speech as the target. Additionally, a dual-attention fusion method is proposed to combine the features of noisy and enhanced speech, compensating for information loss from individual modules.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2023)

Article Automation & Control Systems

Compressive Sensing-Based Missing-Data-Tolerant Fault Detection for Remote Condition Monitoring of Wind Turbines

Yayu Peng, Wei Qiao, Liyan Qu

Summary: This article proposes a compressive sensing-based missing-data-tolerant fault detection method for remote condition monitoring of wind turbines. It increases the sparsity of the collected signals, samples them using a compressive-sensing-based algorithm, and reconstructs the signals at the receiving end to detect faults in wind turbines.

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS (2022)

Article Computer Science, Artificial Intelligence

Convolutional Sparse Support Estimator Network (CSEN): From Energy-Efficient Support Estimation to Learning-Aided Compressive Sensing

Mehmet Yamac, Mete Ahishali, Serkan Kiranyaz, Moncef Gabbouj

Summary: This study proposes a novel approach for support estimation of a sparse signal by learning to map non-zero locations from denser measurements. The proposed convolutional sparse support estimator networks (CSENs) are designed to achieve state-of-the-art performance levels with reduced computational complexity.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Low resource end-to-end spoken language understanding with capsule networks

Jakob Poncelet, Vincent Renkens, Hugo Van hamme

Summary: Designing a Spoken Language Understanding (SLU) system for command-and-control applications presents challenges. The proposed end-to-end SLU system utilizes capsule networks for training new commands directly from user demonstrations, showing superior performance to baseline systems and versatility in multitask learning.

COMPUTER SPEECH AND LANGUAGE (2021)

Article Acoustics

Multi-encoder attention-based architectures for sound recognition with partial visual assistance

Wim Boes, Hugo Van Hamme

Summary: This study demonstrates a multi-encoder framework for dealing with the issue of incomplete content in sound recognition. The proposed method successfully incorporates partially available visual information into the operational procedures of the network, leading to improved predictions.

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING (2022)

Review Engineering, Biomedical

Relating EEG to continuous speech using deep neural networks: a review

Corentin Puffay, Bernd Accou, Lies Bollens, Mohammad Jalilpour Monesi, Jonas Vanthornhout, Hugo Van Hamme, Tom Francart

Summary: This paper reviews deep-learning-based studies that relate EEG to continuous speech, addressing methodological pitfalls and the need for a standard benchmark of model analysis.

JOURNAL OF NEURAL ENGINEERING (2023)

Article Engineering, Biomedical

Robust neural tracking of linguistic speech representations using a convolutional neural network

Corentin Puffay, Jonas Vanthornhout, Marlies Gillis, Bernd Accou, Hugo Van Hamme, Tom Francart

Summary: This study aims to investigate neural signal tracking in continuous speech and finds that the non-linear CNN model performs better than linear models in measuring the coding of linguistic features.

JOURNAL OF NEURAL ENGINEERING (2023)

Article Chemistry, Multidisciplinary

Bidirectional Representations for Low-Resource Spoken Language Understanding

Quentin Meeus, Marie-Francine Moens, Hugo Van Hamme

Summary: The featured application models proposed in this research provide a spoken command interface for applications with limited and expensive training data. By introducing a transformer encoder-decoder framework and a multiobjective training strategy, the model is able to learn contextual bidirectional representations, resulting in improved performance. Additionally, the concept of class attention is introduced as an efficient module for spoken language understanding, providing explanations for model predictions and enhancing understanding of decision-making processes.

APPLIED SCIENCES-BASEL (2023)

Proceedings Paper Acoustics

ANALYSIS OF XLS-R FOR SPEECH QUALITY ASSESSMENT

Bastiaan Tamm, Rik Vandenberghe, Hugo Van Hamme

Summary: This paper focuses on the performance analysis of a pre-trained model in speech quality assessment. The study identifies two optimal regions, lower-level features and high-level features, and explores their differences and potential reasons. Additionally, the paper attempts to fuse the two optimal feature depths and assesses the performance of the proposed models.

2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA (2023)

Proceedings Paper Computer Science, Artificial Intelligence

LEARNING TO JOINTLY TRANSCRIBE AND SUBTITLE FOR END-TO-END SPONTANEOUS SPEECH RECOGNITION

Jakob Poncelet, Hugo Van Hamme

Summary: TV subtitles provide valuable resources for transcribing various types of speech, but cannot be directly used to improve Automatic Speech Recognition (ASR) models. A multitask dual-decoder Transformer model is proposed to perform ASR and automatic subtitling simultaneously. By training and effectively utilizing subtitle data, improvements are achieved in both regular ASR and spontaneous/conversational ASR.

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT (2022)

Proceedings Paper Computer Science, Artificial Intelligence

WEAK-SUPERVISED DYSARTHRIA-INVARIANT FEATURES FOR SPOKEN LANGUAGE UNDERSTANDING USING AN FHVAE AND ADVERSARIAL TRAINING

Jinzi Qi, Hugo Van hamme

Summary: This paper focuses on improving the accuracy and generalization ability of spoken language understanding systems for dysarthric speech by introducing adversarial training and weakly supervised feature extraction. The results show that this approach can achieve higher accuracy, especially for speakers with severe dysarthria, in spoken language understanding tasks.

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT (2022)

Proceedings Paper Acoustics

Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications

Bastiaan Tamm, Helena Balabin, Rik Vandenberghe, Hugo Van Hamme

Summary: The quality of speech in online conferencing applications is often evaluated using mean opinion score (MOS) through human judgments. However, this approach is not suitable for large-scale assessments. Researchers have turned to automated MOS prediction using deep neural networks. In this study, a feature extractor based on the XLS-R model is proposed to improve fine-tuning by reducing the number of trainable model parameters. The performance of the pre-trained XLS-R embeddings is compared to that of an MFCC-based feature extractor on the ConferencingSpeech 2022 MOS prediction task, with a reduced root mean squared error (RMSE) observed.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding

Pu Wang, Hugo Van Hamme

Summary: In this study, a lean Transformer structure is proposed to build compact spoken language understanding models in low-resource settings. The dimension of attention mechanism is reduced automatically using group sparsity, and the learned attention subspace is transferred to an attention bottleneck layer, achieving competitive accuracies without pre-training.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

MULTITASK LEARNING FOR LOW RESOURCE SPOKEN LANGUAGE UNDERSTANDING

Quentin Meeus, Marie Francine Moens, Hugo Van Hamme

Summary: In this paper, we investigate the advantages of multitask learning in speech processing by training models on dual objectives, including automatic speech recognition, intent classification, and sentiment classification. Our experiments show that multitask learning can improve model performance, especially in low-resource scenarios. The results demonstrate that multitask learning can effectively compete with and even outperform baseline models in Dutch and English domains.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

Relating the fundamental frequency of speech with EEG using a dilated convolutional network

Corentin Puffay, Jana Van Canneyt, Jonas Vanthornhout, Hugo Van Hamme, Tom Francart

Summary: This study investigates how speech is processed in the brain using nonlinear models, focusing on the fundamental frequency (f0) feature and the speech envelope. The results show that combining f0 and the speech envelope improves the performance of the state-of-the-art envelope-based model, and the dilated-convolutional model can generalize well to subjects not included during training.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

Continual Learning for Monolingual End-to-End Automatic Speech Recognition

Steven Vander Eeckt, Hugo Van Hamme

Summary: This paper discusses the use of Continual Learning (CL) methods to overcome the deterioration in performance of Automatic Speech Recognition (ASR) models when adapted to new domains. The study found that the best performing CL method improved the model's performance by over 40% when extending it to new tasks, using only 0.6% of the original data.

2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022) (2022)

Proceedings Paper Acoustics

LEARNING SUBJECT-INVARIANT REPRESENTATIONS FROM SPEECH-EVOKED EEG USING VARIATIONAL AUTOENCODERS

Lies Bollens, Tom Francart, Hugo Van Hamme

Summary: The electroencephalogram (EEG) is a powerful method to understand how the brain processes speech. Deep neural networks have replaced linear models and shown promising results in this field. This study utilizes factorized hierarchical variational autoencoders to analyze parallel EEG recordings.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Proceedings Paper Computer Science, Interdisciplinary Applications

Cross-lingual Detection of Dysphonic Speech for Dutch and Hungarian Datasets

David Sztaho, Miklos Gabriel Tulics, Jinzi Qi, Hugo Van Hamme, Klara Vicsi

Summary: This paper focuses on the extension of detecting dysphonic voices in a cross-lingual scenario. The study uses Hungarian and Dutch speech datasets to perform automatic separation and severity level estimation. The results show that cross-lingual detection is possible with acceptable generalization ability, and features calculated on phoneme-level can improve the results.

BIOSIGNALS: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 4: BIOSIGNALS (2022)
