Article
Engineering, Electrical & Electronic
Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani
Summary: Far-field automatic speech recognition (ASR) has gained significant attention and application in science and industry, with consumer market adoption for digital home assistants. Signal enhancement and a robust ASR engine are key to improving recognition accuracy, with a combination of deep learning and traditional signal processing proving to be an effective solution.
PROCEEDINGS OF THE IEEE
(2021)
Review
Computer Science, Information Systems
Sadeen Alharbi, Muna Alrazgan, Alanoud Alrashed, Turkiayh Alnomasi, Raghad Almojel, Rimah Alharbi, Saja Alharbi, Sahar Alturki, Fatimah Alshehri, Maha Almojil
Summary: Recent years have witnessed a significant amount of research in the field of speech signal processing, with a particular focus on automatic speech recognition (ASR) technology. This systematic review aims to summarize the most significant topics published in the past six years and identify major ASR challenges in real-world environments. Additionally, the review discusses current research gaps in ASR and suggests new research directions.
Article
Computer Science, Information Systems
Mishaim Malik, Muhammad Kamran Malik, Khawar Mehmood, Imran Makhdoom
Summary: This study provides a thorough comparison of cutting-edge techniques in automatic speech recognition (ASR), focusing on various deep learning methods and their impact on ASR performance. It also delves into different speech datasets and online toolkits that can aid in ASR research, serving as a valuable starting point for academics interested in this area.
MULTIMEDIA TOOLS AND APPLICATIONS
(2021)
Review
Computer Science, Interdisciplinary Applications
Jaspreet Kaur, Amitoj Singh, Virender Kadyan
Summary: This paper conducts a systematic survey on Automatic Speech Recognition (ASR) for tonal languages spoken globally, focusing mainly on Asian, Indo-European, and African tonal languages. It is found that while there has been extensive research on Asian tonal languages, there is limited research reported on Indo-European and African tonal languages.
ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING
(2021)
Article
Engineering, Electrical & Electronic
Fahad Shamshad, Ali Ahmed
Summary: This article introduces an alternative method for recovering the phase of optical images using algorithmic phase retrieval, showing effectiveness in an optical imaging setup and improvement over conventional hand-engineered priors. By modifying the algorithm to explore solutions outside the range of the generative model, the proposed approach achieves improved performance. Its effectiveness is verified in a real application: imaging through multiple scattering media.
IEEE SENSORS JOURNAL
(2021)
Review
Acoustics
Hanan Aldarmaki, Asad Ullah, Sreepratha Ram, Nazar Zaki
Summary: This paper reviews the research literature to examine the challenges and potential solutions for achieving fully unsupervised ASR, with the aim of optimizing ASR development for low-resource languages.
SPEECH COMMUNICATION
(2022)
Article
Computer Science, Artificial Intelligence
Ahmad Almadhor, Rizwana Irfan, Jiechao Gao, Nasir Saleem, Hafiz Tayyab Rauf, Seifedine Kadry
Summary: Dysarthria is a speech disability caused by weak muscles and organs involved in articulation, affecting speech intelligibility. This paper proposes a visual dysarthric ASR system using SCNN and MHAT to overcome speech challenges. The DASR system outperformed other systems, improving recognition accuracy for the UA-Speech database by 20.72%, with the largest improvements seen in very-low (25.75%) and low intelligibility (33.67%) cases.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Piotr Zelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak
Summary: This paper investigates how to build the phone inventory of an unseen language in an unsupervised way and analyzes the influences of different factors. It also discovers some universal phone tokens and identifies the challenges in phonetic inventory discovery.
COMPUTER SPEECH AND LANGUAGE
(2022)
Article
Engineering, Electrical & Electronic
Honglin Qu, Xiangdong Su, Yonghe Wang, Xiang Hao, Guanglai Gao
Summary: This letter improves feature-based knowledge distillation for robust speech recognition, assuming a common network structure between the student and teacher. Two changes are introduced: adaptive selection of distillation positions based on loss values, and explicit separation of speech and noise information using a noise separation module. The proposed method outperforms the standard feature-based knowledge distillation method in recognition performance.
IEEE SIGNAL PROCESSING LETTERS
(2023)
Article
Chemistry, Multidisciplinary
Vasundhara Shukla, Preety D. D. Swami
Summary: This paper presents a novel speech enhancement approach called DCGOSM in compressive sensing (CS) using particle swarm optimization (PSO) to optimize the sensing matrix for separate basis vectors of speech and noise signals. It achieves lower noise in the reconstructed signal by avoiding noise components through orthogonal matching pursuit (OMP)-based CS signal reconstruction with the optimized matrix. DCGOSM outperforms other OMP-based CS algorithms and DNN-based speech enhancement techniques, demonstrating significant improvements in SNR, SSNR, PESQ, and STOI, as well as reducing recovery time.
APPLIED SCIENCES-BASEL
(2023)
Article
Computer Science, Information Systems
Amrit Preet Kaur, Amitoj Singh, Rohit Sachdeva, Vinay Kukreja
Summary: This study provides a detailed assessment of voice recognition strategies for different languages, highlighting the lack of standard speech corpus for minority languages. It also explores hybrid acoustic modeling methods to improve the efficiency of voice recognition.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Computer Science, Information Systems
G. Thimmaraja Yadava, B. G. Nagaraja, H. S. Jayanna
Summary: This work demonstrates enhancements made to a previous continuous Kannada automatic speech recognition (ASR) spoken query system (SQS) in a real-time environment. The proposed background noise suppression block improves the audibility and intelligibility of speech sound compared to conventional methods. By combining noise reduction and time delay neural network (TDNN) acoustic modeling techniques, there is a 1.87% improvement in word error rate (WER) compared to the previous SQS. The online testing statistics of the newly developed continuous Kannada ASR system are also presented in this study.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Acoustics
Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai
Summary: This paper proposes a novel self-supervised pre-training framework that incorporates speech enhancement to improve automatic speech recognition (ASR) performance in noisy environments. In the pre-training phase, the original noisy waveform or the waveform obtained by speech enhancement is fed into the self-supervised model to learn the contextual representation, with quantized clean speech as the target. Additionally, a dual-attention fusion method is proposed to combine the features of noisy and enhanced speech, compensating for information loss from individual modules.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2023)
Article
Automation & Control Systems
Yayu Peng, Wei Qiao, Liyan Qu
Summary: This article proposes a compressive sensing-based missing-data-tolerant fault detection method for remote condition monitoring of wind turbines. It increases the sparsity of the collected signals, samples them using a compressive-sensing-based algorithm, and reconstructs the signals at the receiving end to detect faults in wind turbines.
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS
(2022)
Article
Computer Science, Artificial Intelligence
Mehmet Yamac, Mete Ahishali, Serkan Kiranyaz, Moncef Gabbouj
Summary: This study proposes a novel approach for support estimation of a sparse signal by learning to map non-zero locations from denser measurements. The proposed convolutional sparse support estimator networks (CSENs) are designed to achieve state-of-the-art performance levels with reduced computational complexity.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Jakob Poncelet, Vincent Renkens, Hugo Van hamme
Summary: Designing a Spoken Language Understanding (SLU) system for command-and-control applications presents challenges. The proposed end-to-end SLU system uses capsule networks to learn new commands directly from user demonstrations, showing superior performance to baseline systems and versatility in multitask learning.
COMPUTER SPEECH AND LANGUAGE
(2021)
Article
Acoustics
Wim Boes, Hugo Van Hamme
Summary: This study demonstrates a multi-encoder framework for dealing with the issue of incomplete content in sound recognition. The proposed method successfully incorporates partially available visual information into the operational procedures of the network, leading to improved predictions.
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING
(2022)
Review
Engineering, Biomedical
Corentin Puffay, Bernd Accou, Lies Bollens, Mohammad Jalilpour Monesi, Jonas Vanthornhout, Hugo Van Hamme, Tom Francart
Summary: This paper reviews deep-learning-based studies that relate EEG to continuous speech, addressing methodological pitfalls and the need for a standard benchmark of model analysis.
JOURNAL OF NEURAL ENGINEERING
(2023)
Article
Engineering, Biomedical
Corentin Puffay, Jonas Vanthornhout, Marlies Gillis, Bernd Accou, Hugo Van Hamme, Tom Francart
Summary: This study aims to investigate neural signal tracking in continuous speech and finds that the non-linear CNN model performs better than linear models in measuring the coding of linguistic features.
JOURNAL OF NEURAL ENGINEERING
(2023)
Article
Chemistry, Multidisciplinary
Quentin Meeus, Marie-Francine Moens, Hugo Van Hamme
Summary: The models proposed in this research provide a spoken command interface for applications where training data is limited and expensive. By introducing a transformer encoder-decoder framework and a multiobjective training strategy, the model is able to learn contextual bidirectional representations, resulting in improved performance. Additionally, class attention is introduced as an efficient module for spoken language understanding that also provides explanations for model predictions.
APPLIED SCIENCES-BASEL
(2023)
Proceedings Paper
Acoustics
Bastiaan Tamm, Rik Vandenberghe, Hugo Van Hamme
Summary: This paper focuses on the performance analysis of a pre-trained model in speech quality assessment. The study identifies two optimal regions, lower-level features and high-level features, and explores their differences and potential reasons. Additionally, the paper attempts to fuse the two optimal feature depths and assesses the performance of the proposed models.
2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Jakob Poncelet, Hugo Van Hamme
Summary: TV subtitles provide valuable resources for transcribing various types of speech, but cannot be directly used to improve Automatic Speech Recognition (ASR) models. A multitask dual-decoder Transformer model is proposed to perform ASR and automatic subtitling simultaneously. By training and effectively utilizing subtitle data, improvements are achieved in both regular ASR and spontaneous/conversational ASR.
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Jinzi Qi, Hugo Van hamme
Summary: This paper focuses on improving the accuracy and generalization ability of spoken language understanding systems for dysarthric speech by introducing adversarial training and weakly supervised feature extraction. The results show that this approach can achieve higher accuracy, especially for speakers with severe dysarthria, in spoken language understanding tasks.
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT
(2022)
Proceedings Paper
Acoustics
Bastiaan Tamm, Helena Balabin, Rik Vandenberghe, Hugo Van Hamme
Summary: The quality of speech in online conferencing applications is often evaluated using the mean opinion score (MOS) obtained from human judgments. However, this approach is not suitable for large-scale assessments, so researchers have turned to automated MOS prediction using deep neural networks. In this study, a feature extractor based on the XLS-R model is proposed to improve fine-tuning by reducing the number of trainable model parameters. The pre-trained XLS-R embeddings are compared to an MFCC-based feature extractor on the ConferencingSpeech 2022 MOS prediction task, and a reduced root mean squared error (RMSE) is observed.
Proceedings Paper
Acoustics
Pu Wang, Hugo Van Hamme
Summary: In this study, a lean Transformer structure is proposed to build compact spoken language understanding models in low-resource settings. The dimension of attention mechanism is reduced automatically using group sparsity, and the learned attention subspace is transferred to an attention bottleneck layer, achieving competitive accuracies without pre-training.
Proceedings Paper
Acoustics
Quentin Meeus, Marie Francine Moens, Hugo Van Hamme
Summary: In this paper, we investigate the advantages of multitask learning in speech processing by training models on dual objectives, pairing automatic speech recognition with intent classification or sentiment classification. Our experiments show that multitask learning can improve model performance, especially in low-resource scenarios. The results demonstrate that multitask learning can compete with and even outperform baseline models in Dutch and English domains.
Proceedings Paper
Acoustics
Corentin Puffay, Jana Van Canneyt, Jonas Vanthornhout, Hugo Van Hamme, Tom Francart
Summary: This study investigates how speech is processed in the brain using nonlinear models, focusing on the fundamental frequency (f0) feature and the speech envelope. The results show that combining f0 and the speech envelope improves the performance of the state-of-the-art envelope-based model, and the dilated-convolutional model can generalize well to subjects not included during training.
Proceedings Paper
Acoustics
Steven Vander Eeckt, Hugo Van Hamme
Summary: This paper discusses the use of Continual Learning (CL) methods to overcome the deterioration in performance of Automatic Speech Recognition (ASR) models when adapted to new domains. The study found that the best performing CL method improved the model's performance by over 40% when extending it to new tasks, using only 0.6% of the original data.
2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022)
(2022)
Proceedings Paper
Acoustics
Lies Bollens, Tom Francart, Hugo Van Hamme
Summary: The electroencephalogram (EEG) is a powerful method to understand how the brain processes speech. Deep neural networks have replaced linear models and shown promising results in this field. This study utilizes factorized hierarchical variational autoencoders to analyze parallel EEG recordings.
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
(2022)
Proceedings Paper
Computer Science, Interdisciplinary Applications
David Sztaho, Miklos Gabriel Tulics, Jinzi Qi, Hugo Van Hamme, Klara Vicsi
Summary: This paper extends the detection of dysphonic voices to a cross-lingual scenario. The study uses Hungarian and Dutch speech datasets to perform automatic separation and severity level estimation. The results show that cross-lingual detection is possible with acceptable generalization ability, and that features calculated at the phoneme level can improve the results.
BIOSIGNALS: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 4: BIOSIGNALS
(2022)