Article

A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition

Journal

NEUROCOMPUTING
Volume 218, Pages 448-459

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2016.09.018

Keywords

Transfer learning; Speaker adaptation; Deep neural network; Multi-task learning

In this paper, we present a unified approach to transfer learning of deep neural networks (DNNs) to address performance degradation issues caused by a potential acoustic mismatch between the training and testing conditions due to inter-speaker variability in state-of-the-art connectionist (a.k.a. hybrid) automatic speech recognition (ASR) systems. Different schemes to transfer knowledge of deep neural networks related to speaker adaptation can be developed with ease under such a unifying concept, as demonstrated in the three frameworks investigated in this study. In the first solution, knowledge is transferred between homogeneous domains, namely the source and the target domains. Moreover, the transfer takes place in a sequential manner from the source to the target speaker to boost the ASR accuracy on spoken utterances from a surprise target speaker. In the second solution, a multi-task approach is adopted to adjust the connectionist parameters to improve the ASR system performance on the target speaker. Knowledge is transferred simultaneously among heterogeneous tasks, and that is achieved by adding one or more smaller auxiliary output layers to the original DNN structure. In the third solution, DNN output classes are organised into a hierarchical structure in order to adjust the connectionist parameters and close the gap between training and testing conditions by transferring prior knowledge from the root node to the leaves in a structural maximum a posteriori fashion. Through a series of experiments on the Wall Street Journal (WSJ) speech recognition task, we show that the proposed solutions result in consistent and statistically significant word error rate reductions. Most importantly, we show that transfer learning is an enabling technology for speaker adaptation, since it outperforms both the transformation-based adaptation algorithms usually adopted in the speech community, and the multi-condition training (MCT) schemes, which are data combination methods often adopted to cover more acoustic variabilities in speech when data from the source and target domains are both available at the training time. Finally, experimental evidence demonstrates that all proposed solutions are robust to negative transfer even when only a single sentence from the target speaker is available. (C) 2016 Elsevier B.V. All rights reserved.
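
The multi-task idea in the second solution (a shared network with a smaller auxiliary output layer added next to the main one) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the `aux_weight` interpolation factor, and the names `MultiTaskNet` and `joint_loss` are all hypothetical, and only the forward pass and a joint cross-entropy objective are shown, with no training loop.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def affine(W, b, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init(rows, cols, rng):
    """Small random weights; the range is an arbitrary illustrative choice."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskNet:
    """One shared hidden layer feeding a main head and a smaller auxiliary head."""

    def __init__(self, n_in, n_hid, n_main, n_aux, seed=0):
        rng = random.Random(seed)
        self.W_h, self.b_h = init(n_hid, n_in, rng), [0.0] * n_hid
        self.W_main, self.b_main = init(n_main, n_hid, rng), [0.0] * n_main
        self.W_aux, self.b_aux = init(n_aux, n_hid, rng), [0.0] * n_aux

    def forward(self, x):
        # Both heads read the same hidden representation, so a gradient from
        # either task would update the shared parameters.
        h = [math.tanh(v) for v in affine(self.W_h, self.b_h, x)]
        p_main = softmax(affine(self.W_main, self.b_main, h))
        p_aux = softmax(affine(self.W_aux, self.b_aux, h))
        return p_main, p_aux


def joint_loss(p_main, p_aux, y_main, y_aux, aux_weight=0.3):
    """Main cross-entropy plus a weighted auxiliary cross-entropy (assumed form)."""
    return -math.log(p_main[y_main]) - aux_weight * math.log(p_aux[y_aux])


# Hypothetical sizes: 4 input features, 6 main classes, 3 auxiliary classes.
net = MultiTaskNet(n_in=4, n_hid=8, n_main=6, n_aux=3)
p_main, p_aux = net.forward([0.2, -0.1, 0.5, 0.0])
loss = joint_loss(p_main, p_aux, y_main=2, y_aux=1)
```

In a sketch like this, the auxiliary head would typically be discarded at recognition time and only the main output used; whether the paper does exactly that is not stated in the abstract.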

Recommended

Article Engineering, Electrical & Electronic

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition

Bo Wu, Kehuang Li, Fengpei Ge, Zhen Huang, Minglei Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING (2017)

Article Computer Science, Information Systems

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks

Ju Lin, Wei Li, Yingming Gao, Yanlu Xie, Nancy F. Chen, Sabato Marco Siniscalchi, Jinsong Zhang, Chin-Hui Lee

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY (2018)

Article Acoustics

A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement

Jun Qi, Jun Du, Sabato Marco Siniscalchi, Chin-Hui Lee

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2019)

Article Acoustics

Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models

Wei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2019)

Article Computer Science, Artificial Intelligence

Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation

Tassadaq Hussain, Sabato Marco Siniscalchi, Hsiao-Lan Sharon Wang, Yu Tsao, Valerio Mario Salerno, Wen-Hung Liao

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS (2020)

Article Computer Science, Artificial Intelligence

A multimodal retina-iris biometric system using the Levenshtein distance for spatial feature comparison

Vincenzo Conti, Leonardo Rundo, Carmelo Militello, Valerio Mario Salerno, Salvatore Vitabile, Sabato Marco Siniscalchi

Summary: The recent developments in information technologies require robust and reliable authentication systems, leading to the proposal of a novel multimodal biometric system based on iris and retina combination. Testing different combinations of biometric databases revealed that the multimodal retina-iris biometric approach outperformed unimodal systems, showing potential as a multimodal authentication framework using multiple static biometric traits.

IET BIOMETRICS (2021)

Article Acoustics

Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition

Ivan Kukanov, Trung Ngo Trong, Ville Hautamaki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2020)

Article Engineering, Electrical & Electronic

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

IEEE SIGNAL PROCESSING LETTERS (2020)

Proceedings Paper Computer Science, Software Engineering

Compressed Multimodal Hierarchical Extreme Learning Machine for Speech Enhancement

Tassadaq Hussain, Yu Tsao, Hsin-Min Wang, Jia-Ching Wang, Sabato Marco Siniscalchi, Wen-Hung Liao

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) (2019)

Proceedings Paper Acoustics

IMPROVING AUDIO-VISUAL SPEECH RECOGNITION PERFORMANCE WITH CROSS-MODAL STUDENT-TEACHER TRAINING

Wei Li, Sicheng Wang, Ming Lei, Sabato Marco Siniscalchi, Chin-Hui Lee

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2019)

Proceedings Paper Acoustics

EXPLORING RETRAINING-FREE SPEECH RECOGNITION FOR INTRA-SENTENTIAL CODE-SWITCHING

Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2019)

Proceedings Paper Acoustics

IMPROVING MANDARIN TONE MISPRONUNCIATION DETECTION FOR NON-NATIVE LEARNERS WITH SOFT-TARGET TONE LABELS AND BLSTM-BASED DEEP MODELS

Wei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2018)

Proceedings Paper Acoustics

A TRANSFER LEARNING AND PROGRESSIVE STACKING APPROACH TO REDUCING DEEP MODEL SIZES WITH AN APPLICATION TO SPEECH ENHANCEMENT

Sicheng Wang, Kehuang Li, Zhen Huang, Sabato Marco Siniscalchi, Chin-Hui Lee

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2017)

Proceedings Paper Engineering, Electrical & Electronic

A UNIFIED DEEP MODELING APPROACH TO SIMULTANEOUS SPEECH DEREVERBERATION AND RECOGNITION FOR THE REVERB CHALLENGE

Bo Wu, Kehuang Li, Zhen Huang, Sabato Marco Siniscalchi, Minglei Yang, Chin-Hui Lee

2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017) (2017)

Article Engineering, Electrical & Electronic

Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network-Based Vector-to-Vector Regression

Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

IEEE TRANSACTIONS ON SIGNAL PROCESSING (2020)

Article Computer Science, Artificial Intelligence

3D-KCPNet: Efficient 3DCNNs based on tensor mapping theory

Rui Lv, Dingheng Wang, Jiangbin Zheng, Zhao-Xu Yang

Summary: In this paper, the authors investigate tensor decomposition for neural network compression. They analyze the convergence and precision of tensor mapping theory, validate the rationality of tensor mapping and its superiority over traditional tensor approximation based on the Lottery Ticket Hypothesis. They propose an efficient method called 3D-KCPNet to compress 3D convolutional neural networks using the Kronecker canonical polyadic (KCP) tensor decomposition. Experimental results show that 3D-KCPNet achieves higher accuracy compared to the original baseline model and the corresponding tensor approximation model.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Personalized robotic control via constrained multi-objective reinforcement learning

Xiangkun He, Zhongxu Hu, Haohan Yang, Chen Lv

Summary: In this paper, a novel constrained multi-objective reinforcement learning algorithm is proposed for personalized end-to-end robotic control with continuous actions. The approach trains a single model using constraint design and a comprehensive index to achieve optimal policies based on user-specified preferences.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Overlapping community detection using expansion with contraction

Zhijian Zhuo, Bilian Chen, Shenbao Yu, Langcai Cao

Summary: In this paper, a novel method called Expansion with Contraction Method for Overlapping Community Detection (ECOCD) is proposed, which utilizes non-negative matrix factorization to obtain disjoint communities and applies expansion and contraction processes to adjust the degree of overlap. ECOCD is applicable to various networks with different properties and achieves high-quality overlapping community detection.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

High-compressed deepfake video detection with contrastive spatiotemporal distillation

Yizhe Zhu, Chunhui Zhang, Jialin Gao, Xin Sun, Zihan Rui, Xi Zhou

Summary: In this work, the authors propose a Contrastive Spatio-Temporal Distilling (CSTD) approach to improve the detection of high-compressed deepfake videos. The approach leverages spatial-frequency cues and temporal-contrastive alignment to fully exploit spatiotemporal inconsistency information.

NEUROCOMPUTING (2024)

Review Computer Science, Artificial Intelligence

A review of coverless steganography

Laijin Meng, Xinghao Jiang, Tanfeng Sun

Summary: This paper provides a review of coverless steganographic algorithms, including the development process, known contributions, and general issues in image and video algorithms. It also discusses the security of coverless steganography from theoretical analysis to actual investigation for the first time.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Confidence-based interactable neural-symbolic visual question answering

Yajie Bao, Tianwei Xing, Xun Chen

Summary: Visual question answering requires processing multi-modal information and effective reasoning. Neural-symbolic learning is a promising method, but current approaches lack uncertainty handling and can only provide a single answer. To address this, we propose a confidence based neural-symbolic approach that evaluates NN inferences and conducts reasoning based on confidence.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

A framework-based transformer and knowledge distillation for interior style classification

Anh H. Vo, Bao T. Nguyen

Summary: Interior style classification is an interesting problem with potential applications in both commercial and academic domains. This project proposes a method named ISC-DeIT, which combines data-efficient image transformer architectures and knowledge distillation, to address the interior style classification problem. Experimental results demonstrate a significant improvement in predictive accuracy compared to other state-of-the-art methods.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Improving robustness for vision transformer with a simple dynamic scanning augmentation

Shashank Kotyan, Danilo Vasconcellos Vargas

Summary: This article introduces a novel augmentation technique called Dynamic Scanning Augmentation to improve the accuracy and robustness of Vision Transformer (ViT). The technique leverages dynamic input sequences to adaptively focus on different patches, resulting in significant changes in ViT's attention mechanism. Experimental results demonstrate that Dynamic Scanning Augmentation outperforms ViT in terms of both robustness to adversarial attacks and accuracy against natural images.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Introducing shape priors in Siamese networks for image classification

Hiba Alqasir, Damien Muselet, Christophe Ducottet

Summary: The article proposes a solution to improve the learning process of a classification network by providing shape priors, reducing the need for annotated data. The solution is tested on cross-domain digit classification tasks and a video surveillance application.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Neural dynamics solver for time-dependent infinity-norm optimization based on ACP framework with robot application

Dexiu Ma, Mei Liu, Mingsheng Shang

Summary: This paper proposes a method using neural dynamics solvers to solve infinity-norm optimization problems. Two improved solvers are constructed and their effectiveness and superiority are demonstrated through theoretical analysis and simulation experiments.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

cpp-AIF: A multi-core C++ implementation of Active Inference for Partially Observable Markov Decision Processes

Francesco Gregoretti, Giovanni Pezzulo, Domenico Maisto

Summary: Active Inference is a computational framework that uses probabilistic inference and variational free energy minimization to describe perception, planning, and action. cpp-AIF is a header-only C++ library that provides a powerful tool for implementing Active Inference for Partially Observable Markov Decision Processes through multi-core computing. It is cross-platform and improves performance, memory management, and usability compared to existing software.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Predicting stock market trends with self-supervised learning

Zelin Ying, Dawei Cheng, Cen Chen, Xiang Li, Peng Zhu, Yifeng Luo, Yuqi Liang

Summary: This paper proposes a novel stock market trends prediction framework called SMART, which includes a self-supervised stock technical data sequence embedding model S3E. By training with multiple self-supervised auxiliary tasks, the model encodes stock technical data sequences into embeddings and uses the learned sequence embeddings for predicting stock market trends. Extensive experiments on China A-Shares market and NASDAQ market prove the high effectiveness of our model in stock market trends prediction, and its effectiveness is further validated in real-world applications in a leading financial service provider in China.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

DHGAT: Hyperbolic representation learning on dynamic graphs via attention networks

Hao Li, Hao Jiang, Dongsheng Ye, Qiang Wang, Liang Du, Yuanyuan Zeng, Liu Yuan, Yingxue Wang, C. Chen

Summary: DHGAT, a dynamic hyperbolic graph attention network, utilizes hyperbolic metric properties to embed dynamic graphs. It employs a spatiotemporal self-attention mechanism and weighted node representations, resulting in excellent performance in link prediction tasks.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Progressive network based on detail scaling and texture extraction: A more general framework for image deraining

Jiehui Huang, Zhenchao Tang, Xuedong He, Jun Zhou, Defeng Zhou, Calvin Yu-Chian Chen

Summary: This study proposes a progressive learning multi-scale feature blending model for image deraining tasks. The model utilizes detail dilation and texture extraction to improve the restoration of rainy images. Experimental results show that the model achieves near state-of-the-art performance in rain removal tasks and exhibits better rain removal realism.

NEUROCOMPUTING (2024)

Article Computer Science, Artificial Intelligence

Stabilization and synchronization control for discrete-time complex networks via the auxiliary role of edges subsystem

Lizhi Liu, Zilin Gao, Yinhe Wang, Yongfu Li

Summary: This paper proposes a novel discrete-time interconnected model for depicting complex dynamical networks. The model consists of nodes and edges subsystems, which consider the dynamic characteristic of both nodes and edges. By designing control strategies and coupling modes, the stabilization and synchronization of the network are achieved. Simulation results demonstrate the effectiveness of the proposed methods.

NEUROCOMPUTING (2024)