Article
Computer Science, Information Systems
Lichen Wang, Zhengming Ding, Kasey Lee, Seungju Han, Jae-Joon Han, Changkyu Choi, Yun Fu
Summary: Multi-label learning addresses real-world problems in which a single instance carries multiple labels. It is challenging because of complex label correlations, long-tail label distributions, and data shortage. To overcome these challenges, we propose a general and compact Multi-Label Correlation Learning (MUCO) framework, which explicitly and effectively learns the latent label correlations by updating a label correlation tensor and handles the long-tail label distribution through a multi-label generative strategy.
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA
(2023)
Article
Computer Science, Artificial Intelligence
Reem Alotaibi, Peter Flach
Summary: This paper investigates cost-sensitive classification methods for multi-label classification, adopting a simple but general thresholding method that is applicable to most classification algorithms. It explores the choice between a single threshold and multiple thresholds and proposes cost curves and scatter diagrams for performance evaluation. Experimental evaluation on 13 multi-label datasets demonstrates that adjusting a global threshold instead of per-label thresholds does not lead to significant performance loss.
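The difference between a global threshold and per-label thresholds can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the false-positive and false-negative costs, the candidate grid, and the toy scores are all hypothetical, chosen only to show how the two thresholding choices are searched and compared under a simple misclassification cost.

```python
from typing import List, Sequence


def label_cost(scores: Sequence[float], truths: Sequence[int],
               threshold: float, c_fp: float, c_fn: float) -> float:
    """Misclassification cost for one label; predict 1 when score >= threshold."""
    cost = 0.0
    for score, truth in zip(scores, truths):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 0:
            cost += c_fp  # false positive
        elif predicted == 0 and truth == 1:
            cost += c_fn  # false negative
    return cost


def column(rows: Sequence[Sequence[float]], j: int) -> List[float]:
    """Extract the j-th label column from instance-major rows."""
    return [row[j] for row in rows]


def total_cost(score_rows, truth_rows, thresholds, c_fp, c_fn) -> float:
    """Sum the per-label costs, using one threshold per label."""
    return sum(
        label_cost(column(score_rows, j), column(truth_rows, j), t, c_fp, c_fn)
        for j, t in enumerate(thresholds)
    )


def best_global_threshold(score_rows, truth_rows, candidates, c_fp, c_fn) -> float:
    """Pick the single threshold, shared by all labels, with the lowest total cost."""
    n_labels = len(score_rows[0])
    return min(
        candidates,
        key=lambda t: total_cost(score_rows, truth_rows, [t] * n_labels, c_fp, c_fn),
    )


def best_per_label_thresholds(score_rows, truth_rows, candidates, c_fp, c_fn) -> List[float]:
    """Pick, independently for each label, the threshold with the lowest cost."""
    n_labels = len(score_rows[0])
    return [
        min(
            candidates,
            key=lambda t, j=j: label_cost(
                column(score_rows, j), column(truth_rows, j), t, c_fp, c_fn
            ),
        )
        for j in range(n_labels)
    ]


# Hypothetical toy data: 4 instances, 2 labels (assumed values, not from the paper).
scores = [[0.9, 0.5], [0.6, 0.8], [0.3, 0.9], [0.1, 0.7]]
truths = [[1, 0], [1, 1], [0, 1], [0, 0]]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
C_FP, C_FN = 1.0, 2.0  # assumed costs: a missed label costs twice a false alarm

global_t = best_global_threshold(scores, truths, candidates, C_FP, C_FN)
per_label_t = best_per_label_thresholds(scores, truths, candidates, C_FP, C_FN)
global_cost = total_cost(scores, truths, [global_t] * 2, C_FP, C_FN)
per_label_cost = total_cost(scores, truths, per_label_t, C_FP, C_FN)

print("global threshold:", global_t, "cost:", global_cost)
print("per-label thresholds:", per_label_t, "cost:", per_label_cost)
```

On this toy data the per-label search can only match or beat the global one, since it optimizes each label independently; how large the gap is on real datasets is the empirical question the paper examines.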
Article
Computer Science, Artificial Intelligence
Lichen Wang, Yunyu Liu, Hang Di, Can Qin, Gan Sun, Yun Fu
Summary: In the context of multi-label classification, a Semi-supervised Dual Relation Learning (SDRL) framework is proposed that effectively explores the feature-label relation and label-label relation knowledge using a combination of labeled and unlabeled samples. The framework outperforms other state-of-the-art baselines in both general and zero-shot multi-label classification tasks. Extensive ablation studies demonstrate the effectiveness of each component in the SDRL framework.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2021)
Article
Computer Science, Information Systems
Lichen Wang, Zhengming Ding, Yun Fu
Summary: The AGMA method is a generic multi-label learning framework based on adaptive graph and marginalized augmentation, which improves learning performance by combining a small amount of labeled data with a large amount of unlabeled data. This method utilizes adaptive similarity graphs, marginalized augmentation strategies, and feature-label autoencoders to enhance the model's generalization capability and efficiency.
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA
(2022)
Article
Biology
Xin Wang, Jun Wang, Fei Shan, Yiqiang Zhan, Jun Shi, Dinggang Shen
Summary: In this study, label distribution learning (LDL) and a cost-sensitive mechanism were used to predict the severity of pulmonary diseases. By generating a label distribution for each patient, the model was able to learn from the CT images not only the information of the current day but also the information of the neighboring days. The results showed that the proposed method achieved superior performance in predicting the severity of pulmonary diseases.
COMPUTERS IN BIOLOGY AND MEDICINE
(2023)
Article
Computer Science, Information Systems
Yi-Hsun Lin, Homer H. Chen
Summary: This study introduces a cost-sensitive tag propagation learning method that successfully improves the performance of three auto-tagging models. The cost-sensitive loss function helps reduce the impact of missing tags, and the artist music context is found to be more effective for tag propagation than other music contexts.
IEEE TRANSACTIONS ON MULTIMEDIA
(2021)
Article
Computer Science, Artificial Intelligence
Yong Dai, Weiwei Song, Yi Li, Luigi Di Stefano
Summary: This paper introduces a Feature Disentangling and Reciprocal Learning (FDRL) method with label-guided similarity to solve the multi-label image retrieval problem. It enhances the feature representation ability through feature extraction, disentanglement, and reciprocal learning, and optimizes the whole network using a label-guided similarity loss function. Experimental results show that the proposed method outperforms current state-of-the-art techniques.
Article
Computer Science, Information Systems
Ming Wu, Qianmu Li, Muhammad Bilal, Xiaolong Xu, Jing Zhang, Jun Hou
Summary: With the rise of the Industrial Internet of Things (IIoT), artificial intelligence is used in various research areas, and multi-label active learning has become popular. The proposed MALC method applies crowdsourcing, a more economical and efficient strategy, to multi-label active learning in IIoT and outperforms existing techniques.
Article
Computer Science, Artificial Intelligence
Xiaoqiang Gui, Xudong Lu, Guoxian Yu
Summary: Active learning aims to select valuable unlabeled instances to improve learner performance. Batch-mode active learning methods are more efficient than myopic methods and can reduce query and time costs.
Article
Computer Science, Artificial Intelligence
Jiwen Lu, Venice Erin Liong, Yap-Peng Tan
Summary: This paper proposes an adversarial multi-label variational hashing method to learn compact binary codes for efficient image retrieval. The method learns hash functions from both synthetic and real data, making it effective for unseen data. By simultaneously enforcing adversarial learning, discriminative binary codes learning, and generating synthetic training samples, the method demonstrates efficacy on benchmark datasets.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2021)
Article
Computer Science, Artificial Intelligence
Ge Song, Xiaoyang Tan, Jun Zhao, Ming Yang
Summary: RMSH is designed for more accurate multi-label cross-modal retrieval, addressing modality discrepancies and noise through fine-grained similarity of rich semantics and a robust margin-adaptive triplet loss. Effective bounds derived from an information coding-theoretic analysis enable the method to achieve state-of-the-art performance on multiple benchmarks.
PATTERN RECOGNITION
(2021)
Article
Chemistry, Analytical
Yang Liu, Qince Li, Kuanquan Wang, Jun Liu, Runnan He, Yongfeng Yuan, Henggui Zhang
Summary: This study proposes a novel deep-learning-based framework and thresholding method for designing multi-label ECG classifiers. The method is evaluated on multiple realistic datasets with a cost-sensitive metric and shows superior performance in cost sensitivity.
Article
Computer Science, Information Systems
Bo Liu, Weibin Li, Yanshan Xiao, Xiaodong Chen, Laiwang Liu, Changdong Liu, Kai Wang, Peng Sun
Summary: Multi-label learning is a popular topic in machine learning that deals with samples associated with multiple labels simultaneously. This paper proposes a new multi-view multi-label learning method called ELSMML, which considers label correlation. The method constructs a crafted label correlation matrix to describe label relationships and uses multi-view learning and dimension reduction to exploit latent semantic label information and feature information, building a classifier in a low-dimensional space. The ELSMML model is optimized using the accelerated proximal gradient method and outperforms the baselines on the evaluation metrics.
INFORMATION SCIENCES
(2023)
Article
Engineering, Electrical & Electronic
Chunpu Sun, Huaxiang Zhang, Li Liu, Dongmei Liu, Lin Wang
Summary: In this paper, a novel Multi-label Adversarial Fine-grained Cross-modal Retrieval Based on Transformer (MLAT) method is proposed to bridge the semantic gap and eliminate modality-specific features. The method constructs a semantic consistency enhancement module and a multi-stage adversarial learning module to optimize feature representations.
SIGNAL PROCESSING-IMAGE COMMUNICATION
(2023)
Article
Engineering, Electrical & Electronic
Mengge He, Wenjing Du, Zhiquan Wen, Qing Du, Yutong Xie, Qi Wu
Summary: In this paper, a Multi-Granularity Aggregation Transformer (MGAT) is proposed for joint video-audio-text representation learning. The method overcomes the limitations of existing methods by designing a multi-granularity transformer module and an attention-guided aggregation module. The aggregated information is aligned with text information at different hierarchical levels using consistency loss and contrastive loss. Experimental results demonstrate the superiority of the proposed method on tasks such as video-paragraph retrieval and video captioning.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2023)
Article
Engineering, Electrical & Electronic
Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
Summary: This paper introduces SVSNet, the first end-to-end neural network model for assessing speaker voice similarity in voice conversion tasks. Unlike most neural evaluation metrics, SVSNet takes raw waveform as input to make full use of speech information. Experimental results on VCC2018 and VCC2020 datasets show that SVSNet outperforms baseline systems in assessing speaker similarity at both utterance and system levels.
IEEE SIGNAL PROCESSING LETTERS
(2022)
Article
Acoustics
Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang
Summary: In this paper, two novel approaches are proposed to improve the generalization ability of speaker verification and reduce interference from other speakers. Experimental results show that these methods can significantly enhance system performance.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2023)
Article
Acoustics
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Summary: This study proposes MOSA-Net, a cross-domain multi-objective speech assessment model that can estimate speech quality, intelligibility, and distortion assessment scores simultaneously. Experimental results show that MOSA-Net improves the prediction of speech quality and short-time objective intelligibility compared to existing single-task models. Moreover, MOSA-Net can be effectively adapted to predict subjective quality and intelligibility scores with limited training data. The proposed QIA-SE approach, guided by MOSA-Net's latent representations, also outperforms the baseline SE system in terms of PESQ scores.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2023)
Article
Engineering, Electrical & Electronic
Chin-Yi Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Summary: This study proposes a neural architecture that extracts speaker representations and detects the presence of each speaker on a frame-by-frame basis, regardless of the number of speakers in a conversation. The model outperforms previous methods in tests on the CALLHOME corpus and achieves significant diarization error rate reductions in a more challenging case with simultaneous speakers ranging from 2 to 7.
IEEE SIGNAL PROCESSING LETTERS
(2023)
Article
Acoustics
Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang
Summary: In this paper, a novel architecture based on self-constraint learning (SCL) and reconstruction task (RT) is proposed to remove the influence of phonetic information on speaker embedding generation. Experimental results show that the proposed DROP-TDNN system outperforms the state-of-the-art ECAPA-TDNN system on multiple datasets.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang
Summary: A noise-aware training framework based on two cascaded neural structures is proposed in this paper to jointly optimize speech enhancement and speech recognition, achieving a lower word error rate (WER).
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP)
(2022)
Proceedings Paper
Acoustics
Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Summary: Because of the excellent performance of speech separation in cases of complete speaker overlap, the focus of research has shifted towards dealing with more realistic scenarios. However, domain mismatch between training and testing situations remains a significant problem due to various factors. This study investigates the impacts of language and channel mismatches on speech separation and proposes a new solution for channel mismatch using projection evaluation.
Proceedings Paper
Acoustics
Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Summary: This study proposes a multi-branched speech intelligibility prediction model (MBI-Net) to predict the subjective intelligibility scores of hearing aid users. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system.
Proceedings Paper
Acoustics
Ryandhimas Edo Zezario, Szu-wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Summary: This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human subjective listening test results and word error rate (WER) scores. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings.
Proceedings Paper
Acoustics
Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
Summary: The VoiceMOS Challenge aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthetic speech. Through this challenge, 22 participating teams from academia and industry tested various approaches to predict human ratings of synthesized speech. The results highlight the effectiveness of fine-tuning self-supervised speech models for MOS prediction, as well as the challenges in predicting MOS ratings for unseen speakers, listeners, and systems in the out-of-domain setting.
Proceedings Paper
Acoustics
Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao
Summary: In this paper, a method called NASTAR is proposed, which addresses the training-test acoustic mismatch issue in deep learning-based speech enhancement systems by using only one sample of noisy speech in the target environment. NASTAR uses a feedback mechanism to simulate adaptive training data, and experimental results show its effectiveness in noise adaptation.
Proceedings Paper
Acoustics
Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang
Summary: In this paper, we propose three new versions of a discriminative autoencoder (DcAE) for speech recognition, achieving superior experimental results.
Proceedings Paper
Computer Science, Artificial Intelligence
Sahibzada Adil Shahzad, Ammarah Hashmi, Sarwar Khan, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang
Summary: Deepfake technology has both positive and negative impacts on society. While there have been efforts to detect fake footage using unimodal deep learning models, this approach is insufficient for detecting multimodal manipulations. This study proposes a lip-reading-based multimodal Deepfake detection method called Lip Sync Matters, which shows superior performance in detecting forged videos.
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Ammarah Hashmi, Sahibzada Adil Shahzad, Wasim Ahmad, Chia Wen Lin, Yu Tsao, Hsin-Min Wang
Summary: This paper proposes a deepfake detection method based on audiovisual ensemble learning for the task of multimodal forgery detection, achieving an accuracy of 89% in experiments.
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Shih-Kuang Lee, Yu Tsao, Hsin-Min Wang
Summary: This paper proposes a new feature for replay detection, which utilizes the temporal auto-correlation of single-channel speech. The experimental results demonstrate that the proposed feature can effectively distinguish replay attacks, clean speech, and speech with simulated reverberation, and its utilization in a fusion system consistently improves performance. Moreover, the best fusion system achieves a zero equal error rate and a zero minimum tandem detection cost function for the first time on the development set.
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)
(2022)
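For readers unfamiliar with the equal error rate (EER) reported in the replay-detection entry above, the following is a minimal sketch of one common way to estimate it from detection scores. It is not the evaluation code used in the paper: the function name, the convention that higher scores mean "genuine", and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence


def equal_error_rate(genuine_scores: Sequence[float],
                     spoof_scores: Sequence[float]) -> float:
    """Estimate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score >= threshold (assumed convention).
    The EER is taken at the threshold where the false acceptance rate and
    the false rejection rate are closest, as the average of the two.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        # Fraction of spoofed trials wrongly accepted.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # Fraction of genuine trials wrongly rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical toy scores (assumed values, not taken from any dataset).
separable_eer = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping_eer = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.2, 0.5, 0.3])

print("EER, perfectly separable scores:", separable_eer)
print("EER, overlapping scores:", overlapping_eer)
```

Under this definition, a zero EER means some threshold separates all genuine trials from all spoofed ones on the evaluated set.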