Article

Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval

IEEE TRANSACTIONS ON MULTIMEDIA (2011)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 13, Issue 3, Pages 518-529

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2011.2129498

Keywords

Audio tag annotation; audio tag retrieval; cost-sensitive learning; multi-label; tag count

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications

Funding

National Science Council of Taiwan [NSC99-2631-H-001-020]

Abstract

Audio tags are keywords that people use to describe different aspects of a music clip. With the explosive growth of digital music available on the Web, automatic audio tagging, which can be used to annotate unknown music or retrieve desirable music, is becoming increasingly important. This can be achieved by training a binary classifier for each tag on labeled music data; our method that won the MIREX 2009 audio tagging competition is of this kind. However, since social tags are usually assigned by people with different levels of musical knowledge, they inevitably contain noise. By treating the tag counts as costs, we can model audio tagging as a cost-sensitive classification problem. In addition, tag correlation information is useful for automatic audio tagging since some tags often co-occur. By considering the co-occurrences of tags, we can model audio tagging as a multi-label classification problem. To exploit the tag count and correlation information jointly, we formulate the audio tagging task as a novel cost-sensitive multi-label (CSML) learning problem and propose two solutions. The experimental results demonstrate that the new approach outperforms our MIREX 2009 winning method.
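The idea of training one binary classifier per tag while treating tag counts as costs can be illustrated with a minimal sketch. This is not the authors' CSML method: it uses a plain weighted logistic regression as a stand-in classifier, it assumes that a positive clip's cost equals its tag count and that every negative clip has cost 1, and it ignores tag correlation entirely. The function names (`train_weighted_logreg`, `train_per_tag`, `score_tags`) and the toy data are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _logit(theta, x):
    # theta holds one weight per feature plus a trailing bias term;
    # zip stops at the shorter sequence, so the bias is added separately.
    return theta[-1] + sum(t * v for t, v in zip(theta, x))


def train_weighted_logreg(X, y, w, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent, where each
    example's log-loss is scaled by its cost w[i]."""
    d = len(X[0])
    theta = [0.0] * (d + 1)
    total = sum(w)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, wi in zip(X, y, w):
            err = wi * (_sigmoid(_logit(theta, xi)) - yi)
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        theta = [t - lr * g / total for t, g in zip(theta, grad)]
    return theta


def train_per_tag(features, tag_counts, tags):
    """Train one binary classifier per tag.

    tag_counts[i] maps tag -> number of users who applied it to clip i.
    Assumption: cost of a positive clip = its tag count; cost of a
    negative clip = 1.
    """
    models = {}
    for tag in tags:
        counts = [c.get(tag, 0) for c in tag_counts]
        y = [1 if c > 0 else 0 for c in counts]
        w = [float(c) if c > 0 else 1.0 for c in counts]
        models[tag] = train_weighted_logreg(features, y, w)
    return models


def score_tags(models, x):
    """Return a tag -> probability mapping for one clip (annotation)."""
    return {tag: _sigmoid(_logit(theta, x)) for tag, theta in models.items()}


# Toy data: two made-up features per clip and made-up tag counts.
features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
tag_counts = [{"rock": 5}, {"rock": 1}, {"jazz": 4}, {"jazz": 2}]
models = train_per_tag(features, tag_counts, ["rock", "jazz"])
print(score_tags(models, [1.0, 0.0]))
```

Sorting the tags of one clip by score gives an annotation; sorting clips by the score of one tag gives a retrieval ranking. Clips tagged by many users pull harder on the decision boundary than clips tagged once, which is the sense in which the counts act as costs.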

Recommended

Article Computer Science, Information Systems

Generative Multi-Label Correlation Learning

Lichen Wang, Zhengming Ding, Kasey Lee, Seungju Han, Jae-Joon Han, Changkyu Choi, Yun Fu

Summary: Multi-label learning is a method to deal with the problem of having multiple labels for a single instance in real-world applications. However, it is more challenging due to complex label correlation, long-tail label distribution, and data shortage. To overcome these limitations, we propose a general and compact Multi-Label Correlation Learning (MUCO) framework, which explicitly and effectively learns the latent label correlations by updating a label correlation tensor and handles the long-tail label distribution challenge through a multilabel generative strategy.

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2023)

Article Computer Science, Artificial Intelligence

Multi-label thresholding for cost-sensitive classification

Reem Alotaibi, Peter Flach

Summary: This paper investigates cost-sensitive classification methods for multi-label classification, adopting a simple but general thresholding method that is applicable to most classification algorithms. It explores the choice of single and multiple thresholds and proposes cost curves and scatter diagrams for performance evaluation. Experimental evaluation on 13 multi-label datasets demonstrates that adjusting a global threshold instead of per-label threshold does not lead to significant performance loss.

NEUROCOMPUTING (2021)

Article Computer Science, Artificial Intelligence

Semi-Supervised Dual Relation Learning for Multi-Label Classification

Lichen Wang, Yunyu Liu, Hang Di, Can Qin, Gan Sun, Yun Fu

Summary: In the context of multi-label classification, a Semi-supervised Dual Relation Learning (SDRL) framework is proposed that effectively explores the feature-label relation and label-label relation knowledge using a combination of labeled and unlabeled samples. The framework outperforms other state-of-the-art baselines in both general and zero-shot multi-label classification tasks. Extensive ablation studies demonstrate the effectiveness of each component in the SDRL framework.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Information Systems

Generic Multi-label Annotation via Adaptive Graph and Marginalized Augmentation

Lichen Wang, Zhengming Ding, Yun Fu

Summary: The AGMA method is a generic multi-label learning framework based on adaptive graph and marginalized augmentation, which improves learning performance by combining a small amount of labeled data with a large amount of unlabeled data. This method utilizes adaptive similarity graphs, marginalized augmentation strategies, and feature-label autoencoders to enhance the model's generalization capability and efficiency.

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2022)

Article Biology

Severity prediction of pulmonary diseases using chest CT scans via cost-sensitive label multi-kernel distribution learning

Xin Wang, Jun Wang, Fei Shan, Yiqiang Zhan, Jun Shi, Dinggang Shen

Summary: In this study, label distribution learning (LDL) and a cost-sensitive mechanism were used to predict the severity of pulmonary diseases. By generating a label distribution for each patient, each CT image contributed information not only for the current day but also for the neighboring days. The results showed that the proposed method achieved superior performance in predicting the severity of pulmonary diseases.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Computer Science, Information Systems

Tag Propagation and Cost-Sensitive Learning for Music Auto-Tagging

Yi-Hsun Lin, Homer H. Chen

Summary: This study introduces a cost-sensitive tag propagation learning method that successfully improves the performance of three auto-tagging models. The cost-sensitive loss function helps reduce the impact of missing tags, and the artist music context is found to be more effective for tag propagation than other music contexts.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Artificial Intelligence

Feature disentangling and reciprocal learning with label-guided similarity for multi-label image retrieval

Yong Dai, Weiwei Song, Yi Li, Luigi Di Stefano

Summary: This paper introduces a Feature Disentangling and Reciprocal Learning (FDRL) method with label-guided similarity to solve the multi-label image retrieval problem. It enhances the feature representation ability through feature extraction, disentanglement, and reciprocal learning, and optimizes the whole network using a label-guided similarity loss function. Experimental results show that the proposed method outperforms current state-of-the-art techniques.

NEUROCOMPUTING (2022)

Article Computer Science, Information Systems

Multi-label active learning from crowds for secure IIoT

Ming Wu, Qianmu Li, Muhammad Bilal, Xiaolong Xu, Jing Zhang, Jun Hou

Summary: With the rise of IIoT, Artificial Intelligence is utilized in various research areas, and multi-label active learning has become popular. By utilizing crowdsourcing, a more economical and efficient strategy, for multi-label active learning in IIoT, the proposed MALC method outperforms existing techniques.

AD HOC NETWORKS (2021)

Article Computer Science, Artificial Intelligence

Cost-effective Batch-mode Multi-label Active Learning

Xiaoqiang Gui, Xudong Lu, Guoxian Yu

Summary: Active learning aims to select valuable unlabeled instances to improve learner performance. Batch-mode active learning methods are more efficient than myopic methods and can reduce query and time costs.

NEUROCOMPUTING (2021)

Article Computer Science, Artificial Intelligence

Adversarial Multi-Label Variational Hashing

Jiwen Lu, Venice Erin Liong, Yap-Peng Tan

Summary: This paper proposes an adversarial multi-label variational hashing method to learn compact binary codes for efficient image retrieval. The method learns hash functions from both synthetic and real data, making it effective for unseen data. By simultaneously enforcing adversarial learning, discriminative binary codes learning, and generating synthetic training samples, the method demonstrates efficacy on benchmark datasets.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Deep robust multilevel semantic hashing for multi-label cross-modal retrieval

Ge Song, Xiaoyang Tan, Jun Zhao, Ming Yang

Summary: RMSH is designed for more accurate multi-label cross-modal retrieval, addressing modality discrepancies and noise through fine-grained similarity of rich semantics and robust margin-adaptive triplet loss. The effective bounds derived from information coding-theoretic analysis enable our method to achieve state-of-the-art performance on multiple benchmarks.

PATTERN RECOGNITION (2021)

Article Chemistry, Analytical

Automatic Multi-Label ECG Classification with Category Imbalance and Cost-Sensitive Thresholding

Yang Liu, Qince Li, Kuanquan Wang, Jun Liu, Runnan He, Yongfeng Yuan, Henggui Zhang

Summary: This study proposes a novel deep learning model-based learning framework and thresholding method for designing multi-label ECG classifiers, and evaluates the method on multiple realistic datasets with a cost-sensitive metric, showing superior performance in cost sensitivity.

BIOSENSORS-BASEL (2021)

Article Computer Science, Information Systems

Multi-view multi-label learning with high-order label correlation

Bo Liu, Weibin Li, Yanshan Xiao, Xiaodong Chen, Laiwang Liu, Changdong Liu, Kai Wang, Peng Sun

Summary: Multi-label learning is a popular topic in machine learning that deals with the simultaneous association of multiple labels with given samples. This paper proposes a new multi-view multi-label learning method called ELSMML, which considers label correlation. The method constructs a crafted label correlation matrix to describe label relationships and utilizes multi-view learning and dimension reduction to exploit latent semantic label information and feature information, building a classifier in a low dimensional space. The ELSMML model is optimized using the accelerated proximal gradient method and achieves better performance compared to other baselines according to evaluation metrics.

INFORMATION SCIENCES (2023)

Article Engineering, Electrical & Electronic

Multi-label adversarial fine-grained cross-modal retrieval

Chunpu Sun, Huaxiang Zhang, Li Liu, Dongmei Liu, Lin Wang

Summary: In this paper, a novel Multi-label Adversarial Fine-grained Cross-modal Retrieval Based on Transformer (MLAT) method is proposed to bridge the semantic gap and eliminate modal specific features. The method constructs a semantic consistency enhanced module and a multi-stage adversarial learning module to optimize feature representations.

SIGNAL PROCESSING-IMAGE COMMUNICATION (2023)

Article Engineering, Electrical & Electronic

Multi-Granularity Aggregation Transformer for Joint Video-Audio-Text Representation Learning

Mengge He, Wenjing Du, Zhiquan Wen, Qing Du, Yutong Xie, Qi Wu

Summary: In this paper, a Multi-Granularity Aggregation Transformer (MGAT) is proposed for joint video-audio-text representation learning. The method overcomes the limitations of existing methods by designing a multi-granularity transformer module and an attention-guided aggregation module. The aggregated information is aligned with text information at different hierarchical levels using consistency loss and contrastive loss. Experimental results demonstrate the superiority of the proposed method on tasks such as video-paragraph retrieval and video captioning.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2023)

Article Engineering, Electrical & Electronic

SVSNet: An End-to-End Speaker Voice Similarity Assessment Model

Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Summary: This paper introduces SVSNet, the first end-to-end neural network model for assessing speaker voice similarity in voice conversion tasks. Unlike most neural evaluation metrics, SVSNet takes raw waveform as input to make full use of speech information. Experimental results on VCC2018 and VCC2020 datasets show that SVSNet outperforms baseline systems in assessing speaker similarity at both utterance and system levels.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Acoustics

Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification

Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang

Summary: In this paper, two novel approaches are proposed to improve the generalization ability of speaker verification and reduce interference from other speakers. Experimental results show that these methods can significantly enhance system performance.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2023)

Article Acoustics

Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features

Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Summary: This study proposes MOSA-Net, a cross-domain multi-objective speech assessment model that can estimate speech quality, intelligibility, and distortion assessment scores simultaneously. Experimental results show that MOSA-Net improves the prediction of speech quality and short-time objective intelligibility compared to existing single-task models. Moreover, MOSA-Net can be effectively adapted to predict subjective quality and intelligibility scores with limited training data. The proposed QIA-SE approach, guided by MOSA-Net's latent representations, also outperforms the baseline SE system in terms of PESQ scores.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2023)

Article Engineering, Electrical & Electronic

Multi-Target Extractor and Detector for Unknown-Number Speaker Diarization

Chin-Yi Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Summary: This study proposes a neural architecture that extracts speaker representations and detects the presence of each speaker on a frame-by-frame basis, regardless of the number of speakers in a conversation. The model outperforms previous methods in tests on the CALLHOME corpus and achieves significant diarization error rate reductions in a more challenging case with simultaneous speakers ranging from 2 to 7.

IEEE SIGNAL PROCESSING LETTERS (2023)

Article Acoustics

Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning

Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang

Summary: In this paper, a novel architecture based on self-constraint learning (SCL) and reconstruction task (RT) is proposed to remove the influence of phonetic information on speaker embedding generation. Experimental results show that the proposed DROP-TDNN system outperforms the state-of-the-art ECAPA-TDNN system on multiple datasets.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang

Summary: A noise-aware training framework based on two cascaded neural structures is proposed in this paper to jointly optimize speech enhancement and speech recognition, achieving a lower word error rate (WER).

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) (2022)

Proceedings Paper Acoustics

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

Summary: Because of the excellent performance of speech separation in cases of complete speaker overlap, the focus of research has shifted towards dealing with more realistic scenarios. However, domain mismatch between training and testing situations remains a significant problem due to various factors. This study investigates the impacts of language and channel mismatches on speech separation and proposes a new solution for channel mismatch using projection evaluation.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Summary: This study proposes a multi-branched speech intelligibility prediction model (MBI-Net) to predict the subjective intelligibility scores of hearing aid users. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Ryandhimas Edo Zezario, Szu-wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Summary: This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human subjective listening test results and word error rate (WER) scores. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

The VoiceMOS Challenge 2022

Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

Summary: The VoiceMOS Challenge aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthetic speech. Through this challenge, 22 participating teams from academia and industry tested various approaches to predict human ratings of synthesized speech. The results highlight the effectiveness of fine-tuning self-supervised speech models for MOS prediction, as well as the challenges in predicting MOS ratings for unseen speakers, listeners, and systems in the out-of-domain setting.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao

Summary: In this paper, a method called NASTAR is proposed, which addresses the training-test acoustic mismatch issue in deep learning-based speech enhancement systems by using only one sample of noisy speech in the target environment. NASTAR utilizes a feedback mechanism to simulate adaptive training data and experimental results show its effectiveness in noise adaptation.

INTERSPEECH 2022 (2022)

Proceedings Paper Acoustics

Chain-based Discriminative Autoencoders for Speech Recognition

Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang

Summary: In this paper, we propose three new versions of a discriminative autoencoder (DcAE) for speech recognition, achieving superior experimental results.

INTERSPEECH 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Lip Sync Matters: A Novel Multimodal Forgery Detector

Sahibzada Adil Shahzad, Ammarah Hashmi, Sarwar Khan, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang

Summary: Deepfake technology has both positive and negative impacts on society. While there have been efforts to detect fake footage using unimodal deep learning models, this approach is insufficient for detecting multimodal manipulations. This study proposes a lip-reading-based multimodal Deepfake detection method called Lip Sync Matters, which shows superior performance in detecting forged videos.

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Multimodal Forgery Detection Using Ensemble Learning

Ammarah Hashmi, Sahibzada Adil Shahzad, Wasim Ahmad, Chia Wen Lin, Yu Tsao, Hsin-Min Wang

Summary: This paper proposes a deep forgery detection method based on audiovisual ensemble learning for the task of multimodal forgery detection, achieving a high accuracy rate of 89% in experimental results.

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Detecting Replay Attacks Using Single-Channel Audio: The Temporal Autocorrelation of Speech

Shih-Kuang Lee, Yu Tsao, Hsin-Min Wang

Summary: This paper proposes a new feature for replay detection, which utilizes the temporal auto-correlation of single-channel speech. The experimental results demonstrate that the proposed feature can effectively distinguish replay attacks, clean speech, and speech with simulated reverberation, and its utilization in a fusion system consistently improves performance. Moreover, the best fusion system achieves a zero equal error rate and a zero minimum tandem detection cost function for the first time on the development set.

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) (2022)
