Article

Unimodal late fusion for NIST i-vector challenge on speaker detection

ELECTRONICS LETTERS (2014)

Journal

ELECTRONICS LETTERS

Volume 50, Issue 15, Pages 1098-1099

Publisher

INST ENGINEERING TECHNOLOGY-IET

DOI: 10.1049/el.2014.1207

Keywords

-

Category

Engineering, Electrical & Electronic

Funding

Erasmus Mundus STRoNGTiES grant

Abstract

Speaker detection is a very interesting machine learning task for which the latest i-vector challenge has been coordinated by the National Institute of Standards and Technology (NIST). A simple late fusion approach for the speaker detection task on the i-vector challenge is presented. The approach is based on the late fusion of scores from the cosine distance method (the baseline) and the scores obtained from linear discriminant analysis. The results show that by adapting the simple late fusion approach, the framework can outperform the baseline score for the decision cost function on the NIST i-vector machine learning challenge.
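
The score-level fusion described in the abstract can be illustrated with a minimal sketch. It assumes that each trial already has a baseline cosine score and a score from a linear discriminant analysis stage, and that the two are combined by a weighted sum after z-normalization. The function names, the normalization step, and the `weight` parameter are hypothetical choices for illustration; the abstract does not specify how the authors combined the scores.

```python
import numpy as np


def cosine_score(model_ivec, test_ivec):
    """Cosine similarity between a speaker-model i-vector and a test i-vector."""
    model_ivec = np.asarray(model_ivec, dtype=float)
    test_ivec = np.asarray(test_ivec, dtype=float)
    denom = np.linalg.norm(model_ivec) * np.linalg.norm(test_ivec)
    return float(model_ivec @ test_ivec / denom)


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance.

    Assumed preprocessing step so that both systems contribute on a
    comparable scale; not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0:
        return np.zeros_like(scores)
    return (scores - scores.mean()) / std


def late_fusion(baseline_scores, lda_scores, weight=0.5):
    """Weighted sum of two normalized score lists covering the same trials.

    `weight` is a hypothetical mixing coefficient for the baseline system.
    """
    a = np.asarray(baseline_scores, dtype=float)
    b = np.asarray(lda_scores, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both systems must score the same trials")
    return weight * z_normalize(a) + (1.0 - weight) * z_normalize(b)
```

In practice the mixing weight would be tuned on held-out trials against the decision cost function; the value 0.5 here is only a placeholder.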

Recommended

Article Medicine, Legal

Speaker identification in courtroom contexts - Part I: Individual listeners compared to forensic voice comparison based on automatic-speaker-recognition technology

Nabanita Basu, Agnes S. Bali, Philip Weber, Claudia Rosas-Aguilar, Gary Edmond, Kristy A. Martire, Geoffrey Stewart Morrison

Summary: This paper investigates the accuracy of speaker identification by individual lay listeners, such as judges, compared to a forensic-voice-comparison system based on state-of-the-art automatic-speaker-recognition technology. The study tests listeners with different language backgrounds and considers different courtroom contexts.

FORENSIC SCIENCE INTERNATIONAL (2022)

Article Computer Science, Information Systems

Joint Cross-Modal and Unimodal Features for RGB-D Salient Object Detection

Nianchang Huang, Yi Liu, Qiang Zhang, Jungong Han

Summary: The study proposes a novel RGB-D salient object detection model that effectively combines cross-modal features from RGB-D images and unimodal features from RGB and depth images, achieving significant performance improvement on four benchmark datasets compared to state-of-the-art methods.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Information Systems

Efficient Audiovisual Fusion for Active Speaker Detection

Fiseha B. Tesema, Jason Gu, Wei Song, Hong Wu, Shiqiang Zhu, Zheyuan Lin

Summary: Active speaker detection (ASD) is about identifying the person speaking in a video among visible human instances. This study proposes an efficient audiovisual fusion (AVF) approach that captures correlations between facial regions and sound signals, focusing on discriminative facial features and associating them with corresponding audio features, resulting in improved detection accuracy.

IEEE ACCESS (2023)

Article Computer Science, Information Systems

Audio-video fusion strategies for active speaker detection in meetings

Lionel Pibre, Francisco Madrigal, Cyrille Equoy, Frederic Lerasle, Thomas Pellegrini, Julien Pinquier, Isabelle Ferrane

Summary: In this paper, we propose two different fusion techniques (naive fusion and attention-based fusion) to combine audio and visual information for active speaker detection in meetings. By using neural networks to process audio and video data, and supplementing with motion information, the system performance is improved. The method shows good detection capability in a public benchmark test.

MULTIMEDIA TOOLS AND APPLICATIONS (2023)

Article Computer Science, Information Systems

Detection of speaker liveness with CNN isolated word ASR for verification systems

Martina Slivova, Miroslav Voznak, Jaromir Tovarek, Pavol Partila

Summary: The article introduces a new speaker liveness test to prevent speech verification systems from presentation attacks. By verifying the reliability of isolated word recognition, the experiment results indicate that this method serves as a simple yet effective security component in existing SV systems.

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Article Computer Science, Information Systems

A multimodal fusion method for sarcasm detection based on late fusion

Ning Ding, Sheng-wei Tian, Long Yu

Summary: This paper presents the state of sarcasm information on social media and the research status of sarcasm detection. By considering multiple information sources, including text, audio, and images, a multilevel late-fusion learning framework is proposed. Extensive experiments show its superiority.

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Article Computer Science, Hardware & Architecture

Presentation attack detection based on score level fusion and challenge-response technique

Chao-Lung Chou

Summary: This paper proposes a multimodal presentation attack detection method against photo-attack and video-attack in face recognition system by using score level fusion and challenge-response scenario. Experimental results show that the proposed method can achieve the best error rate and effectively improve the security of facial recognition system.

JOURNAL OF SUPERCOMPUTING (2021)

Article Computer Science, Artificial Intelligence

A late fusion deep neural network for robust speaker identification using raw waveforms and gammatone cepstral coefficients

Daniele Salvati, Carlo Drioli, Gian Luca Foresti

Summary: Speaker identification aims to determine the speaker identity by analyzing voice characteristics, often using statistical models or machine learning techniques. Frequency-domain features are commonly used for sound recognition, but recent studies have also explored the use of time-domain raw waveform (RW) with deep neural network (DNN) architectures. This paper proposes a method that combines RWs and gammatone cepstral coefficients (GTCCs) using a late fusion DNN, hypothesizing that the use of both time-domain and frequency-domain features can enhance the accuracy of speaker identification in noisy and reverberant conditions. Experimental results show that the proposed method improves accuracy in adverse conditions compared to other feature choices.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Information Systems

A content-based late fusion approach applied to pedestrian detection

Jessica Sena, Artur Jorda, William Robson Schwartz

Summary: Recent works have proposed various pedestrian detectors, leading to the development of a novel method called Content-Based Spatial Consensus (CSBC), which combines spatial consensus and content to learn a weighted fusion of pedestrian detectors. This approach reduces false alarms and improves detection accuracy, overcoming state-of-the-art fusion methods on the ETH and Caltech datasets while requiring fewer detectors for effective results.

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION (2021)

Article Computer Science, Artificial Intelligence

Discriminative unimodal feature selection and fusion for RGB-D salient object detection

Nianchang Huang, Yongjiang Luo, Qiang Zhang, Jungong Han

Summary: A novel end-to-end RGB-D salient object detection model is proposed in this paper, which combines a Semantic-Guided Modality-Weight Map Generation sub-network and a Bi-directional Multi-scale Cross-modal Feature Fusion module to effectively address the issue of poor input image quality affecting discriminative ability.

PATTERN RECOGNITION (2022)

Article Computer Science, Information Systems

On the Detection of Adaptive Adversarial Attacks in Speaker Verification Systems

Zesheng Chen

Summary: This article aims to design a detector that can distinguish original audio from audio contaminated by adversarial attacks. The proposed MEH-FEST detector calculates the minimum energy in the high-frequency band of the audio signal, showing good performance in determining whether an audio is corrupted by FAKEBOB attacks.

IEEE INTERNET OF THINGS JOURNAL (2023)

Article Nanoscience & Nanotechnology

Additively manufactured Haynes 282: effect of unimodal vs. bimodal γ'-microstructure on mechanical properties

Reza Ghiaasiaan, Nabeel Ahmad, Paul R. Gradl, Shuai Shao, Nima Shamsaei

Summary: This study compares the effects of different heat treatment temperatures on Haynes 282 and finds that even with larger gamma' precipitates, low temperature heat treatment can achieve comparable tensile properties to conventional heat treatment.

MATERIALS SCIENCE AND ENGINEERING A-STRUCTURAL MATERIALS PROPERTIES MICROSTRUCTURE AND PROCESSING (2022)

Article Acoustics

Multi-Source Domain Adaptation and Fusion for Speaker Verification

Donghui Zhu, Ning Chen

Summary: In this paper, a multiple source domain adaptation and fusion model is proposed to enhance the robustness of Speaker Verification (SV) task against domain mismatches. The experimental results demonstrate the superiority of the proposed fusion model over existing models, and the effectiveness of the strategies and methods employed to improve the performance. The fusion effectiveness is not greatly influenced by the hyper-parameter setting of the Modified Similarity Network Fusion (MSNF) scheme.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2022)

Article Computer Science, Artificial Intelligence

Optimized speaker change detection approach for speaker segmentation towards speaker diarization based on deep learning

K. VijayKumar, R. Rajeswara Rao

Summary: Speaker diarization is the process of partitioning an audio source stream into homogeneous segments based on the speaker's identity. This paper proposes the use of a hybrid optimization technique, FrACWOA, along with the Deep Embedded Clustering (DEC) algorithm for speaker diarization. The proposed method outperforms existing approaches in terms of accuracy, with a significant improvement of 12.97% when using six speakers.

DATA & KNOWLEDGE ENGINEERING (2023)

Article Biology

Depression detection based on linear and nonlinear speech features in I-vector/SVDA framework

Shamim Mobram, Mansour Vali

Summary: This study proposes depression detection systems based on the i-vector framework, aiming to classify speakers and predict depression levels. By combining linear and non-linear features and utilizing variability compensation techniques, the proposed system achieves significant improvements in accuracy.

COMPUTERS IN BIOLOGY AND MEDICINE (2022)

Article Computer Science, Artificial Intelligence

Enhancing the performance of 3D auto-correlation gradient features in depth action classification

Mohammad Farhad Bulbul, Saiful Islam, Zannatul Azme, Preksha Pareek, Md Humaun Kabir, Hazrat Ali

Summary: In this paper, a new method is proposed to use auto-correlation gradient features in depth action data for action recognition. By calculating depth motion map sequences and utilizing the STACOG descriptor, three vectors of 3D auto-correlation gradient features are obtained and passed to an unsupervised classifier for action recognition.

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL (2022)

Article Computer Science, Information Systems

Feature fusion and Ensemble learning-based CNN model for mammographic image classification

Imran Ul Haq, Haider Ali, Hong Yu Wang, Cui Lei, Hazrat Ali

Summary: The world is facing a concerning situation regarding breast cancer patients. A Computer-Aided Diagnosis (CAD) system based on deep convolution neural network (DCNN) and feature fusion has been proposed to improve the detection and classification of abnormalities in mammographic scans. The model achieved high sensitivity, specificity, and accuracy in the evaluation on two publicly available datasets.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Engineering, Biomedical

Modelling intra-muscular contraction dynamics using in silico to in vivo domain translation

Hazrat Ali, Johannes Umander, Robin Rohlen, Oliver Roehrle, Christer Gronlund

Summary: This study proposes a deep learning approach to model authentic intra-muscular skeletal muscle contraction patterns, bridging the gap between simulated and experimental image sequences. The results demonstrate the potential of generating spatio-temporal features similar to in vivo data, offering insights for neuromuscular imaging research.

BIOMEDICAL ENGINEERING ONLINE (2022)

Article Computer Science, Artificial Intelligence

R2U++: a multiscale recurrent residual U-Net with dense skip connections for medical image segmentation

Mehreen Mubashar, Hazrat Ali, Christer Gronlund, Shoaib Azmat

Summary: U-Net is a widely used neural network in the field of medical image segmentation. However, its performance on complex datasets is not satisfactory. To address this issue, several variants such as R2U-Net and UNET++ have been proposed. In this paper, we propose a new U-Net-based medical image segmentation architecture called R2U++, which overcomes the limitations of traditional U-Net by incorporating deeper recurrent residual convolutional blocks and dense skip connections. Experimental results on multiple medical imaging datasets demonstrate that R2U++ achieves significant improvements in IoU and dice scores compared to UNET++ and R2U-Net.

NEURAL COMPUTING & APPLICATIONS (2022)

Article Telecommunications

Link-level performance abstraction for mimo receivers using artificial neural network

Asif Khan, Alam Zaib, Hazrat Ali, Shahid Khattak

Summary: This paper presents a novel framework for link-level performance abstraction for MIMO receivers using a neural network model. The framework achieves good performance in different channel conditions by training the neural network with data generated from link-level simulations.

TELECOMMUNICATION SYSTEMS (2022)

Article Computer Science, Hardware & Architecture

Toward Optimal Softcore Carry-aware Approximate Multipliers on Xilinx FPGAs

Muhammad Awais, Ali Zahir, Syed Ayaz Ali Shah, Pedro Reviriego, Anees Ullah, Nasim Ullah, Adam Khan, Hazrat Ali

Summary: Domain-specific accelerators for signal processing, image processing, and machine learning are increasingly implemented on SRAM-based field-programmable gate arrays (FPGAs). This article presents an optimized carry-aware approximate radix-4 Booth multiplier design that leverages built-in slice look-up tables (LUTs) and carry-chain resources in a novel configuration. The proposed design offers significant improvements in FPGA resource usage, power delay product, performance metric, and errors compared to the latest state-of-the-art designs, making it an attractive choice for multiplication on FPGA-based accelerators.

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS (2023)

Article Engineering, Electrical & Electronic

Implementation of a Modified U-Net for Medical Image Segmentation on Edge Devices

Owais Ali, Hazrat Ali, Syed Ayaz Ali Shah, Aamir Shahzad

Summary: This paper presents the implementation of a modified U-Net model for medical image segmentation on low-power devices. The experimental results demonstrate comparable performance to the traditional U-Net model, while requiring lower resource consumption.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS (2022)

Article Computer Science, Artificial Intelligence

Real-time automated detection of older adults' hand gestures in home and clinical settings

Guan Huang, Son N. Tran, Quan Bai, Jane Alty

Summary: There is a pressing need for remote evaluation of hand movements by clinicians and neuroscientists, especially in light of the COVID-19 pandemic, to detect and monitor degenerative brain disorders prevalent in older adults. This study developed a computer vision-based method that accurately detects hand gestures of older adults using real-life video data. Through validated hand movement tests conducted at home or in clinic, data were collected and analyzed to evaluate different network structures. The newly developed RGRNet showed promising results in detecting hand gestures, offering potential applications in medical and research fields.

NEURAL COMPUTING & APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

Feature selection enhancement and feature space visualization for speech-based emotion recognition

Sofia Kanwal, Sohail Asghar, Hazrat Ali

Summary: This study presents a speech features enhancement strategy for improving speech emotion recognition. The method outperforms state-of-the-art methods in two different language datasets, demonstrating its effectiveness.

PEERJ COMPUTER SCIENCE (2022)

Article Engineering, Biomedical

Translation of atherosclerotic disease features onto healthy carotid ultrasound images using domain-to-domain translation

Hazrat Ali, Emma Nyman, Ulf Naslund, Christer Gronlund

Summary: This work evaluated a model that translates features of atherosclerotic disease onto healthy carotid ultrasound images. The cycleGAN model successfully translated disease features onto healthy images while retaining the overall anatomical contents. This has important implications for education, cardiovascular risk communication, and personalized modeling in precision medicine.

BIOMEDICAL SIGNAL PROCESSING AND CONTROL (2023)

Review Computer Science, Artificial Intelligence

Identifying the role of vision transformer for skin cancer-A scoping review

Sulaiman Khan, Hazrat Ali, Zubair Shah

Summary: This scoping review examines the use of vision transformers for skin lesion detection and finds that this approach has shown outstanding performance in detecting skin cancer. However, there are some challenges that hinder the trustworthiness of vision transformers in skin cancer diagnosis, such as intrinsic visual ambiguities and irregular lesion shapes. The findings of this review provide new insights and suggest the best segmentation techniques for accurate lesion boundary identification and melanoma diagnosis.

FRONTIERS IN ARTIFICIAL INTELLIGENCE (2023)

Review Computer Science, Information Systems

Artificial Intelligence and Biosensors in Healthcare and Its Clinical Relevance: A Review

Rizwan Qureshi, Muhammad Irfan, Hazrat Ali, Arshad Khan, Aditya Shekhar Nittala, Shawkat Ali, Abbas Shah, Taimoor Muzaffar Gondal, Ferhat Sadak, Zubair Shah, Muhammad Usman Hadi, Sheheryar Khan, Qasem Al-Tashi, Jia Wu, Amine Bermak, Tanvir Alam

Summary: Data generated from various sources in the medical field has rapidly increased in the past decade, and computational hardware advancements have enabled the utilization of this data through sophisticated AI techniques. This article provides an overview of recent advancements in AI and biosensors in medical and life sciences, including the role of machine learning in medical imaging, precision medicine, and IoT-based biosensors. The article also discusses the progress in wearable biosensing technologies and computing technologies for medical data, as well as challenges and future prospects.

IEEE ACCESS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Understanding Tumor Micro Environment Using Graph Theory

Kinza Rohail, Saba Bashir, Hazrat Ali, Tanvir Alam, Sheheryar Khan, Jia Wu, Pingjun Chen, Rizwan Qureshi

Summary: Based on historical data statistics, the survival rate of patients with CLL is about 65%. Aggressive and rare variants of CLL, aCLL and RT-DLBL, have lower survival rates in patients and worsen with age. A framework combining Graph Theory, Gaussian Mixture Modeling, and Fuzzy C-mean Clustering was developed for studying neoplastic lymphomas, providing quantitative analysis of pathological facts through integration of Image and Nuclei level analysis. The proposed algorithm outperforms existing algorithms with a mean diagnosis accuracy of 0.70833.

COMPUTER VISION - ACCV 2022 WORKSHOPS (2023)

Review Medical Informatics

Artificial Intelligence-Based Methods for Integrating Local and Global Features for Brain Cancer Imaging: Scoping Review

Hazrat Ali, Rizwan Qureshi, Zubair Shah

Summary: This study reviews the application of different vision transformers (ViTs) in brain cancer diagnosis and tumor segmentation, focusing on the enhancement of brain tumor segmentation task through different architectures and the improvement of convolutional neural networks' performance using ViT-based models.

JMIR MEDICAL INFORMATICS (2023)

Article Telecommunications

Classification with ensembles and case study on functional magnetic resonance imaging

Adnan O. M. Abuassba, Zhang Dezheng, Hazrat Ali, Fan Zhang, Khan Ali

Summary: This paper proposes a general framework for creating ensembles in the context of classification. The framework consists of four stages and takes into account diversity and efficiency. Experimental results validate the effectiveness of the proposed approach.

DIGITAL COMMUNICATIONS AND NETWORKS (2022)
