Journal
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/1865106.1865111
Keywords
Experimentation; Speech; visual localization; speaker diarization; multimodal integration
Categories
Funding
- Swiss IM-2
- EU
- A*STAR
The following article presents a novel audio-visual approach for unsupervised speaker localization in both time and space and systematically analyzes its unique properties. Using recordings from a single low-resolution room-overview camera and a single far-field microphone, a state-of-the-art audio-only speaker diarization system (speaker localization in time) is extended so that both acoustic and visual models are estimated as part of a joint unsupervised optimization problem. The speaker diarization system first automatically determines the speech regions and estimates who spoke when; in a second step, the visual models are used to infer the location of the speakers in the video. We call this process dialocalization. The experiments were performed on real-world meetings using 4.5 hours of the publicly available AMI meeting corpus. The proposed system is able to exploit audio-visual integration not only to improve the accuracy of a state-of-the-art (audio-only) speaker diarization system, but also to add visual speaker localization at little incremental engineering and computation cost. The combined algorithm has different properties, such as increased robustness, that cannot be observed in algorithms based on single modalities. The article describes the algorithm, presents benchmarking results, explains its properties, and systematically discusses the contributions of each modality.
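The two-step process described in the abstract can be sketched in miniature: an audio-only diarization pass labels each time frame with a speaker, and the video frames pooled per speaker are then used to locate that speaker in the image. The sketch below is illustrative only, assuming a per-frame speaker label array and using inter-frame motion as a stand-in visual cue; the function name, data shapes, and localization heuristic are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the dialocalization idea: diarization in time,
# then visual localization per speaker. Not the authors' code.
import numpy as np

def dialocalize(frames, labels):
    """frames: (T, H, W) grayscale video; labels: (T,) speaker id per frame.
    Returns {speaker: (row, col)} of peak inter-frame motion while speaking."""
    motion = np.abs(np.diff(frames.astype(float), axis=0))  # (T-1, H, W)
    locations = {}
    for spk in np.unique(labels[1:]):
        pooled = motion[labels[1:] == spk].mean(axis=0)     # average motion map
        locations[spk] = np.unravel_index(pooled.argmax(), pooled.shape)
    return locations

# Toy example: speaker 0 "moves" at pixel (1, 1), speaker 1 at pixel (6, 6).
frames = np.zeros((6, 8, 8))
labels = np.array([0, 0, 0, 1, 1, 1])
frames[0, 1, 1], frames[2, 1, 1] = 10.0, 10.0   # activity while spk 0 talks
frames[3, 6, 6], frames[5, 6, 6] = 10.0, 10.0   # activity while spk 1 talks
print(dialocalize(frames, labels))
```

In the actual system the visual models are estimated jointly with the acoustic ones inside the unsupervised optimization, rather than in a fixed second pass as in this simplified sketch.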
Authors