Article

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

IEEE TRANSACTIONS ON MULTIMEDIA (2013)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 15, Issue 7, Pages 1553-1568

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2013.2267205

Keywords

Attention; audio saliency; fusion; movie summarization; multimodal saliency; multistream processing; text saliency; video summarization; visual saliency

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications

Funding

project COGN-IMUSE
European Social Fund (ESF)
EU project DIRHA [FP7-ICT-2011-7-288121]
Greek national funds through the Operational Program Education and Lifelong Learning of the National Strategic Reference Framework-Research Funding Program: Heracleitus II
National Resources

Abstract

Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher-level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated into a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
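The fusion and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify which fusion schemes were evaluated, so the weighted linear combination, the min-max normalization, the top-ranked frame selection, and all function names, weights, and sample values below are assumptions chosen only for illustration.

```python
from typing import Dict, List, Sequence


def min_max_normalize(curve: Sequence[float]) -> List[float]:
    """Rescale one saliency curve to [0, 1] (assumed normalization step)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        # A flat curve carries no relative saliency information.
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]


def fuse_saliency(streams: Dict[str, Sequence[float]],
                  weights: Dict[str, float]) -> List[float]:
    """Weighted linear fusion of per-modality curves (one possible scheme)."""
    lengths = {len(curve) for curve in streams.values()}
    if len(lengths) != 1:
        raise ValueError("all saliency streams must have the same length")
    n_frames = lengths.pop()
    normalized = {name: min_max_normalize(c) for name, c in streams.items()}
    total_weight = sum(weights[name] for name in streams)
    return [
        sum(weights[name] * normalized[name][t] for name in streams) / total_weight
        for t in range(n_frames)
    ]


def select_frames(fused: Sequence[float], keep_ratio: float) -> List[int]:
    """Keep the most salient fraction of frames, returned in temporal order."""
    n_keep = max(1, round(keep_ratio * len(fused)))
    ranked = sorted(range(len(fused)), key=lambda t: fused[t], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical per-frame saliency values for a four-frame clip.
streams = {
    "aural": [0.1, 0.9, 0.4, 0.2],
    "visual": [0.2, 0.8, 0.6, 0.1],
    "textual": [0.0, 1.0, 0.0, 0.0],
}
weights = {"aural": 1.0, "visual": 1.0, "textual": 1.0}  # equal weights, assumed

fused = fuse_saliency(streams, weights)
summary_frames = select_frames(fused, keep_ratio=0.5)
print(summary_frames)
```

Because the selection depends only on the fused curve and not on what the frames depict, this sketch mirrors the content-independent character of the approach; a practical summarizer would also have to group the selected frames into contiguous segments, which is omitted here.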
