Article

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 15, Issue 7, Pages 1553-1568

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2013.2267205

Keywords

Attention; audio saliency; fusion; movie summarization; multimodal saliency; multistream processing; text saliency; video summarization; visual saliency

Funding

  1. project COGN-IMUSE
  2. European Social Fund (ESF)
  3. EU project DIRHA [FP7-ICT-2011-7-288121]
  4. Greek national funds through the Operational Program Education and Lifelong Learning of the National Strategic Reference Framework-Research Funding Program: Heracleitus II
  5. National Resources

Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher-level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated into a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
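
The fusion and selection steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the fusion schemes, so the weighted linear, max, and min rules, the min-max normalization, the equal default weights, and the function names `normalize`, `fuse`, and `select_frames` are hypothetical and not taken from the paper. The three inputs stand in for per-frame audio, visual, and text saliency curves that have already been computed.

```python
from typing import List, Sequence


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max scale a saliency curve to [0, 1]; a flat curve maps to zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(value - lo) / (hi - lo) for value in curve]


def fuse(
    audio: Sequence[float],
    visual: Sequence[float],
    text: Sequence[float],
    scheme: str = "linear",
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Combine three per-frame saliency curves into one multimodal curve.

    The schemes here (weighted linear, max, min) are illustrative choices,
    not necessarily the ones evaluated in the paper.
    """
    streams = [normalize(audio), normalize(visual), normalize(text)]
    if len({len(stream) for stream in streams}) != 1:
        raise ValueError("all saliency curves must have the same length")

    fused = []
    for a, v, t in zip(*streams):
        if scheme == "linear":
            fused.append(weights[0] * a + weights[1] * v + weights[2] * t)
        elif scheme == "max":
            fused.append(max(a, v, t))
        elif scheme == "min":
            fused.append(min(a, v, t))
        else:
            raise ValueError(f"unknown fusion scheme: {scheme}")
    return fused


def select_frames(saliency: Sequence[float], keep_ratio: float) -> List[int]:
    """Return, in temporal order, the indices of the most salient frames.

    Roughly keep_ratio of the frames are kept, and at least one.
    """
    n_keep = max(1, round(keep_ratio * len(saliency)))
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Toy curves for four frames; real curves would come from the
    # audio, visual, and text saliency models.
    audio = [0.0, 1.0, 0.5, 0.0]
    visual = [0.0, 0.5, 1.0, 0.0]
    text = [0.0, 0.0, 1.0, 0.0]
    curve = fuse(audio, visual, text, scheme="linear")
    print(select_frames(curve, keep_ratio=0.5))
```

Selecting the top-ranked frames independently, as done here, is the simplest possible selection rule. A practical summarizer would more likely smooth the fused curve and keep contiguous segments so that the resulting summary plays back coherently.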
