Article

Predicting Diverse Future Frames With Local Transformation-Guided Masking

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TCSVT.2018.2882061

Keywords

Predictive models; Generators; Task analysis; Visualization; Computational modeling; Complexity theory; Training; Video prediction; diverse future frames; local transformation level; transformation-guided masking; region of interest; video prediction on single frame

Funding

  1. Shenzhen Peacock Plan [20130408-183003656]
  2. Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality [ZDSYS-201703031405467]
  3. National Natural Science Foundation of China [U-1613209]

Abstract

Video prediction is the challenging task of generating the future frames of a video given a sequence of previously observed frames. This task involves the construction of an internal representation that accurately models the frame evolutions, including contents and dynamics. Video prediction is considered difficult due to the inherent compounding of errors in recursive pixel-level prediction. In this paper, we present a novel video prediction system that focuses on regions of interest (ROIs) rather than on entire frames and learns frame evolutions at the transformation level rather than at the pixel level. We provide two strategies to generate high-quality ROIs that contain potential moving visual cues. The frame evolutions are modeled with a transformation generator that produces transformers and masks simultaneously, which are then combined to generate the future frame in a transformation-guided masking procedure. Compared with recent approaches, our system generates more accurate predictions by modeling the visual evolutions at the transformation level rather than at the pixel level. Focusing on ROIs avoids a heavy computational burden and enables our system to generate high-quality long-term future frames without severely amplified signal loss. Moreover, our system is able to generate diverse plausible future frames, which is important in many real-world scenarios. Furthermore, we enable our system to perform video prediction conditioned on a single frame by revising the transformation generator to produce motion-centric transformers. We test our system on four datasets with different experimental settings and demonstrate its advantages over recent methods, both quantitatively and qualitatively.
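The core combination step described in the abstract can be illustrated with a short sketch. The PyTorch snippet below is a minimal, hypothetical illustration and not the authors' implementation: it assumes the generator has produced K transformed candidate versions of the previous frame (here faked with simple horizontal shifts as stand-ins for the predicted local transformations) and K per-pixel mask logits (here random stand-ins); transformation-guided masking then forms the predicted frame as the mask-weighted sum of the candidates.

```python
# Hypothetical sketch of transformation-guided masking (not the paper's code).
# Assumptions: K candidate frames, each the result of applying one predicted
# local transformation to the previous frame, and K per-pixel masks that are
# softmax-normalized so the weights at each pixel sum to 1 across candidates.
import torch
import torch.nn.functional as F

K, B, C, H, W = 4, 1, 3, 64, 64

prev = torch.rand(B, C, H, W)  # previously observed frame

# Stand-in for the K transformed versions of `prev` (real systems would
# apply predicted warps/kernels; we use horizontal shifts for illustration):
candidates = torch.stack(
    [prev.roll(shifts=k, dims=-1) for k in range(K)], dim=1
)  # shape (B, K, C, H, W)

# Stand-in for the generator's raw mask logits, one map per transformation:
mask_logits = torch.randn(B, K, H, W)
masks = F.softmax(mask_logits, dim=1)  # per-pixel weights summing to 1 over K

# Transformation-guided masking: combine candidates with their masks.
pred = (masks.unsqueeze(2) * candidates).sum(dim=1)  # shape (B, C, H, W)
print(pred.shape)  # torch.Size([1, 3, 64, 64])
```

Because the softmax makes the per-pixel weights a convex combination, the predicted pixel values stay within the range of the transformed candidates, which suggests one reason transformation-level prediction tends to avoid the blur that direct pixel regression can produce.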
