Article

A Generic Framework for Video Annotation via Semi-Supervised Learning

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 14, Issue 4, Pages 1206-1219

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2012.2191944

Keywords

Broadcast video; concave-convex procedure (CCCP); event detection; graph; Internet; multiple instance learning; semi-supervised learning; web-casting text

Funding

  1. 973 Program [2010CB327905, 2012CB316304]
  2. National Natural Science Foundation of China [60833006, 61070104, 90920303]

Learning-based video annotation is essential for video analysis and understanding, and many approaches have been proposed to avoid the intensive labor cost of purely manual annotation. However, a generic framework is still lacking because of several difficulties: dependence on domain knowledge, insufficient training data, lack of precise localization, and ineffectiveness on large-scale video datasets. In this paper, we propose a novel semi-supervised learning approach that uses information from the Internet to annotate interesting events in videos. Concretely, we propose a Fast Graph-based Semi-Supervised Multiple Instance Learning (FGSSMIL) algorithm, which aims to tackle these difficulties simultaneously in a generic framework for various video domains (e.g., sports, news, and movies) by jointly exploring small-scale expert-labeled videos and large-scale unlabeled videos to train the models. The expert-labeled videos are obtained by analyzing and aligning well-structured video-related text (e.g., movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying video search engines (e.g., YouTube, Google) for related events, in order to provide more distributional information for event modeling. FGSSMIL raises two critical issues. 1) How to assign weights in graph construction, where the weight of an edge specifies the similarity between two data points. To tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure that learns instance-sensitive classifiers. 2) How to solve the resulting optimization problem efficiently on large-scale datasets. To address this issue, we adopt the Concave-Convex Procedure (CCCP) and a nonnegative multiplicative updating rule. We perform extensive experiments in three popular video domains: movies, sports, and news. The results are promising in comparison with state-of-the-art methods and demonstrate the effectiveness and efficiency of the proposed approach.
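The general idea of graph-based semi-supervised learning mentioned in the abstract (edge weights encode similarity, and a few labeled points inform many unlabeled ones) can be illustrated with a small generic sketch. This is not FGSSMIL: it assumes a plain Gaussian similarity in place of the MILIS measure, uses a standard closed-form label propagation instead of CCCP with multiplicative updates, and ignores the multiple-instance structure. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense graph weights: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal.

    Assumption: a Gaussian kernel stands in for the paper's MILIS similarity.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def propagate_labels(W, Y, alpha=0.9):
    """Generic label propagation: F = (I - alpha * S)^{-1} Y with S = D^{-1/2} W D^{-1/2}.

    Y is an (n, c) matrix holding one-hot rows for labeled points and zero rows
    for unlabeled points. Returns the predicted class index for every point.
    """
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)


def demo():
    """Two well-separated toy clusters with a single labeled point in each."""
    rng = np.random.default_rng(0)
    cluster_a = rng.normal(loc=0.0, scale=0.3, size=(10, 2))
    cluster_b = rng.normal(loc=5.0, scale=0.3, size=(10, 2))
    X = np.vstack([cluster_a, cluster_b])

    Y = np.zeros((20, 2))
    Y[0, 0] = 1.0   # one labeled example of class 0
    Y[10, 1] = 1.0  # one labeled example of class 1

    W = gaussian_affinity(X, sigma=1.0)
    return propagate_labels(W, Y)


if __name__ == "__main__":
    print(demo())
```

In this toy setting, two labels are enough to classify all twenty points because the graph connects each cluster internally much more strongly than across clusters. The quality of the similarity measure therefore drives the result, which is the role the abstract assigns to MILIS.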

Recommended

Article Computer Science, Artificial Intelligence

Cross-modality paired-images generation and augmentation for RGB-infrared person re-identification

Guan'an Wang, Yang Yang, Tianzhu Zhang, Jian Cheng, Zengguang Hou, Prayag Tiwari, Hari Mohan Pandey

NEURAL NETWORKS (2020)

Article Computer Science, Information Systems

Part-based Structured Representation Learning for Person Re-identification

Yaoyu Li, Hantao Yao, Tianzhu Zhang, Changsheng Xu

Summary: PSRL proposes a novel method to improve the descriptive ability of person representation by fusing local features considering the person structure. The architecture includes two important modules: Local Semantic Feature Extraction and Structured Person Representation Learning.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2021)

Editorial Material Engineering, Electrical & Electronic

Introduction to the Special Section on Intelligent Visual Content Analysis and Understanding

Hongliang Li, Lu Fang, Tianzhu Zhang

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2020)

Article Computer Science, Information Systems

Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval

Xinhong Ma, Tianzhu Zhang, Changsheng Xu

IEEE TRANSACTIONS ON MULTIMEDIA (2020)

Article Computer Science, Artificial Intelligence

Diverse Complementary Part Mining for Weakly Supervised Object Localization

Meng Meng, Tianzhu Zhang, Wenfei Yang, Jian Zhao, Yongdong Zhang, Feng Wu

Summary: Weakly Supervised Object Localization (WSOL) aims to localize objects using only image-level labels, providing better scalability and practicality than fully supervised methods. However, current techniques based on classification networks only highlight discriminative parts of objects, neglecting the entire object. To address this issue, this paper proposes a novel end-to-end part discovery model (PDM) that learns multiple discriminative object parts for accurate localization and classification.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Article Computer Science, Information Systems

Focus Your Attention: A Focal Attention for Multimodal Learning

Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, An-An Liu, Bin Wang, Yongdong Zhang

Summary: The paper introduces a novel focal attention mechanism to achieve more accurate semantic alignment in multimodal learning, by selectively attending to relevant sub-elements and preventing interference from irrelevant ones. Extensive experiments on image-text matching and text-to-image generation demonstrate that the focal attention significantly outperforms existing methods, providing effectiveness validation in various multimodal tasks.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Article Computer Science, Artificial Intelligence

Learning to Model Relationships for Zero-Shot Video Classification

Junyu Gao, Tianzhu Zhang, Changsheng Xu

Summary: This study proposes a task-driven message passing process using a prototype-sample GNN to achieve zero-shot learning in video classification, successfully establishing relationships between categories and attributes, and achieving favorable performance on five popular video benchmarks.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2021)

Article Automation & Control Systems

Robust Collaborative Learning of Patch-Level and Image-Level Annotations for Diabetic Retinopathy Grading From Fundus Image

Yehui Yang, Fangxin Shang, Binghong Wu, Dalu Yang, Lei Wang, Yanwu Xu, Wensheng Zhang, Tianzhu Zhang

Summary: This article introduces a robust framework for DR severity grading that collaboratively utilizes patch-level and image-level annotations, exchanging grade information bidirectionally to incorporate fine-grained lesion details and image-level grades for improved performance. The algorithm has shown better performance than state-of-the-art algorithms and clinical ophthalmologists, proving its robustness in facing real-world variations. Extensive ablation studies have been conducted to validate the effectiveness and necessity of each motivation in the proposed framework.

IEEE TRANSACTIONS ON CYBERNETICS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection

Wenfei Yang, Tianzhu Zhang, Xiaoyuan Yu, Tian Qi, Yongdong Zhang, Feng Wu

Summary: The proposed Uncertainty Guided Collaborative Training (UGCT) strategy effectively improves the performance of attention-based methods for weakly supervised temporal action detection by generating pseudo labels online and mitigating noise in the generated labels. Experimental results show a significant performance improvement of more than 4% for all three methods on the THUMOS14 dataset.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Lesion-Aware Transformers for Diabetic Retinopathy Grading

Rui Sun, Yihao Li, Tianzhu Zhang, Zhendong Mao, Feng Wu, Yongdong Zhang

Summary: The study proposed a novel lesion-aware transformer (LAT) for diabetic retinopathy (DR) grading and lesion discovery, achieving the tasks through an encoder-decoder structure. This method effectively addresses the issues of lesion localization and diversity recognition, and demonstrates superior performance on multiple benchmark tests.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Action Unit Memory Network for Weakly Supervised Temporal Action Localization

Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang

Summary: This paper introduces an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which mitigates challenges by learning an action unit memory bank and utilizing diverse mechanisms. It is the first to explicitly model action units with a memory network, showing superior performance compared to state-of-the-art methods on standard benchmarks.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Article Computer Science, Artificial Intelligence

Multi-Scale Structure-Aware Network for Weakly Supervised Temporal Action Detection

Wenfei Yang, Tianzhu Zhang, Zhendong Mao, Yongdong Zhang, Qi Tian, Feng Wu

Summary: This paper proposes an end-to-end Multi-Scale Structure-Aware Network (MSA-Net) for weakly supervised temporal action detection, which explores both global and local structure information to effectively learn discriminative structure-aware representations for robust and complete action detection. Extensive experimental results on two benchmark datasets demonstrate that MSA-Net outperforms state-of-the-art methods.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Consistency Graph Modeling for Semantic Correspondence

Jianfeng He, Tianzhu Zhang, Yuhui Zheng, Mingliang Xu, Yongdong Zhang, Feng Wu

Summary: This paper proposes a novel end-to-end Consistency Graph Modeling Network (CGMNet) for semantic correspondence by jointly modeling inter-image relationship, intra-image relationship and cycle consistency. CGMNet performs well in experiments and is validated on multiple challenging datasets.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Artificial Intelligence

Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding

Wenfei Yang, Tianzhu Zhang, Yongdong Zhang, Feng Wu

Summary: LCNet utilizes hierarchical representation of video and text features and introduces a self-supervised cycle-consistent loss to effectively learn the matching relationships between video and text, achieving superior performance compared to existing weakly supervised methods.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Information Systems

Density-Aware Multi-Task Learning for Crowd Counting

Xiaoheng Jiang, Li Zhang, Tianzhu Zhang, Pei Lv, Bing Zhou, Yanwei Pang, Mingliang Xu, Changsheng Xu

Summary: In this study, a novel density-aware convolutional neural network (DensityCNN) method is proposed to perform crowd counting by learning density-level classification and density map estimation. Extensive experiments demonstrate the high effectiveness of the proposed method across multiple datasets.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)
