Article

Perceptually Aware Image Retargeting for Mobile Devices

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 27, Issue 5, Pages 2301-2313

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIP.2017.2779272

Keywords

Mobile platform; retarget; perceptual; gaze behavior; deep feature; probabilistic model

Funding

  1. Natural Science Foundation of Zhejiang Province [LQ16F030006]
  2. National Natural Science Foundation of China [61503110, 61572169, 61472266]
  3. National University of Singapore (Suzhou) Research Institute, Suzhou, China
  4. Fundamental Research Funds for the Central Universities

Abstract

Retargeting aims at adapting an original high-resolution photograph/video to a low-resolution screen with an arbitrary aspect ratio. Conventional approaches are generally based on desktop PCs, since the computation might be intolerable for mobile platforms (especially when retargeting videos). Typically, only low-level visual features are exploited, and human visual perception is not well encoded. In this paper, we propose a novel retargeting framework that rapidly shrinks a photograph/video by leveraging human gaze behavior. Specifically, we first derive a geometry-preserving graph ranking algorithm, which efficiently selects a few salient object patches to mimic the human gaze shifting path (GSP) when viewing a scene. Afterward, an aggregation-based CNN is developed to hierarchically learn the deep representation for each GSP. Based on this, a probabilistic model is developed to learn the priors of the training photographs that are marked as aesthetically pleasing by professional photographers. We utilize the learned priors to efficiently shrink the corresponding GSP of a retargeted photograph/video to maximize its similarity to those from the training photographs. Extensive experiments have demonstrated that: 1) our method requires less than 35 ms to retarget a 1024x768 photograph (or a 1280x720 video frame) on popular iOS/Android devices, which is orders of magnitude faster than the conventional retargeting algorithms; 2) the retargeted photographs/videos produced by our method significantly outperform those of its competitors based on a paired-comparison-based user study; and 3) the learned GSPs are highly indicative of human visual attention according to the human eye tracking experiments.
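The patch-selection step described above can be illustrated with a generic graph-ranking sketch. This is not the paper's geometry-preserving algorithm, whose details are not given here. It is a minimal PageRank-style random walk under assumed conventions: each patch has a feature vector and a center coordinate, an edge is weighted by appearance distance times spatial closeness, and the top-scoring patches are read off in descending order as a stand-in for a gaze shifting path. The names `rank_patches` and `gaze_path`, the edge-weight form, and the parameters `sigma` and `damping` are all hypothetical.

```python
import math


def rank_patches(features, centers, sigma=1.0, damping=0.85, iters=100):
    """Score patches with a damped random walk on a patch graph (illustrative)."""
    n = len(features)
    # Assumed edge weight: a walker prefers nearby patches that look different.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                appearance = math.dist(features[i], features[j])
                spatial = math.dist(centers[i], centers[j])
                closeness = math.exp(-(spatial ** 2) / (2 * sigma ** 2))
                weights[i][j] = appearance * closeness
    # Row-normalize into transition probabilities; fall back to uniform
    # when a patch has no outgoing weight (e.g. all patches identical).
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    # Damped power iteration, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


def gaze_path(scores, k):
    """Return the indices of the k highest-scoring patches, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy input: four patches on a 2x2 grid; patch 3 differs in appearance.
features = [[0.1], [0.1], [0.1], [0.9]]
centers = [(0, 0), (0, 1), (1, 0), (1, 1)]
scores = rank_patches(features, centers)
path = gaze_path(scores, k=2)  # patch 3 (the outlier) comes first
```

In this toy setup the visually distinct patch collects most of the walk's probability mass, which is the behavior a saliency-style ranking is meant to capture. A real system would replace the hand-written features with learned descriptors and would need a far more efficient graph construction to approach the reported per-frame budget.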

Recommended

Article Computer Science, Information Systems

Flickr Image Community Analytics by Deep Noise-Refined Matrix Factorization

Luming Zhang, Jianwei Yin, Ping Li, Yongheng Shang, Roger Zimmermann, Ling Shao

IEEE TRANSACTIONS ON MULTIMEDIA (2020)

Article Computer Science, Information Systems

Unsupervised Video Summarization With Cycle-Consistent Adversarial LSTM Networks

Li Yuan, Francis Eng Hock Tay, Ping Li, Jiashi Feng

IEEE TRANSACTIONS ON MULTIMEDIA (2020)

Article Computer Science, Artificial Intelligence

Exploring global diverse attention via pairwise temporal relation for video summarization

Ping Li, Qinghao Ye, Luming Zhang, Li Yuan, Xianghua Xu, Ling Shao

Summary: The proposed SUM-GDA method enhances diversity in summary frames by adopting a global diverse attention mechanism, outperforming existing methods with remarkable improvements in computational efficiency.

PATTERN RECOGNITION (2021)

Article Computer Science, Information Systems

Video summarization with a graph convolutional attention network

Ping Li, Chao Tang, Xianghua Xu

Summary: The proposed graph convolutional attention network (GCAN) for video summarization effectively integrates embedding learning and context fusion to generate compact and informative video summaries by considering both local and global relations among video frames.

FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING (2021)

Article Computer Science, Artificial Intelligence

Graph convolutional network meta-learning with multi-granularity POS guidance for video captioning

Ping Li, Pan Zhang, Xianghua Xu

Summary: This article introduces a GMMP (GCN Meta-learning with Multi-granularity POS) method based on multi-granularity POS for generating high-quality video captions. It models temporal dependency by treating video frames as nodes in a graph and captures POS information of words and phrases using a multi-granularity POS attention mechanism.

NEUROCOMPUTING (2022)

Article Computer Science, Information Systems

Coarse-to-fine few-shot classification with deep metric learning

Ping Li, Guopan Zhao, Xianghua Xu

Summary: This work presents a Coarse-to-Fine few-shot classification framework based on Metric-based Auxiliary learning to address the challenges of handling sample pairs with different similarity degrees and learning discriminant patterns from very few labeled samples per class.

INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

Deep metric learning via group channel-wise ensemble

Ping Li, Guopan Zhao, Jiajun Chen, Xianghua Xu

Summary: Deep metric learning uses deep neural networks to learn the distance metric for data samples, aiming to encode the similarity between semantically related samples. However, learning a single metric using all samples fails to encode the similarity in different aspects. To address this issue, this paper proposes a Group Channel-wise Ensemble method that learns multiple distance metrics by partitioning the embedding space and using group channel-wise convolution blocks in convolution networks.

KNOWLEDGE-BASED SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Prototype contrastive learning for point-supervised temporal action detection

Ping Li, Jiachen Cao, Xingchao Ye

Summary: This paper presents a point-level supervised temporal action detection framework based on prototype contrastive learning. It addresses the label sparsity and class imbalance problems by generating pseudo labels and utilizes prototype learning and contrastive representation learning to achieve discriminative prototype representations. Experimental results demonstrate the superior performance of the proposed method.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Information Systems

Time-frequency recurrent transformer with diversity constraint for dense video captioning

Ping Li, Pan Zhang, Tao Wang, Huaxin Xiao

Summary: A Time-Frequency recurrent Transformer with Diversity constraint (TFTD) is proposed for dense video captioning, which includes a time-frequency memory module to consider temporal relations and model motion dependency. A Determinantal Point Process (DPP) is adopted to impose a diversity loss and reduce redundancy in generated sentences. Experimental results demonstrate the superior performance of TFTD in terms of metrics and coherence compared to competitive alternatives.

INFORMATION PROCESSING & MANAGEMENT (2023)

Article Computer Science, Artificial Intelligence

Truncated attention-aware proposal networks with multi-scale dilation for temporal action detection

Ping Li, Jiachen Cao, Li Yuan, Qinghao Ye, Xianghua Xu

Summary: In this paper, the authors propose a Multi-scale Dilation based Truncated Attention Proposal Network (MD-TAPN) model for temporal action detection, which achieves state-of-the-art performance on two benchmark databases. The model learns positive proposal relations by dynamically adjusting edge weights and suppresses disadvantageous relations by truncating negative attention scores. It also handles different action durations with a lightweight multi-scale dilation module to increase proposal representation capacity.

PATTERN RECOGNITION (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Temporal Cue Guided Video Highlight Detection with Low-Rank Audio-Visual Fusion

Qinghao Ye, Xiyue Shen, Yuan Gao, Zirui Wang, Qi Bi, Ping Li, Guang Yang

Summary: In this paper, a novel weakly supervised method for automated video highlight detection is proposed, achieving remarkable improvements over state-of-the-art methods. The method leverages audio-visual feature fusion, hierarchical temporal context encoding, and attention-gated instance aggregation to enhance detection performance and address existing issues. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed approach.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Proceedings Paper Engineering, Electrical & Electronic

Deep Transfer Learning for WiFi Localization

Peizheng Li, Han Cui, Aftab Khan, Usman Raza, Robert Piechocki, Angela Doufexi, Tim Farnham

Summary: This study introduces a WiFi indoor localisation technique based on deep learning, achieving high accuracy in different environments and exploring the effectiveness of model transfer to save training time and parameters.

2021 IEEE RADAR CONFERENCE (RADARCONF21): RADAR ON THE MOVE (2021)

Article Computer Science, Artificial Intelligence

Semantics-Aware Hidden Markov Model for Human Mobility

Hongzhi Shi, Yong Li, Hancheng Cao, Xiangxin Zhou, Chao Zhang, Vassilis Kostakos

Summary: This paper introduces a novel semantics-aware mobility model that leverages large-scale semantic-rich spatial-temporal data from location-based social networks to capture human mobility motivation.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2021)

Proceedings Paper Computer Science, Information Systems

paper2repo: GitHub Repository Recommendation for Academic Papers

Huajie Shao, Dachun Sun, Jiahao Wu, Zecheng Zhang, Aston Zhang, Shuochao Yao, Shengzhong Liu, Tianshi Wang, Chao Zhang, Tarek Abdelzaher

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) (2020)

Proceedings Paper Computer Science, Information Systems

Discriminative Topic Mining via Category-Name Guided Text Embedding

Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) (2020)
