Rate-Accuracy Trade-Off in Video Classification With Deep Convolutional Neural Networks

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2020)

Journal

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Volume 30, Issue 1, Pages 145-154

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TCSVT.2018.2887408

Keywords

Video classification; convolutional neural networks; video streaming

Funding

Distinguished Scholar Award from the Arab Fund Fellowships Programme
Royal Commission for the Exhibition of 1851
EPSRC [EP/P02243X/1, EP/R025290/1]
Leverhulme Trust (RAEng/Leverhulme Senior Research Fellowship)
EPSRC [EP/P02243X/1, EP/R025290/1] Funding Source: UKRI

Abstract

Advanced video classification systems decode video frames to derive texture and motion representations for ingestion and analysis by spatio-temporal deep convolutional neural networks (CNNs). However, when considering visual Internet-of-Things applications, surveillance systems, and semantic crawlers of large video repositories, the video capture and the CNN-based semantic analysis parts do not tend to be co-located. This necessitates the transport of compressed video over networks and incurs significant overhead in bandwidth and energy consumption, thereby significantly undermining the deployment potential of such systems. In this paper, we investigate the trade-off between the encoding bitrate and the achievable accuracy of CNN-based video classification models that directly ingest AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video bitstreams and applying complex optical flow calculations prior to CNN processing, we only retain motion vector and select texture information at significantly reduced bitrates and apply no additional processing prior to CNN ingestion. Based on three CNN architectures and two action recognition datasets, we achieve 11%-94% savings in bitrate with marginal effect on classification accuracy. A model-based selection between multiple CNNs increases these savings further to the point where, if up to 7% loss of accuracy can be tolerated, video classification can take place with as little as 3 kb/s for the transport of the required compressed video information to the system implementing the CNN models.
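The model-based selection described in the abstract can be read as choosing, among several (CNN, bitrate) operating points, the cheapest one whose accuracy stays within a tolerated loss. The sketch below is only an illustration of that idea under stated assumptions, not the authors' method: the names `OperatingPoint`, `cheapest_point`, and `bitrate_saving` are hypothetical, and every number in `POINTS` is a placeholder rather than a result from the paper.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    """One measured (model, bitrate, accuracy) combination (hypothetical type)."""
    model: str            # label of the CNN used at this point
    bitrate_kbps: float   # bitrate of the retained compressed-video information
    accuracy: float       # classification accuracy as a fraction in [0, 1]


def cheapest_point(points: Sequence[OperatingPoint],
                   max_accuracy_loss: float) -> Optional[OperatingPoint]:
    """Return the lowest-bitrate point within the tolerated accuracy loss.

    The loss is measured against the best accuracy among the given points.
    Ties in bitrate are broken in favour of the higher accuracy.
    """
    if not points:
        return None
    best_accuracy = max(p.accuracy for p in points)
    feasible = [p for p in points
                if best_accuracy - p.accuracy <= max_accuracy_loss]
    return min(feasible, key=lambda p: (p.bitrate_kbps, -p.accuracy))


def bitrate_saving(reference_kbps: float, reduced_kbps: float) -> float:
    """Fractional bitrate saving of a reduced stream relative to a reference."""
    return 1.0 - reduced_kbps / reference_kbps


# Placeholder operating points, NOT taken from the paper.
POINTS = [
    OperatingPoint("cnn_a", 400.0, 0.90),
    OperatingPoint("cnn_a", 100.0, 0.88),
    OperatingPoint("cnn_b", 40.0, 0.86),
    OperatingPoint("cnn_c", 10.0, 0.80),
]

if __name__ == "__main__":
    choice = cheapest_point(POINTS, max_accuracy_loss=0.07)
    print(choice)
    print(bitrate_saving(400.0, choice.bitrate_kbps))
```

In this reading, loosening `max_accuracy_loss` moves the selection toward lower-bitrate points, which mirrors the trade-off the abstract describes between transport cost and classification accuracy.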