Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 31, Issue 8, Pages 3118-3127
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2020.3036860
Keywords
Visualization; Training; Generators; Reinforcement learning; Decoding; Streaming media; Recurrent neural networks; Captioning; graph convolutional networks; adaptive noise
The proposed NADGCN model uses a grid-stream GCN to supplement the region stream and improves the generalization of the language model by adding a noise module. Experimental results show that it outperforms the comparative baseline models.
Image captioning, which aims to generate natural sentences describing image contents, has received significant attention and seen remarkable improvements in recent advances. The problem is nevertheless non-trivial for cross-modal training due to two challenges: 1) image detectors often consider only salient areas in an image and seldom explore the rich background context; 2) the language model is highly vulnerable to small but intentional perturbation attacks. To alleviate these issues, we propose the Noise Augmented Double-stream Graph Convolutional Networks (NADGCN), which exploits the additional background context and enhances the generalization of the language model. Technically, NADGCN capitalizes on a grid-stream GCN as a supplement to the region stream, following the recipe that a rescaled grid graph can encode relationships across grid areas over the full image rather than salient areas only. Moreover, we devise a noise module and integrate it into the double-stream GCN to augment the capability of the basic generator. This noise module introduces adaptive noise into the Recurrent Neural Networks (RNN) and is learned by treating the module as an agent with a stochastic Gaussian policy in Reinforcement Learning (RL). Extensive experiments on MSCOCO validate the design of the grid-stream GCN and the noise agent, and our generator clearly outperforms the comparative baselines.
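The two mechanisms the abstract describes can be illustrated with a minimal sketch: a 4-connected grid graph over image cells (so a GCN can propagate context across the whole image, not just detector-salient regions), and Gaussian noise injected into an RNN hidden state (the "stochastic Gaussian policy" whose scale would be learned with RL). All names here are illustrative assumptions, not the authors' code.

```python
import random

def grid_adjacency(h, w):
    """Hypothetical grid-graph construction: an (h*w) x (h*w) adjacency
    matrix (with self-loops) for a 4-connected grid of image cells."""
    n = h * w
    adj = [[0] * n for _ in range(n)]
    for r in range(h):
        for c in range(w):
            i = r * w + c
            adj[i][i] = 1  # self-loop, as is standard for GCN layers
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    adj[i][rr * w + cc] = 1
    return adj

def inject_noise(hidden, sigma, rng):
    """Hypothetical noise module: add i.i.d. Gaussian noise to an RNN hidden
    state. In the paper, the noise scale is adaptive and the module is
    trained as an RL agent with a stochastic Gaussian policy."""
    return [x + rng.gauss(0.0, sigma) for x in hidden]

adj = grid_adjacency(3, 3)
# The centre cell of a 3x3 grid links to its 4 neighbours plus itself.
print(sum(adj[4]))  # -> 5

rng = random.Random(0)
print(inject_noise([1.0, 2.0, 3.0], 0.1, rng))  # perturbed hidden state
```

With the adjacency in hand, one GCN layer would multiply a (degree-)normalized version of `adj` by the grid-cell features; the double-stream design in the paper runs this alongside the usual region-feature stream.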