Journal
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
Volume 17, Issue 2
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3439734
Keywords
Image captioning; attention mechanism; scene semantics; encoder-decoder framework
Funding
- National Natural Science Foundation of China [61966004, 61663004, 61866004, 61762078]
- Guangxi Natural Science Foundation [2019GXNSFDA245018, 2018GXNSFDA281009]
- Guangxi Bagui Scholar Teams for Innovation and Research Project
- Guangxi Talent Highland Project of Big Data Intelligence and Application
- Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing
Most existing image captioning methods use only the visual information of the image to guide caption generation and lack the guidance of effective scene semantic information; moreover, current visual attention mechanisms cannot adjust their focus intensity on the image. In this article, we first propose an improved visual attention model. At each timestep, we calculate the focus intensity coefficient of the attention mechanism from the context information of the model, then automatically adjust the focus intensity of the attention mechanism through this coefficient to extract more accurate visual information. In addition, we represent the scene semantic knowledge of the image through topic words related to the image scene and add them to the language model. We use the attention mechanism to determine the visual information and scene semantic information that the model attends to at each timestep and combine them so that the model generates more accurate and scene-specific captions. Finally, we evaluate our model on the Microsoft COCO (MSCOCO) and Flickr30k standard datasets. The experimental results show that our approach generates more accurate captions and outperforms many recent advanced models on various evaluation metrics.
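One plausible reading of the "focus intensity coefficient" described in the abstract is a temperature-like scalar applied before the attention softmax: a larger coefficient sharpens the attention distribution over image regions, a smaller one flattens it. The sketch below illustrates that idea only; the function names and the dot-product scoring are assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def focused_attention(region_features, context, beta):
    """Attention over image regions with an adjustable focus intensity.

    region_features: (num_regions, dim) visual features.
    context: (dim,) decoder context vector at the current timestep.
    beta: focus intensity coefficient (hypothetical; in the paper it is
          computed from the model's context, here it is passed in directly).
          beta > 1 sharpens the attention weights, beta < 1 flattens them.
    """
    scores = region_features @ context        # dot-product relevance scores (assumed)
    weights = softmax(beta * scores)          # beta rescales scores before softmax
    attended = weights @ region_features      # weighted sum of region features
    return attended, weights
```

With the same scores, raising `beta` concentrates more weight on the highest-scoring region, which matches the abstract's claim that adjusting focus intensity extracts more precise visual information.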