Article

Scene graph captioner: Image captioning based on structural visual representation

Journal

Journal of Visual Communication and Image Representation

Publisher

Academic Press Inc. (Elsevier Science)
DOI: 10.1016/j.jvcir.2018.12.027

Keywords

Image captioning; Scene graph; Structural representation; Attention

Funding

  1. National Natural Science Foundation of China [61772359, 61472275, 61502337, 61701341]

Abstract

While deep neural networks have recently achieved promising results on the image captioning task, they do not explicitly use the structural visual and textual knowledge within an image. In this work, we propose the Scene Graph Captioner (SGC) framework for image captioning, which captures the comprehensive structural semantics of a visual scene by explicitly modeling objects, their attributes, and the relationships between them. First, we develop an approach to generate the scene graph by learning individual modules on large object, attribute, and relationship datasets. SGC then incorporates high-level graph information and visual attention information into a deep captioning framework. Specifically, we propose a novel framework that embeds the scene graph into a structural representation capturing both the semantic concepts and the graph topology. Further, we develop a scene-graph-driven method to generate the attention graph by exploiting the high internal homogeneity and external inhomogeneity among the nodes of the scene graph. Finally, an LSTM-based framework translates this information into text. We evaluate the proposed framework on a held-out MSCOCO dataset. (C) 2018 Elsevier Inc. All rights reserved.
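
The abstract describes a pipeline: a scene-graph generator produces objects, attributes, and relationships; these are embedded into a structural representation; a scene-graph-driven attention step weights the graph nodes; and an LSTM decoder emits the caption. The code below is a minimal PyTorch sketch of that kind of pipeline, not the authors' implementation: the module names, dimensions, simple dot-product attention, and teacher-forced decoding loop are all illustrative assumptions.

    # Minimal sketch of an SGC-style pipeline (NOT the authors' code):
    # scene-graph concept embeddings -> attention over graph nodes -> LSTM decoder.
    # Dimensions, attention form, and training loop details are assumptions.
    import torch
    import torch.nn as nn


    class SceneGraphCaptionerSketch(nn.Module):
        def __init__(self, concept_vocab, word_vocab, embed_dim=256, hidden_dim=512):
            super().__init__()
            # Embeds object / attribute / relationship concepts from the scene graph.
            self.concept_embed = nn.Embedding(concept_vocab, embed_dim)
            self.word_embed = nn.Embedding(word_vocab, embed_dim)
            # Decoder LSTM consumes the previous word plus an attended graph context.
            self.lstm = nn.LSTMCell(embed_dim * 2, hidden_dim)
            self.attn_query = nn.Linear(hidden_dim, embed_dim)
            self.out = nn.Linear(hidden_dim, word_vocab)

        def forward(self, graph_concepts, captions):
            # graph_concepts: (batch, num_nodes) concept ids from a scene-graph generator
            # captions:       (batch, seq_len) ground-truth word ids (teacher forcing)
            nodes = self.concept_embed(graph_concepts)            # (B, N, E)
            b = nodes.size(0)
            h = nodes.new_zeros(b, self.lstm.hidden_size)
            c = nodes.new_zeros(b, self.lstm.hidden_size)
            logits = []
            for t in range(captions.size(1)):
                # Dot-product attention over scene-graph nodes: a stand-in for the
                # paper's scene-graph-driven attention graph.
                q = self.attn_query(h).unsqueeze(2)               # (B, E, 1)
                weights = torch.softmax(nodes.bmm(q), dim=1)      # (B, N, 1)
                context = (weights * nodes).sum(dim=1)            # (B, E)
                word = self.word_embed(captions[:, t])            # (B, E)
                h, c = self.lstm(torch.cat([word, context], dim=1), (h, c))
                logits.append(self.out(h))
            return torch.stack(logits, dim=1)                     # (B, T, V)


    # Usage: random ids stand in for detected scene-graph concepts and caption tokens.
    model = SceneGraphCaptionerSketch(concept_vocab=1000, word_vocab=5000)
    concepts = torch.randint(0, 1000, (2, 12))
    caps = torch.randint(0, 5000, (2, 16))
    print(model(concepts, caps).shape)  # torch.Size([2, 16, 5000])

In the paper, the attention is driven by the structure of the scene graph itself (internal homogeneity and external inhomogeneity among nodes) rather than a plain dot product; the sketch only marks where such a module would plug in.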
