Article

Scene graph captioner: Image captioning based on structural visual representation

Journal

Journal of Visual Communication and Image Representation

Publisher

Academic Press Inc. (Elsevier Science)
DOI: 10.1016/j.jvcir.2018.12.027

Keywords

Image captioning; Scene graph; Structural representation; Attention

Funding

  1. National Natural Science Foundation of China [61772359, 61472275, 61502337, 61701341]

Abstract

While deep neural networks have recently achieved promising results on the image captioning task, they do not explicitly use the structural visual and textual knowledge within an image. In this work, we propose the Scene Graph Captioner (SGC) framework for image captioning, which captures the comprehensive structural semantics of a visual scene by explicitly modeling objects, their attributes, and the relationships between them. First, we develop an approach to generate the scene graph by learning individual modules on large object, attribute, and relationship datasets. SGC then incorporates high-level graph information and visual attention information into a deep captioning framework. Specifically, we propose a novel framework that embeds the scene graph into a structural representation capturing both the semantic concepts and the graph topology. Further, we develop a scene-graph-driven method to generate the attention graph by exploiting the high internal homogeneity and external inhomogeneity among the nodes of the scene graph. Finally, an LSTM-based framework translates this information into text. We evaluate the proposed framework on a held-out MSCOCO dataset. (C) 2018 Elsevier Inc. All rights reserved.
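
The abstract describes a pipeline: a scene-graph generator produces objects, attributes, and relationships; these are embedded into a structural representation; a scene-graph-driven attention step weights the graph nodes; and an LSTM decoder emits the caption. The code below is a minimal PyTorch sketch of that kind of pipeline, not the authors' implementation: the module names, dimensions, simple dot-product attention, and teacher-forced decoding loop are all illustrative assumptions.

    # Minimal sketch of an SGC-style pipeline (NOT the authors' code):
    # scene-graph concept embeddings -> attention over graph nodes -> LSTM decoder.
    # Dimensions, attention form, and training loop details are assumptions.
    import torch
    import torch.nn as nn


    class SceneGraphCaptionerSketch(nn.Module):
        def __init__(self, concept_vocab, word_vocab, embed_dim=256, hidden_dim=512):
            super().__init__()
            # Embeds object / attribute / relationship concepts from the scene graph.
            self.concept_embed = nn.Embedding(concept_vocab, embed_dim)
            self.word_embed = nn.Embedding(word_vocab, embed_dim)
            # Decoder LSTM consumes the previous word plus an attended graph context.
            self.lstm = nn.LSTMCell(embed_dim * 2, hidden_dim)
            self.attn_query = nn.Linear(hidden_dim, embed_dim)
            self.out = nn.Linear(hidden_dim, word_vocab)

        def forward(self, graph_concepts, captions):
            # graph_concepts: (batch, num_nodes) concept ids from a scene-graph generator
            # captions:       (batch, seq_len) ground-truth word ids (teacher forcing)
            nodes = self.concept_embed(graph_concepts)            # (B, N, E)
            b = nodes.size(0)
            h = nodes.new_zeros(b, self.lstm.hidden_size)
            c = nodes.new_zeros(b, self.lstm.hidden_size)
            logits = []
            for t in range(captions.size(1)):
                # Dot-product attention over scene-graph nodes: a stand-in for the
                # paper's scene-graph-driven attention graph.
                q = self.attn_query(h).unsqueeze(2)               # (B, E, 1)
                weights = torch.softmax(nodes.bmm(q), dim=1)      # (B, N, 1)
                context = (weights * nodes).sum(dim=1)            # (B, E)
                word = self.word_embed(captions[:, t])            # (B, E)
                h, c = self.lstm(torch.cat([word, context], dim=1), (h, c))
                logits.append(self.out(h))
            return torch.stack(logits, dim=1)                     # (B, T, V)


    # Usage: random ids stand in for detected scene-graph concepts and caption tokens.
    model = SceneGraphCaptionerSketch(concept_vocab=1000, word_vocab=5000)
    concepts = torch.randint(0, 1000, (2, 12))
    caps = torch.randint(0, 5000, (2, 16))
    print(model(concepts, caps).shape)  # torch.Size([2, 16, 5000])

In the paper, the attention is driven by the structure of the scene graph itself (internal homogeneity and external inhomogeneity among nodes) rather than a plain dot product; the sketch only marks where such a module would plug in.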
