Cross-Modal Object Detection Based on a Knowledge Update

SENSORS (2022)

Journal

SENSORS

Volume 22, Issue 4, Pages -

Publisher

MDPI

DOI: 10.3390/s22041338

Keywords

multimodality; multimodal encoder; graph convolutional network; knowledge update

Automated Summary

In this paper, a knowledge update-based multimodal object recognition model is proposed to take full advantage of multi-source information for object recognition. The model utilizes Faster R-CNN for image regionalization and employs a graph convolutional network and an external knowledge base to model the relationships between objects. Experimental results validate the effectiveness of the proposed model.

Abstract

Object detection is an important field of computer vision and has been studied extensively in recent years. However, existing object detection methods use only the visual information in the image and fail to mine the high-level semantic information of the object, which greatly limits them. To take full advantage of multi-source information, this paper proposes a knowledge update-based multimodal object recognition model. Specifically, the method first uses Faster R-CNN to regionalize the image, then applies a transformer-based multimodal encoder to encode the visual region features (region-based image features) and the textual features (semantic relationships between words) that correspond to each picture. A graph convolutional network (GCN) inference module then builds a relational network in which the nodes denote visual and textual region features and the edges represent their relationships. In addition, a knowledge update module draws on an external knowledge base to further enhance the region-based relationship expression capability. In summary, the proposed algorithm not only learns accurate relationships between objects in different regions of the image, but also benefits from knowledge updates drawn from an external relational database. Experimental results verify the effectiveness of the proposed knowledge update module and the independent reasoning ability of the model.
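
The relational reasoning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard GCN propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} X W), and the `update_edges` blending rule with its `alpha` parameter is purely hypothetical, since the abstract does not specify how the knowledge update module combines learned relationships with the external knowledge base. All function names here are illustrative.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features (and every degree is > 0).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


def update_edges(adj, prior, alpha=0.5):
    """Hypothetical knowledge update: blend learned edge weights with
    prior relation strengths taken from an external knowledge base."""
    return [[(1 - alpha) * a + alpha * p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(adj, prior)]


# Toy graph: nodes 0 and 1 stand for visual regions, node 2 for a text token.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
prior = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # assumed knowledge-base relations
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only

updated = update_edges(adj, prior)
hidden = gcn_layer(updated, features, weights)
```

In this toy setup the knowledge prior adds an edge between node 0 and node 2 that the initial graph lacks, so the propagated features of the first region also reflect the text token. In practice the features would come from the Faster R-CNN regions and the multimodal encoder, and the weights would be learned.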
