Article

Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval

Journal

IEEE Transactions on Circuits and Systems for Video Technology
Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TCSVT.2022.3182426

Keywords

Semantics; Visualization; Feature extraction; Correlation; Learning systems; Task analysis; Filtration; Dual-level feature enhancement; multi-block matching; image-text retrieval

Funding

  1. National Natural Science Foundation of China [U21B2024]
  2. National Key Research and Development Program of China [2021YFF0704003]
  3. Baidu Program


This paper proposes a dual-level representation enhancement network (DREN) to improve image-text retrieval. By exploring characteristics and contexts of regions and words in a joint manner, accurate matching of image-text pairs is achieved, leading to superior retrieval performance.
Image-text retrieval is a fundamental and vital task in multimedia retrieval and has received growing attention because it connects heterogeneous data. Previous methods that perform well on image-text retrieval mainly focus on the interaction between image regions and text words, but they lack a joint exploration of the characteristics and contexts of regions and words, which causes semantic confusion between similar objects and a loss of contextual understanding. To address these issues, a dual-level representation enhancement network (DREN) is proposed to strengthen the characteristic and contextual representations through novel block-level and instance-level representation enhancement modules, respectively. The block-level module mines the latent relations among the multiple blocks within each instance representation, while the instance-level module learns the contextual relations between different instances. To facilitate accurate matching of image-text pairs, we propose graph correlation inference and weighted adaptive filtering to conduct local and global matching between image-text pairs. Extensive experiments on two challenging datasets (Flickr30K and MSCOCO) verify the superiority of our method for image-text retrieval.
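The abstract does not give the modules' exact formulation, but the block-level idea — relating the multiple blocks inside a single instance (region or word) embedding — can be sketched in a minimal, self-contained way. The function name `block_level_enhance`, the block count, the scaled dot-product affinity, and the residual mixing below are illustrative assumptions, not the authors' actual DREN implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def block_level_enhance(inst, num_blocks=4):
    """Hypothetical sketch of block-level enhancement:
    split one instance embedding into blocks, relate the blocks by a
    scaled dot-product affinity, and fold the affinity-weighted mixture
    back into the blocks as a residual."""
    d = inst.shape[-1]
    assert d % num_blocks == 0, "embedding size must divide into blocks"
    blocks = inst.reshape(num_blocks, d // num_blocks)          # (B, d/B)
    affinity = softmax(blocks @ blocks.T / np.sqrt(d // num_blocks))
    enhanced = blocks + affinity @ blocks                       # residual mix
    return enhanced.reshape(d)

# Toy region feature: one 128-d instance embedding.
x = np.random.default_rng(0).normal(size=128)
y = block_level_enhance(x, num_blocks=4)
print(y.shape)  # (128,)
```

The instance-level module would apply the same relational idea one level up, across the set of region (or word) embeddings rather than within a single one.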
