4.7 Article

Discriminative Dictionary Learning With Common Label Alignment for Cross-Modal Retrieval

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 18, Issue 2, Pages 208-218

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2015.2508146

Keywords

Common space; cross-modal retrieval; discriminative dictionary learning; label alignment

Funding

  1. National Natural Science Foundation of China [61572388, 61125204, 61432014, 61303220]
  2. National High Technology Research and Development Program of China [2013AA01A602]
  3. Program for New Century Excellent Talents in University [NCET-12-0917]
  4. Fundamental Research Funds for the Central Universities [K5051302019]
  5. Doctoral Program of Higher Education of China [20120203120014]

Cross-modal retrieval has attracted much attention in recent years due to its widespread applications. In this area, how to capture and correlate heterogeneous features originating from different modalities remains a challenge. Most existing cross-modal learning methods focus only on learning relevant features shared by two distinct feature spaces, thereby overlooking the discriminative information those features carry. To remedy this issue and explicitly capture discriminative feature information, we propose a novel cross-modal retrieval approach based on discriminative dictionary learning that is augmented with common label alignment. Concretely, a discriminative dictionary is first learned for each modality, which boosts not only the discriminating capability of intra-modality data from different classes but also the relevance of inter-modality data in the same class. Subsequently, all the resulting sparse codes are simultaneously mapped to a common label space, where the cross-modal data samples are characterized and associated. In the label space, the discriminativeness and relevance of the cross-modal data are further strengthened by enforcing a common label alignment. Finally, cross-modal retrieval is performed over the common label space. Experiments conducted on two public cross-modal datasets show that the proposed approach outperforms several state-of-the-art methods in terms of retrieval accuracy.
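The last two stages described in the abstract (mapping each modality into a common label space, then retrieving there) can be illustrated with a small sketch. This is not the authors' method: the discriminative dictionary learning and sparse coding steps are replaced here by plain ridge regression from raw features to one-hot labels, and the toy data, function names, and parameters (`fit_label_projection`, `to_label_space`, `retrieve`, `reg`, `top_k`) are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_label_projection(features, labels_onehot, reg=1e-2):
    """Ridge regression from features (n, d) to one-hot labels (n, c).

    Assumption: stands in for the paper's mapping of sparse codes
    to the common label space; no dictionary is learned here.
    """
    dim = features.shape[1]
    gram = features.T @ features + reg * np.eye(dim)
    return np.linalg.solve(gram, features.T @ labels_onehot)


def to_label_space(features, projection):
    """Project features into label space and L2-normalize each row."""
    scores = features @ projection
    norms = np.linalg.norm(scores, axis=1, keepdims=True)
    return scores / np.maximum(norms, 1e-12)


def retrieve(query_codes, gallery_codes, top_k=5):
    """Return indices of the top_k gallery items by cosine similarity."""
    similarity = query_codes @ gallery_codes.T
    return np.argsort(-similarity, axis=1)[:, :top_k]


def make_toy_data(seed=0, n_per_class=20, n_classes=3, dim_img=8, dim_txt=5):
    """Synthetic 'image' and 'text' features with shared class labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_classes), n_per_class)
    # Well-separated class centers, different in each modality.
    img_centers = 3.0 * np.eye(n_classes, dim_img)
    txt_centers = 3.0 * np.eye(n_classes, dim_txt)[:, ::-1]
    img = img_centers[labels] + 0.1 * rng.normal(size=(labels.size, dim_img))
    txt = txt_centers[labels] + 0.1 * rng.normal(size=(labels.size, dim_txt))
    return img, txt, labels


img, txt, labels = make_toy_data()
onehot = np.eye(labels.max() + 1)[labels]

# One projection per modality, both targeting the same label space.
P_img = fit_label_projection(img, onehot)
P_txt = fit_label_projection(txt, onehot)

img_codes = to_label_space(img, P_img)
txt_codes = to_label_space(txt, P_txt)

# Image-to-text retrieval performed entirely in the common label space.
ranked = retrieve(img_codes, txt_codes, top_k=5)
precision_at_5 = np.mean(labels[ranked] == labels[:, None])
```

Because both modalities are projected onto the same label axes, features of different dimensionality (8 and 5 in this toy setup) become directly comparable, which is the role the common label space plays in the abstract.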

Recommended

Article Computer Science, Artificial Intelligence

Enhanced Spatio-Temporal Interaction Learning for Video Deraining: Faster and Better

Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Wei Liu

Summary: Video deraining is an important task in computer vision. In this paper, we propose a new end-to-end video deraining framework called ESTINet, which utilizes deep residual networks and convolutional long short-term memory to enhance the quality and speed of video deraining.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing

Xue Yang, Junchi Yan, Wenlong Liao, Xiaokang Yang, Jin Tang, Tao He

Summary: In this paper, we introduce the idea of denoising to object detection, enhancing the ability to detect small and cluttered objects. We also address the boundary problem caused by rotation variation by adding an IoU constant factor to the smooth L1 loss. Combining these features, our proposed detector, SCRDet++, shows effectiveness in extensive experiments on various datasets.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Engineering, Electrical & Electronic

Differentiable Neural Architecture Search for Extremely Lightweight Image Super-Resolution

Han Huang, Li Shen, Chaoyang He, Weisheng Dong, Wei Liu

Summary: This paper proposes a novel differentiable Neural Architecture Search (NAS) approach for searching lightweight single image super-resolution models. Experimental results show that the proposed method achieves state-of-the-art performance in terms of PSNR, SSIM, and model complexity.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2023)

Editorial Material Computer Science, Artificial Intelligence

Guest Editorial Robust Learning of Spatio-Temporal Point Processes: Modeling, Algorithm, and Applications

Junchi Yan, Hongteng Xu, Liangda Li, Mehrdad Farajtab, Xiaokang Yang

Summary: This article discusses two common forms of temporal data: synchronized temporal data and asynchronous event data. Previous approaches often convert event data into time series data, but it is more meaningful to directly establish models based on raw event data, especially for time-sensitive tasks. The theme of this article is the development of spatio-temporal point processes and their related applications, which treat an event as a point in the spatio-temporal space and capture the instantaneous happening rate of events and their potential dependency. Use cases include future event prediction and causality estimation.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Engineering, Electrical & Electronic

Unsupervised Cross-Modal Hashing With Modality-Interaction

Rong-Cheng Tu, Jie Jiang, Qinghong Lin, Chengfei Cai, Shangxuan Tian, Hongfa Wang, Wei Liu

Summary: In this paper, the authors propose a novel unsupervised cross-modal hashing method (UCHM) that utilizes a modality-interaction-enabled similarity generator and a bit-selection module to improve the retrieval performance of unlabeled cross-modal data.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2023)

Article Computer Science, Artificial Intelligence

LARNeXt: End-to-End Lie Algebra Residual Network for Face Recognition

Xiaolong Yang, Xiaohong Jia, Dihong Gong, Dong-Ming Yan, Zhifeng Li, Wei Liu

Summary: This paper proposes a completely integrated embedded end-to-end Lie algebra residual architecture (LARNeXt) for achieving pose robust face recognition. By exploring the impact of face rotation on the deep feature generation process of convolutional neural networks (CNNs), the authors design critical subnets to estimate pose efficiently and control the strength of the residual component. Extensive experimental evaluations demonstrate the superiority of their method over state-of-the-art ones.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Robust Mesh Representation Learning via Efficient Local Structure-Aware Anisotropic Convolution

Zhongpai Gao, Junchi Yan, Guangtao Zhai, Juyong Zhang, Xiaokang Yang

Summary: This article introduces a local structure-aware anisotropic convolutional operation (LSA-Conv) for 3-D shapes, which learns adaptive weighting matrices for each template's node and performs shared anisotropic filters. Comprehensive experiments demonstrate that the model achieves significant improvement in 3-D shape reconstruction.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Learning High-Order Graph Convolutional Networks via Adaptive Layerwise Aggregation Combination

Tianqi Zhang, Qitian Wu, Junchi Yan

Summary: In this article, the authors study the reason why high-order graph convolution schemes have the ability to learn structure-aware representations. They propose a new adaptive feature combination method inspired by the squeeze-and-excitation module to effectively capture the interdistance relationship and achieve significant performance gain.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Automation & Control Systems

Machine Learning Methods in Solving the Boolean Satisfiability Problem

Wenxuan Guo, Hui-Ling Zhen, Xijun Li, Wanqian Luo, Mingxuan Yuan, Yaohui Jin, Junchi Yan

Summary: This paper reviews recent literature on using machine learning techniques to solve the Boolean satisfiability problem (SAT), a classic NP-complete problem. The rapid advancements in the field of machine learning over the past decade have inspired many researchers to apply machine learning methods for SAT solving. The paper examines the evolution of ML SAT solvers, from naive classifiers with handcrafted features to emerging end-to-end SAT solvers, as well as recent progress in combining existing conflict-driven clause learning (CDCL) and local search solvers with machine learning methods. The paper concludes by addressing the limitations of current works and suggesting possible future directions. The collected paper list is available at https://github.com/ThinklabSJTU/awesome-ml4co.

MACHINE INTELLIGENCE RESEARCH (2023)

Article Computer Science, Artificial Intelligence

Learning Generative RNN-ODE for Collaborative Time-Series and Event Sequence Forecasting

Longyuan Li, Junchi Yan, Yunhao Zhang, Jihai Zhang, Jie Bao, Yaohui Jin, Xiaokang Yang

Summary: This paper proposes the RNN-ODE collaborative model for joint modeling and forecasting of heterogeneous time-series and event sequence data, combining Bayesian and deep learning techniques for interpretability. Experimental results demonstrate the competitive forecasting performance of both time-series and event sequences compared to existing methods.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

A Unified Analysis of AdaGrad With Weighted Aggregation and Momentum Acceleration

Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu

Summary: Integrating adaptive learning rate and momentum techniques into stochastic gradient descent has led to various efficient adaptive stochastic algorithms such as AdaGrad, RMSProp, Adam, and AccAdaGrad. This paper proposes a weighted AdaGrad algorithm called AdaUSM that incorporates a unified momentum scheme and a novel weighted adaptive learning rate. It shows that AdaUSM achieves $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the nonconvex stochastic setting with polynomially growing weights. Furthermore, it provides a new perspective for understanding Adam and RMSProp by showing that their adaptive learning rates correspond to exponentially growing weights in AdaUSM. Comparative experiments on deep learning models and datasets are also conducted.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Plug-and-Play Regulators for Image-Text Matching

Haiwen Diao, Ying Zhang, Wei Liu, Xiang Ruan, Huchuan Lu

Summary: Researchers have developed two efficient regulators, namely the Recurrent Correspondence Regulator (RCR) and the Recurrent Aggregation Regulator (RAR), which enhance the flexibility of correspondence and emphasize important alignments in image-text matching. These regulators can be easily incorporated into existing frameworks and have shown significant improvements in various models.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Article Computer Science, Artificial Intelligence

Detecting Rotated Objects as Gaussian Distributions and its 3-D Generalization

Xue Yang, Gefan Zhang, Xiaojiang Yang, Yue Zhou, Wentao Wang, Jin Tang, Tao He, Junchi Yan

Summary: Existing detection methods using parameterized bounding box and rotation angle have limitations for high-precision rotation detection. We propose modeling rotated objects as Gaussian distributions, with regression loss based on Kullback-Leibler Divergence to align detection performance. Our approach resolves boundary discontinuity and square-like problems, and uses an efficient Gaussian metric-based label assignment strategy for improved performance. Analysis of the BBox parameters' gradients shows their interpretable physical meaning, explaining the effectiveness of our approach. Extension to 3D with tailored algorithm design further enhances the performance, as demonstrated on various datasets.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Master-Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints

Hanchi Huang, Li Shen, Deheng Ye, Wei Liu

Summary: A novel master-slave architecture is proposed to solve the top-K combinatorial multiarmed bandits problem with nonlinear bandit feedback and diversity constraints. It significantly outperforms existing algorithms in recommendation tasks.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Bilateral Relation Distillation for Weakly Supervised Temporal Action Localization

Zhe Xu, Kun Wei, Erkun Yang, Cheng Deng, Wei Liu

Summary: This paper proposes a method called Bilateral Relation Distillation (BRD) to address the problem of weakly supervised temporal action localization. The method learns representations by jointly modeling category-level and sequence-level relations, and captures category-level relations through correlation alignment and category-aware contrast. Additionally, it utilizes a gradient-based feature augmentation method to model relations among segments at the sequence-level.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)
