Article; Proceedings Paper

Label-Embedding for Image Classification

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2015.2487986

Keywords

Image classification; label embedding; zero-shot learning; attributes

Abstract

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function that measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. Label embedding enjoys a built-in ability to leverage alternative sources of information instead of or in addition to attributes, such as class hierarchies or textual descriptions. Moreover, label embedding encompasses the whole range of learning settings from zero-shot learning to regular learning with a large number of labeled examples.
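The compatibility function described in the abstract can be sketched as a bilinear form F(x, y) = θ(x)ᵀ W φ(y), where θ(x) is an image feature vector, φ(y) a class (attribute) embedding, and W a learned matrix; prediction picks the class with the highest score. A minimal illustrative sketch, with all names, dimensions, and values invented for illustration (not the authors' implementation):

```python
import numpy as np

def compatibility(theta_x, W, phi_y):
    """Bilinear compatibility score F(x, y) = theta(x)^T W phi(y)."""
    return theta_x @ W @ phi_y

def predict(theta_x, W, class_embeddings):
    """Assign the class whose embedding is most compatible with the image."""
    return max(class_embeddings,
               key=lambda y: compatibility(theta_x, W, class_embeddings[y]))

# Zero-shot flavor: W is learned on seen classes, then reused for classes
# described only by attribute vectors (toy values below).
W = np.eye(2)                          # stand-in for a learned matrix
classes = {
    "zebra": np.array([1.0, 0.0]),     # toy attribute vector, e.g. [striped, aquatic]
    "whale": np.array([0.0, 1.0]),
}
image_features = np.array([0.9, 0.1])  # an image that looks "striped"
print(predict(image_features, W, classes))  # -> zebra
```

In the paper, W is trained with a ranking objective so that the correct class scores above incorrect ones; the sketch above only shows the scoring and prediction step.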



Recommended

Article Computer Science, Artificial Intelligence

Attribute Prototype Network for Any-Shot Learning

Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

Summary: The study proposes a novel representation learning framework that improves zero-shot and few-shot image classification accuracy by integrating global and local features with attribute localization. The model can localize attributes in an image and is evaluated with various methods.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2022)

Article Computer Science, Artificial Intelligence

Generalized Few-Shot Video Classification With Video Retrieval and Feature Generation

Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata

Summary: This article reviews progress in few-shot learning for video classification and argues that existing methods underestimate the importance of video feature learning. The authors propose a two-stage approach and further improve performance using tag retrieval and generative adversarial networks. They also propose more realistic evaluation benchmarks, and experimental results show that the new methods perform better on them.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

BDA-SketRet: Bi-level domain adaptation for zero-shot SBIR

Ushasi Chaudhuri, Ruchika Chavan, Biplab Banerjee, Anjan Dutta, Zeynep Akata

Summary: The efficacy of zero-shot sketch-based image retrieval models is challenged by domain alignment and feature mapping. This study proposes a novel framework that performs bi-level domain adaptation to align the spatial and semantic features of visual data pairs progressively. Experimental results demonstrate significant improvements.

NEUROCOMPUTING (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Temporal and Cross-modal Attention for Audio-Visual Zero-Shot Learning

Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Summary: Audio-visual generalised zero-shot learning for video classification is important for recognizing samples from novel classes. This study proposes a multi-modal and temporal cross-attention framework (TCaF) that aligns audio and visual features in time and focuses on cross-modal correspondence, achieving state-of-the-art performance.

COMPUTER VISION, ECCV 2022, PT XX (2022)

Proceedings Paper Computer Science, Artificial Intelligence

A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning

Michael Kirchhof, Karsten Roth, Zeynep Akata, Enkelejda Kasneci

Summary: Proxy-based Deep Metric Learning (DML) learns deep representations by embedding images close to their class representatives (proxies), commonly with respect to the angle between them. However, this disregards the embedding norm and struggles to learn class-internal structures. To address these issues, researchers propose a non-isotropic probabilistic proxy-based DML that models images as directional von Mises-Fisher (vMF) distributions to capture image-intrinsic uncertainties and derives non-isotropic von Mises-Fisher (nivMF) distributions for class proxies to represent complex class-specific variances. Multiple distribution-to-point and distribution-to-distribution metrics are developed to measure the proxy-to-image distance. Ablation studies demonstrate the benefits of the probabilistic approach in terms of uncertainty-awareness, improved gradients, and overall generalization performance.

COMPUTER VISION, ECCV 2022, PT XXVI (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Integrating Language Guidance into Vision-based Deep Metric Learning

Karsten Roth, Oriol Vinyals, Zeynep Akata

Summary: Deep Metric Learning aims to learn embedding spaces that encode semantic similarities, but current methods ignore higher-level semantic relations between classes. To address this, we propose a language guidance objective for visual similarity learning, offering better semantic consistency in the learned embedding spaces.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Non-isotropy Regularization for Proxy-based Deep Metric Learning

Karsten Roth, Oriol Vinyals, Zeynep Akata

Summary: Deep Metric Learning aims to learn representation spaces where semantic relations can be expressed through predefined distance metrics. This paper proposes a non-isotropy regularization method based on proxies, which guides the samples to form non-isotropic distributions around class proxies to better learn local structures.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

Summary: Human-annotated attributes can be powerful semantic embeddings in zero-shot learning, but the annotation process is labor-intensive and requires expert supervision. Current unsupervised semantic embeddings, such as word embeddings, can transfer knowledge between classes but may not reflect visual similarities well, resulting in inferior zero-shot performance. We propose a method that discovers semantic embeddings containing discriminative visual properties without any human annotation, improving the performance of zero-shot learning.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning

Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata

Summary: This paper investigates the problems of open-world compositional zero-shot learning (OW-CZSL) and CZSL under partial supervision (pCZSL). It proposes a novel model, KG-SP, which predicts the primitives independently and estimates their feasibility using prior knowledge. The model achieves state-of-the-art performance on both OW-CZSL and pCZSL tasks.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Large Loss Matters in Weakly Supervised Multi-Label Classification

Youngwook Kim, Jae Myung Kim, Zeynep Akata, Jungwoo Lee

Summary: In this paper, we propose a new method for weakly supervised multi-label classification, where unobserved labels are considered as negative labels and the model is prevented from memorizing noisy labels using the memorization effect. Experimental results show that our method outperforms previous methods on multiple datasets.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata

Summary: This study focuses on audio-visual zero-shot learning and proposes using cross-modal attention and textual label embeddings, achieving good results.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Theory & Methods

Probabilistic Compositional Embeddings for Multimodal Image Retrieval

Andrei Neculai, Yanbei Chen, Zeynep Akata

Summary: Existing image retrieval methods often consider only one or two query inputs and do not generalize to multiple queries. In this study, we propose a more challenging image retrieval scenario that composes multiple multi-modal queries to retrieve target images with the specified semantic concepts. Our proposed multimodal probabilistic composer (MPC) learns an informative embedding that encodes the semantics of various queries and facilitates image retrieval with multiple multimodal queries.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 (2022)

Proceedings Paper Computer Science, Theory & Methods

Attention Consistency on Visual Corruptions for Single-Source Domain Generalization

Ilke Cugu, Massimiliano Mancini, Yanbei Chen, Zeynep Akata

Summary: In this work, the authors propose a method called ACVC that alters training images to simulate new domains and imposes consistent visual attention to improve the generalization of visual recognition models. The proposed method achieves state-of-the-art performance on three single-source domain generalization benchmarks.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 (2022)

Proceedings Paper Computer Science, Theory & Methods

Compositional Mixture Representations for Vision and Text

Stephan Alaniz, Marco Federici, Zeynep Akata

Summary: Learning a common representation space between vision and language helps relate objects in the image to their semantic meaning. By combining the spatial transformer with a representation learning approach, our model is able to split images into separately encoded patches and associate visual and textual representations in an interpretable manner.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 (2022)

Proceedings Paper Computer Science, Theory & Methods

Uniform Priors for Data-Efficient Learning

Samarth Sinha, Karsten Roth, Anirudh Goyal, Marzyeh Ghassemi, Zeynep Akata, Hugo Larochelle, Animesh Garg

Summary: This paper investigates how learning uniformly distributed features in deep networks promotes transfer learning across different tasks and data. Uniformity regularization yields more robust vision systems that achieve better performance than baseline models across multiple domains.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 (2022)
