Article
Computer Science, Artificial Intelligence
Haigang Zhang, Xianglong Meng, Weipeng Cao, Ye Liu, Zhong Ming, Jinfeng Yang
Summary: Multi-label Zero-shot Learning (ZSL) is a more reasonable and realistic approach than standard single-label ZSL as it considers the coexistence of multiple objects in real-life images. Intra-class feature entanglement affects the alignment of visual and semantic features, making it difficult for the model to recognize unseen samples comprehensively. Existing multi-label ZSL methods focus on attention-based refinement and decoupling of visual features, but overlook the relationship between label semantics. This paper proposes a method that utilizes label correlations and builds a weighted semantic graph to guide visual feature extraction, achieving improved performance compared to state-of-the-art models.
Article
Computer Science, Artificial Intelligence
Yu Yun, Sen Wang, Mingzhen Hou, Quanxue Gao
Summary: In zero-shot learning, algorithms utilize semantic knowledge to establish the connection between visual space and semantic space in order to recognize unseen classes. However, the original semantic representation often fails to accurately capture both class-specificity and discriminative information, resulting in misclassification of unseen classes. To address this issue, we propose a Salient Attributes Learning Network (SALN) that generates discriminative semantic representations supervised by visual features. Additionally, we employ an ℓ1,2-norm constraint to ensure the learned semantic representations effectively capture class-specificity and discriminative information in the dimension space. Our approach achieves promising performance on benchmark datasets, and extensive experiments demonstrate its effectiveness.
Article
Computer Science, Artificial Intelligence
Damares Crystina Oliveira de Resende, Moacir Antonelli Ponti
Summary: This paper investigates visual-semantic representations by combining visual features and semantic attributes. The method shows robustness for up to 20% degradation of semantic attributes and allows for zero-shot learning, enabling effective classification of unseen data.
NEURAL COMPUTING & APPLICATIONS
(2022)
Article
Automation & Control Systems
Mengyao Lyu, Hu Han, Xiangzhi Bai
Summary: The goal of zero-shot learning is to transfer knowledge from seen to unseen classes by using auxiliary information. Most existing methods view ZSL as a label-embedding problem and suffer from bias towards seen classes and degraded performance. In this article, an embedding approach inspired by human recognition memory is proposed to effectively address these issues and outperform state-of-the-art methods.
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS
(2022)
Article
Chemistry, Multidisciplinary
Youngki Park, Youhyun Shin
Summary: This paper introduces an efficient approach to multi-label image classification that integrates object detection and embedding techniques. The method is capable of quickly and accurately classifying novel classes with minimal training data. Empirical evaluation demonstrates that the approach outperforms traditional methods and that unsupervised object detection is faster and more accurate than supervised object detection.
APPLIED SCIENCES-BASEL
(2023)
Article
Computer Science, Artificial Intelligence
Yuanqi Chen, Xiaoming Yu, Shan Liu, Wei Gao, Ge Li
Summary: This paper proposes a zero-shot unsupervised image-to-image translation framework that associates categories with attributes, addressing the mode collapse that existing methods suffer when target class images are lacking. By preserving semantic relations and expanding the attribute space, the translator is encouraged to explore the modes of unseen classes, achieving translation for previously unseen classes.
IMAGE AND VISION COMPUTING
(2022)
Article
Computer Science, Artificial Intelligence
Ziyi Chen, Yutong Gao, Congyan Lang, Lili Wei, Yidong Li, Hongzhe Liu, Fayao Liu
Summary: Zero-shot learning aims to predict unseen categories without collecting training data. Existing works focus on constructing semantic topological knowledge with textual descriptions, but suffer from difficulties in accurately describing visual characters and enumerating all hidden attributes. To address these issues, a Cross-Modality Topology Propagation Matcher (CTPM) is proposed to construct a more complete topology system in both visual and semantic modalities. CTPM achieves state-of-the-art performance on four ZSL datasets.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Artificial Intelligence
Guangfeng Lin, Caixia Fan, Wanjun Chen, Yajun Chen, Fan Zhao
Summary: The paper introduces a novel zero-shot learning method (CLASR) based on learning class label autoencoder, which can adapt to various semantic embedding spaces and improve zero-shot classification performance.
Article
Chemistry, Analytical
Jingyi Liu, Caijuan Shi, Dongjing Tu, Ze Shi, Yazhi Liu
Summary: This paper proposes a zero-shot image classification method based on deep learning, which reduces the dependence on labeled training samples by using common space embedding and end-to-end learnable deep metric to learn the similarity of visual features and semantic features.
Article
Environmental Sciences
Chen Ding, Yu Li, Yue Wen, Mengmeng Zheng, Lei Zhang, Wei Wei, Yanning Zhang
Summary: A novel few-shot deep learning framework for hyperspectral image classification is proposed to address the scenario where only a very limited number of labeled samples are available, achieving better performance compared to existing competitors in few-shot and one-shot settings.
Article
Computer Science, Artificial Intelligence
Cheng Yang, Weijia Wu, Yuxing Wang, Hong Zhou
Summary: The paper presents a novel feature-based ZSD model that constructs visual features leveraging the deep feature embedding of the detector, and simulates human-defined attributes for specific label embedding to improve detection performance on unseen classes. Extensive experiments show significant improvement in performance, surpassing existing methods on the challenging COCO dataset.
APPLIED INTELLIGENCE
(2021)
Article
Computer Science, Artificial Intelligence
Zhong Ji, Biying Cui, Yunlong Yu, Yanwei Pang, Zhongfei Zhang
Summary: This paper proposes a zero-shot learning method based on the Unseen Prototype Learning (UPL) model. The method addresses the classification bias problem from unseen to seen categories by learning visual prototypes and using semantic information as constraints. By accumulating experiences in meta-learning, it is also able to predict unseen classes effectively.
NEURAL COMPUTING & APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Qin Li, Mingzhen Hou, Hong Lai, Ming Yang
Summary: In generalized zero-shot learning, the problem of misclassification of unseen classes is addressed by using a semantic embedding network and a distribution alignment constraint to improve the accuracy of classification.
Article
Engineering, Electrical & Electronic
Shuang Li, Lichun Wang, Shaofan Wang, Dehui Kong, Baocai Yin
Summary: Zero-shot learning is a method for recognizing images of novel classes without using any images belonging to the novel classes during training. It achieves this by exploiting auxiliary semantic information. Recent ZSL methods have focused on learning visual-semantic embeddings to transfer knowledge from seen classes to novel classes. However, the difference in granularity between image features and class attributes makes it difficult to match the two. To address this, we propose a hierarchical coupled discriminative dictionary learning method to hierarchically establish visual-semantic embedding at the class-level and image-level. A coarse-to-fine approach is used to build basic and coarse-grained connections and then fine-grained connections between visual space and semantic space.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2023)
Article
Computer Science, Artificial Intelligence
Jingqi Yang, Qi Shen, Cheng Xie
Summary: Generalized zero-shot learning is an important research area in visual computing tasks. Existing methods often use transferable semantic features to predict unseen classes without training on unseen samples. However, they encounter the problem of semantic-visual inconsistency. To handle this, we propose a generation-based contrastive model with two alignment modules: the Feedback Alignment Module and the Negative Sample Alignment Module.
IMAGE AND VISION COMPUTING
(2023)
Article
Computer Science, Artificial Intelligence
Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata
Summary: The study proposes a novel representation learning framework for improving the accuracy of zero-shot and few-shot image classification tasks through integrating global and local features, and combining attribute localization ability. The model is able to localize the attributes in an image and is evaluated through various methods.
INTERNATIONAL JOURNAL OF COMPUTER VISION
(2022)
Article
Computer Science, Artificial Intelligence
Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata
Summary: This article introduces the research progress of few-shot learning in the field of video classification and points out that existing methods underestimate the importance of video feature learning. The authors propose a two-stage approach and further improve the performance by using tag retrieval and generative adversarial networks. In addition, the authors propose more realistic benchmarks for evaluation, and experimental results show that the new methods perform better on the new benchmarks.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Ushasi Chaudhuri, Ruchika Chavan, Biplab Banerjee, Anjan Dutta, Zeynep Akata
Summary: The efficacy of zero-shot sketch-based image retrieval models is challenged by domain alignment and feature mapping. This study proposes a novel framework that performs bi-level domain adaptation to align the spatial and semantic features of visual data pairs progressively. Experimental results demonstrate significant improvements.
Proceedings Paper
Computer Science, Artificial Intelligence
Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata
Summary: Audio-visual generalised zero-shot learning for video classification is important for recognizing samples from novel classes. This study proposes a multi-modal and temporal cross-attention framework (TCaF) that aligns audio and visual features in time and focuses on cross-modal correspondence, achieving state-of-the-art performance.
COMPUTER VISION, ECCV 2022, PT XX
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Michael Kirchhof, Karsten Roth, Zeynep Akata, Enkelejda Kasneci
Summary: Proxy-based Deep Metric Learning (DML) learns deep representations by embedding images close to their class representatives (proxies), commonly with respect to the angle between them. However, this disregards the embedding norm and struggles to learn class-internal structures. To address these issues, the authors propose a non-isotropic probabilistic proxy-based DML that models images as directional von Mises-Fisher (vMF) distributions to capture image-intrinsic uncertainties and derives non-isotropic von Mises-Fisher (nivMF) distributions for class proxies to represent complex class-specific variances. Multiple distribution-to-point and distribution-to-distribution metrics are developed to measure the proxy-to-image distance. Ablation studies demonstrate the benefits of the probabilistic approach in terms of uncertainty-awareness, improved gradients, and overall generalization performance.
COMPUTER VISION, ECCV 2022, PT XXVI
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Karsten Roth, Oriol Vinyals, Zeynep Akata
Summary: Deep Metric Learning aims to learn embedding spaces that encode semantic similarities, but current methods ignore higher-level semantic relations between classes. To address this, we propose a language guidance objective for visual similarity learning, offering better semantic consistency in the learned embedding spaces.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Karsten Roth, Oriol Vinyals, Zeynep Akata
Summary: Deep Metric Learning aims to learn representation spaces where semantic relations can be expressed through predefined distance metrics. This paper proposes a non-isotropy regularization method based on proxies, which guides the samples to form non-isotropic distributions around class proxies to better learn local structures.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata
Summary: Human-annotated attributes can be powerful semantic embeddings in zero-shot learning, but the annotation process is labor-intensive and requires expert supervision. Current unsupervised semantic embeddings, such as word embeddings, can transfer knowledge between classes but may not reflect visual similarities well, resulting in inferior zero-shot performance. We propose a method that discovers semantic embeddings containing discriminative visual properties without any human annotation, improving the performance of zero-shot learning.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
Summary: This paper investigates the problems of open-world compositional zero-shot learning (OW-CZSL) and CZSL under partial supervision (pCZSL). It proposes a novel model, KG-SP, which predicts the primitives independently and estimates their feasibility using prior knowledge. The model achieves state-of-the-art performance in both OW-CZSL and pCZSL tasks.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Youngwook Kim, Jae Myung Kim, Zeynep Akata, Jungwoo Lee
Summary: In this paper, we propose a new method for weakly supervised multi-label classification, where unobserved labels are considered as negative labels and the model is prevented from memorizing noisy labels using the memorization effect. Experimental results show that our method outperforms previous methods on multiple datasets.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata
Summary: In this study, the focus was on audio-visual zero-shot learning, with the proposal of using cross-modal attention and textual label embeddings to achieve good results.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Theory & Methods
Andrei Neculai, Yanbei Chen, Zeynep Akata
Summary: Existing image retrieval methods often only consider one or two query inputs, which cannot be generalized to multiple queries. In this study, we propose a more challenging scenario for image retrieval, which involves composing multiple multi-modal queries to retrieve target images with specified semantic concepts. Our proposed multimodal probabilistic composer (MPC) learns an informative embedding that can encode the semantics of various queries and facilitate image retrieval with multiple multimodal queries.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022
(2022)
Proceedings Paper
Computer Science, Theory & Methods
Ilke Cugu, Massimiliano Mancini, Yanbei Chen, Zeynep Akata
Summary: In this work, the authors propose a method called ACVC that alters training images to simulate new domains and imposes consistent visual attention to improve the generalization of visual recognition models. The proposed method achieves state-of-the-art performance on three single-source domain generalization benchmarks.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022
(2022)
Proceedings Paper
Computer Science, Theory & Methods
Stephan Alaniz, Marco Federici, Zeynep Akata
Summary: Learning a common representation space between vision and language helps relate objects in the image to their semantic meaning. By combining the spatial transformer with a representation learning approach, our model is able to split images into separately encoded patches and associate visual and textual representations in an interpretable manner.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022
(2022)
Proceedings Paper
Computer Science, Theory & Methods
Samarth Sinha, Karsten Roth, Anirudh Goyal, Marzyeh Ghassemi, Zeynep Akata, Hugo Larochelle, Animesh Garg
Summary: This paper investigates the ability to promote transfer learning on different tasks and data by learning uniformly distributed features in deep networks. The use of uniformity regularization allows for the learning of more robust vision systems, achieving better performance than baseline models across multiple domains.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022
(2022)