Article

Unsupervised Object Class Discovery via Saliency-Guided Multiple Class Learning

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2014.2353617

Keywords

Unsupervised object discovery; object detection; multiple instance learning; weakly supervised learning; saliency

Funding

  1. NSF [IIS-1216528, IIS-1360566, IIS-0844566, IIS-1360568]
  2. ONR [N000140910099]
  3. Microsoft Research Asia
  4. Div Of Information & Intelligent Systems
  5. Direct For Computer & Info Scie & Enginr [1360566] Funding Source: National Science Foundation

Abstract

In this paper, we tackle the problem of common object (multiple classes) discovery from a set of input images, where we assume the presence of one object class in each image. This problem is, loosely speaking, unsupervised, since we do not know a priori the type, location, or scale of the object in each image. We observe that the general task of object class discovery in a fully unsupervised manner is intrinsically ambiguous; here we adopt saliency detection to propose candidate image windows/patches, turning an unsupervised learning problem into a weakly supervised one. We propose an algorithm for simultaneously localizing objects and discovering object classes via bottom-up (saliency-guided) multiple class learning (bMCL). Our contributions are three-fold: (1) we adopt saliency detection to convert unsupervised learning into multiple instance learning, formulated as bottom-up multiple class learning (bMCL); (2) we propose an integrated framework that simultaneously performs object localization, object class discovery, and object detector training; (3) we demonstrate that our framework yields significant improvements over existing methods for multi-class object discovery and possesses evident advantages over competing methods in computer vision. In addition, although saliency detection has recently attracted much attention, its practical usage for high-level vision tasks has yet to be justified. Our method validates the usefulness of saliency detection in providing noisy input for a top-down method to extract common patterns.
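The core idea in the abstract (saliency proposals turning unsupervised discovery into multiple instance learning) can be pictured with a short sketch. The code below is illustrative only, not the authors' implementation; `detect_salient_windows` and `extract_feature` are hypothetical helpers standing in for any bottom-up saliency detector and feature extractor.

```python
import numpy as np

def build_mil_bags(images, detect_salient_windows, extract_feature, top_k=10):
    """Turn each image into a positive bag of its top-k salient windows.

    Each window is one instance; the bag is assumed to contain the object,
    but we do not know which instance it is -- exactly the MIL setting.
    """
    bags = []
    for image in images:
        # Bottom-up saliency proposes candidate object windows (noisy labels).
        windows = detect_salient_windows(image)[:top_k]
        # Stack per-window feature vectors into one (top_k, d) bag.
        bags.append(np.stack([extract_feature(image, w) for w in windows]))
    return bags

# bMCL would then jointly cluster the bags into latent object classes and,
# within each bag, select the instance that best explains its class label,
# training one detector per discovered class.
```

In this sketch, discovering the object classes amounts to alternating between assigning bags to clusters and picking the most discriminative instance within each bag, which is the weakly supervised structure the abstract describes.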



Recommended

Article Computer Science, Software Engineering

Neurosymbolic Models for Computer Graphics

Daniel Ritchie, Paul Guerrero, R. Kenny Jones, Niloy J. Mitra, Adriana Schulz, Karl D. D. Willis, Jiajun Wu

Summary: Procedural models have been widely used in computer graphics to represent various kinds of visual data, offering interpretability, stochastic variation, high-quality output, and compact representation. However, authoring procedural models from scratch is challenging. In recent years, AI-based methods, especially neural networks, have gained popularity for creating graphics content, allowing users to specify desired properties while algorithms handle the details; this ease of use, however, often comes at the cost of interpretability and manipulability.

COMPUTER GRAPHICS FORUM (2023)

Article Robotics

Differentiable Physics Simulation of Dynamics-Augmented Neural Objects

Simon Le Cleac'h, Hong-Xing Yu, Michelle Guo, Taylor Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, Mac Schwager

Summary: We propose a differentiable pipeline for simulating the motion of objects whose geometry is represented as a continuous density field parameterized by a deep network. The pipeline estimates an object's dynamical properties and introduces a differentiable contact model for computing the forces that result from collisions. This enables robots to autonomously build visually and dynamically accurate object models from still images and motion videos.

IEEE ROBOTICS AND AUTOMATION LETTERS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Le Xue, Mingfei Gao, Chen Xing, Roberto Martin-Martin, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Summary: The recognition capabilities of current state-of-the-art 3D models are limited by datasets with small amounts of annotated data and pre-defined sets of categories. This study introduces ULIP, a framework that leverages multimodal information to improve understanding of the 3D modality. ULIP is pre-trained on object triplets of images, text, and 3D point clouds, and achieves state-of-the-art performance on 3D classification tasks.

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR (2023)

Proceedings Paper Automation & Control Systems

STAP: Sequencing Task-Agnostic Policies

Christopher Agia, Toki Migimatsu, Jiajun Wu, Jeannette Bohg

Summary: Advances in robotic skill acquisition have enabled the construction of general-purpose skill libraries. However, executing these skills without considering dependencies between actions can cause long-horizon plans to fail. This paper presents STAP, a framework for training manipulation skills and coordinating their geometric dependencies to solve long-horizon tasks. Experimental results show that the framework improves long-horizon task success and is applicable to task and motion planning.

2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023) (2023)

Proceedings Paper Automation & Control Systems

RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks

Haochen Shi, Huazhe Xu, Zhiao Huang, Yunzhu Li, Jiajun Wu

Summary: Modeling and manipulating elasto-plastic objects are crucial for robots performing complex tasks. Our proposed particle-based representation and model-based planning framework enable a robot to learn the dynamics of such objects and synthesize control signals for manipulating them.

ROBOTICS: SCIENCE AND SYSTEM XVIII (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Scene Synthesis from Human Motion

Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu

Summary: In this paper, a new method called SUMMON is proposed to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Experimental results demonstrate that this method has the potential to generate extensive human-scene interaction data for the community.

PROCEEDINGS SIGGRAPH ASIA 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Video Extrapolation in Space and Time

Yunzhi Zhang, Jiajun Wu

Summary: This study proposes a method that leverages complementary signals between novel view synthesis and video prediction tasks to solve the problem of video extrapolation. Experimental results show that the method performs well on real-world datasets, outperforming several state-of-the-art methods.

COMPUTER VISION - ECCV 2022, PT XVI (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Translating a Visual LEGO Manual to a Machine-Executable Plan

Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu

Summary: This research focuses on translating image-based, step-by-step assembly manuals created by human designers into machine-interpretable instructions. The authors propose a learning-based framework that combines neural networks with projection algorithms to achieve high-precision predictions and strong generalization to unseen components, outperforming existing methods on multiple datasets.

COMPUTER VISION, ECCV 2022, PT XXXVII (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Unsupervised Segmentation in Real-World Images via Spelke Object Inference

Honglin Chen, Rahul Venkatesh, Yoni Friedman, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins, Daniel M. Bear

Summary: This study demonstrates how to learn static grouping priors from motion self-supervision, and introduces a novel segmentation network, EISEN, which achieves significant improvement in self-supervised image segmentation.

COMPUTER VISION, ECCV 2022, PT XXIX (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Rotationally Equivariant 3D Object Detection

Hong-Xing Yu, Jiajun Wu, Li Yi

Summary: This study focuses on the object detection problem in 3D scenes and introduces a new property called object-level rotation equivariance. The Equivariant Object Detection Network (EON) is proposed as a solution that incorporates this property into existing point cloud object detectors, achieving significant improvements on indoor-scene and autonomous-driving datasets.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Programmatic Concept Learning for Human Motion Description and Synthesis

Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu

Summary: This paper introduces Programmatic Motion Concepts, a hierarchical motion representation that captures both low-level motion and high-level description as motion concepts. It enables human motion description, interactive editing, and controlled synthesis of novel video sequences within a single framework. The authors present an architecture that learns this concept representation in a semi-supervised manner, outperforming established baselines, particularly in the small data regime.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Revisiting the Video in Video-Language Understanding

Shyamal Buch, Cristobal Eyzaguirre, Adrien Gaidon, Jiajun Wu, Li Fei-Fei, Juan Carlos Niebles

Summary: This paper introduces the atemporal probe (ATP), a new model for video-language analysis that provides a stronger bound on the accuracy multimodal models can reach without temporal understanding. The study found that understanding event temporality is often not necessary for strong or state-of-the-art performance on video tasks, and that ATP can also inform video-language dataset and model design.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

OBJECTFOLDER 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu

Summary: OBJECTFOLDER 2.0 is a large-scale multisensory dataset that provides implicit neural representations of common household objects. Its improvements in data quantity, rendering quality, and model transfer make it an excellent testbed for multisensory learning.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Learning Temporal Dynamics from Cycles in Narrated Video

Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun

Summary: This paper introduces a self-supervised approach to modeling how the world changes as time elapses in computer vision, which learns modality-agnostic predictive functions by jointly solving a multi-modal temporal cycle consistency objective in vision and language. This method outperforms existing self-supervised video prediction methods on various downstream tasks.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Neural Radiance Flow for 4D View Synthesis and Video Processing

Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

Summary: The method utilizes Neural Radiance Flow (NeRFlow) to learn a 4D spatial-temporal representation of dynamic scenes from RGB images. By using a neural implicit representation, it captures 3D occupancy, radiance, and dynamics of scenes, enabling multi-view rendering and video processing tasks without additional supervision.

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)
