Article
Computer Science, Artificial Intelligence
Lingyun Song, Jun Liu, Mingxuan Sun, Xuequn Shang
Summary: This paper introduces the weakly supervised group mask network (WSGMN), which leverages the relations among regions to generate community instances with context information, robust to object variations. It generates masks for each label group and dynamically selects the most useful community instances' feature information for object recognition. Extensive experiments demonstrate the effectiveness of WSGMN in weakly supervised object detection tasks.
INTERNATIONAL JOURNAL OF COMPUTER VISION
(2021)
Article
Geochemistry & Geophysics
Binglu Wang, Yongqiang Zhao, Xuelong Li
Summary: This study proposes a multiple instance graph (MIG) learning framework for weakly supervised object detection (WSOD) in remote sensing images (RSIs). The framework utilizes spatial and appearance graphs to detect high-quality objects and mine possible instances of the same class.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2022)
Article
Computer Science, Artificial Intelligence
Wei Gao, Fang Wan, Jun Yue, Songcen Xu, Qixiang Ye
Summary: D-MIL introduces discrepantly collaborative modules into MIL, creating complementary solutions for precise object localization through multiple MIL learners. The teachers-students model improves performance by providing rich information and absorbing complementary knowledge from multiple teachers. D-MIL achieves state-of-the-art performance on the challenging MS-COCO object detection benchmark.
PATTERN RECOGNITION
(2022)
Article
Computer Science, Artificial Intelligence
Xuewei Li, Song Yi, Ruixuan Zhang, Xuzhou Fu, Han Jiang, Chenhan Wang, Zhiqiang Liu, Jie Gao, Jian Yu, Mei Yu, Ruiguo Yu
Summary: This paper proposes a dynamic sample weighting strategy (DSW) that improves the performance of weakly supervised object detection (WSOD) by focusing on samples closely covering the object, resulting in more comprehensive detection results.
IMAGE AND VISION COMPUTING
(2022)
Article
Computer Science, Artificial Intelligence
Zhongyan Zhang, Lei Wang, Yang Wang, Luping Zhou, Jianjia Zhang, Fang Chen
Summary: This paper proposes a novel dataset-driven unsupervised object discovery framework, which utilizes deep feature representation and weakly-supervised object detection to discover objects in the image dataset. The proposed framework improves the performance of region-based instance image retrieval.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Chen-Lin Zhang, Yin Li, Jianxin Wu
Summary: This paper proposes a weakly supervised foreground learning (WSFL) task, which greatly improves weakly supervised object localization (WSOL) and detection (WSOD) by providing ground-truth foreground masks. A complete WSFL pipeline with low computational cost is also introduced, which generates pseudo boxes, learns foreground masks, and does not require any localization annotations. With the help of foreground masks predicted by the WSFL model, state-of-the-art performance is achieved on the CUB dataset with 74.37% correct localization accuracy for WSOL, and on the VOC07 dataset with 55.7% mean average precision for WSOD. The WSFL model also demonstrates excellent transferability.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Artificial Intelligence
Lin Sui, Chen-Lin Zhang, Jianxin Wu
Summary: Weakly supervised vision tasks, such as detection and segmentation, have received significant attention recently. However, the lack of detailed and precise annotations in weakly supervised scenarios results in a large accuracy gap compared to fully supervised methods. In this article, we propose a new framework called Salvage of Supervision (SoS) to effectively utilize all potentially useful supervisory signals in weakly supervised vision tasks. By applying SoS-WSOD to weakly supervised object detection, we achieve a significant reduction in the technology gap and overcome the limitations of traditional methods.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Qixiang Ye, Fang Wan, Chang Liu, Qingming Huang, Xiangyang Ji
Summary: This article introduces a new WSOD method, C-MIL, which systematically alleviates the non-convexity problem by integrating continuation optimization into MIL. By partitioning instances into different subsets and approximating the objective function with smoothed loss functions, C-MIL prevents premature convergence to local minima and optimizes instance selection effectively. Extensive experiments demonstrate the superiority of C-MIL over conventional MIL methods in object detection.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Yan Liu, Yunzhou Zhang, Zhenyu Wang, Rong Ma, Feng Qiu, Sonya Coleman, Dermot Kerr
Summary: This study proposes a novel approach for salient object detection (SOD) that utilizes joint weakly supervised, unsupervised, and supervised learning. It includes an unsupervised learning module to generate coarse saliency features, a weakly supervised learning module based on scribbles for accurate saliency detection, and a supervised learning module to refine and enhance the detected saliency maps. Experimental results show that the proposed approach outperforms state-of-the-art methods and achieves real-time performance.
NEURAL COMPUTING & APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Dingwen Zhang, Wenyuan Zeng, Jieru Yao, Junwei Han
Summary: Weakly supervised object detection has received great attention in recent years in the computer vision community. However, existing approaches mostly focus on visual appearance and ignore the use of context information. This paper proposes a weakly supervised learning framework that incorporates proposal-level and semantic-level context, leading to improved learning performance through deep multiple instance reasoning. Experimental results demonstrate the superior performance of the proposed approach on widely used benchmarks.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Zhida Ren, Yongqiang Tang, Wensheng Zhang
Summary: Weakly supervised object detection (WSOD) is proposed to reduce the labor and capital cost by utilizing only image-level annotations. The absence of instance-level annotations leads to partial regions and missing instances, which are caused by noisy instances in training samples and the lack of global salient information. To solve these issues, an instance dual-optimization framework called IDO is proposed, which includes an instance-wise selection strategy based on curriculum learning and CAM-generated spatial attention. Experimental results show that the proposed method achieves comparable results to other state-of-the-art methods.
APPLIED INTELLIGENCE
(2023)
Article
Multidisciplinary Sciences
Xinyu Gu, Qian Zhang, Zheng Lu
Summary: This paper proposes a weakly supervised object detection method that utilizes contextual information to improve object localization accuracy. By mining context proposals and using a Symmetry Context Module, the proposed method outperforms state-of-the-art methods in terms of mean Average Precision (mAP).
Article
Computer Science, Information Systems
Shiyi Xing, Jinsheng Xing, Jianguo Ju, Qingshan Hou, Jiao She, Bosheng Liu
Summary: The emergence of cyborgs has greatly influenced the construction of smart cities, particularly in smart traffic and video surveillance. Object detection, a key technology for cyborgs, can provide localization and classification information. However, both fully supervised and weakly supervised object detection models have limitations. To address this, this paper proposes a coordinate attention mechanism that alleviates the tendency of models to focus on local object regions and improves detection accuracy. Experimental results on public datasets demonstrate the effectiveness of the proposed model.
HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES
(2023)
Article
Computer Science, Artificial Intelligence
Yunqiu Xu, Chunluan Zhou, Xin Yu, Bin Xiao, Yi Yang
Summary: The proposed pyramidal multiple instance detection network (P-MIDN) addresses the issue of local optima often encountered in multiple instance detection networks. By incorporating multiple MIDNs in a sequence and using proposal removal during training to reduce exposure to local discriminative regions, the P-MIDN enables better coverage of target objects. The combination of P-MIDN with an online instance classifier refinement framework and a mask guided self-correction method results in state-of-the-art performance on various benchmark datasets.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2021)
Article
Computer Science, Artificial Intelligence
Danpei Zhao, Zhichao Yuan, Zhenwei Shi, Fengying Xie
Summary: This study introduces a single-shot weakly-supervised object detection model (SSWOD) guided by an empirical saliency model, which uses saliency maps to improve detection efficiency and accuracy, achieving one-step detection without region proposals and reducing computational consumption.
Article
Computer Science, Software Engineering
Daniel Ritchie, Paul Guerrero, R. Kenny Jones, Niloy J. Mitra, Adriana Schulz, Karl D. D. Willis, Jiajun Wu
Summary: Procedural models have been widely used in computer graphics for representing various visual data, providing interpretability, stochastic variations, high-quality outputs, and compact representation. However, authoring procedural models from scratch is challenging. In recent years, AI-based methods, especially neural networks, have gained popularity for creating graphic content, allowing users to specify desired properties while algorithms take care of the details. However, the ease of use often comes at the cost of interpretability and manipulability.
COMPUTER GRAPHICS FORUM
(2023)
Article
Robotics
Simon Le Cleac'h, Hong-Xing Yu, Michelle Guo, Taylor Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, Mac Schwager
Summary: We propose a differentiable pipeline for simulating the motion of objects, representing their geometry as a continuous density field parameterized as a deep network. The pipeline estimates the objects' dynamical properties and introduces a differentiable contact model for computing forces resulting from collisions. This enables robots to autonomously build visually and dynamically accurate object models from still images and motion videos.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Le Xue, Mingfei Gao, Chen Xing, Roberto Martin-Martin, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese
Summary: The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small amount of annotated data and a pre-defined set of categories. This study introduces ULIP, a framework that utilizes multimodal information to improve understanding of the 3D modality. ULIP is pre-trained with object triplets from image, text, and 3D point cloud and achieves state-of-the-art performance in 3D classification tasks.
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR
(2023)
Proceedings Paper
Automation & Control Systems
Christopher Agia, Toki Migimatsu, Jiajun Wu, Jeannette Bohg
Summary: Advances in robotic skill acquisition have enabled the construction of general-purpose skill libraries. However, executing these skills without considering dependencies between actions may lead to failure in long-term plans. This paper presents a framework, STAP, for training manipulation skills and coordinating their geometric dependencies to solve long-horizon tasks. Experimental results show that this framework promotes long-horizon task success and is applicable in task and motion planning.
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023)
(2023)
Proceedings Paper
Automation & Control Systems
Haochen Shi, Huazhe Xu, Zhiao Huang, Yunzhu Li, Jiajun Wu
Summary: Modeling and manipulating elasto-plastic objects are crucial for robots to perform complex tasks, and our proposed particle-based representation and model-based planning framework enable the robot to learn dynamics and synthesize control signals for manipulating these objects.
ROBOTICS: SCIENCE AND SYSTEMS XVIII
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu
Summary: In this paper, a new method called SUMMON is proposed to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Experimental results demonstrate that this method has the potential to generate extensive human-scene interaction data for the community.
PROCEEDINGS SIGGRAPH ASIA 2022
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Yunzhi Zhang, Jiajun Wu
Summary: This study proposes a method that leverages complementary signals between novel view synthesis and video prediction tasks to solve the problem of video extrapolation. Experimental results show that the method performs well on real-world datasets, outperforming several state-of-the-art methods.
COMPUTER VISION - ECCV 2022, PT XVI
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu
Summary: This research focuses on translating image-based, step-by-step assembly manuals created by human designers into machine-interpretable instructions. The authors propose a learning-based framework that utilizes neural networks and projection algorithms to achieve high-precision prediction and strong generalization to unseen components, outperforming existing methods on multiple datasets.
COMPUTER VISION, ECCV 2022, PT XXXVII
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Honglin Chen, Rahul Venkatesh, Yoni Friedman, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins, Daniel M. Bear
Summary: This study demonstrates how to learn static grouping priors from motion self-supervision, and introduces a novel segmentation network, EISEN, which achieves significant improvement in self-supervised image segmentation.
COMPUTER VISION, ECCV 2022, PT XXIX
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Hong-Xing Yu, Jiajun Wu, Li Yi
Summary: This study focuses on the object detection problem in 3D scenes and introduces a new property called object-level rotation equivariance. The Equivariant Object Detection Network (EON) is proposed as a solution to incorporate this property into existing point cloud object detectors, achieving significant improvements in indoor scene and autonomous driving datasets.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu
Summary: This paper introduces Programmatic Motion Concepts, a hierarchical motion representation that captures both low-level motion and high-level description as motion concepts. It enables human motion description, interactive editing, and controlled synthesis of novel video sequences within a single framework. The authors present an architecture that learns this concept representation in a semi-supervised manner, outperforming established baselines, particularly in the small data regime.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Shyamal Buch, Cristobal Eyzaguirre, Adrien Gaidon, Jiajun Wu, Li Fei-Fei, Juan Carlos Niebles
Summary: This paper introduces a new video-language analysis model, the atemporal probe (ATP), which can better constrain the accuracy of multimodal models and improve performance on video tasks. The study finds that understanding event temporality is often not necessary to achieve strong or state-of-the-art performance. ATP can also improve video-language dataset and model design.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu
Summary: OBJECTFOLDER 2.0 is a large-scale multisensory dataset that provides implicit neural representations of common household objects. Its improvements in data quantity, rendering quality, and model transfer make it an excellent testbed for multisensory learning.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun
Summary: This paper introduces a self-supervised approach to modeling how the world changes as time elapses in computer vision, which learns modality-agnostic predictive functions by jointly solving a multi-modal temporal cycle consistency objective in vision and language. This method outperforms existing self-supervised video prediction methods on various downstream tasks.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu
Summary: The method utilizes Neural Radiance Flow (NeRFlow) to learn a 4D spatial-temporal representation of dynamic scenes from RGB images. By using a neural implicit representation, it captures 3D occupancy, radiance, and dynamics of scenes, enabling multi-view rendering and video processing tasks without additional supervision.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)
(2021)