Article

The true sample complexity of active learning

MACHINE LEARNING (2010)

Journal

MACHINE LEARNING

Volume 80, Issue 2-3, Pages 111-139

Publisher

SPRINGER

DOI: 10.1007/s10994-010-5174-y

Keywords

Active learning; Sample complexity; Selective sampling; Sequential design; Learning theory; Classification

Categories

Computer Science, Artificial Intelligence

Abstract

We describe and explore a new perspective on the sample complexity of active learning. In many situations where it was generally believed that active learning does not help, we show that active learning does help in the limit, often with exponential improvements in sample complexity. This contrasts with the traditional analysis of active learning problems such as non-homogeneous linear separators or depth-limited decision trees, in which Ω(1/ε) lower bounds are common. Such lower bounds should be interpreted carefully; indeed, we prove that it is always possible to learn an ε-good classifier with a number of samples asymptotically smaller than this. These new insights arise from a subtle variation on the traditional definition of sample complexity, not previously recognized in the active learning literature.
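The abstract does not spell out an algorithm. As an illustration only, and not the paper's own construction, the standard textbook example of threshold classifiers on [0, 1] in the noise-free (realizable) setting shows what an exponential gap in label count looks like: a passive learner labels every sampled point (on the order of 1/ε labels), while an active learner can binary-search a pool of unlabeled points and request only about log2 of the pool size labels. The names `make_oracle`, `active_threshold`, and `main`, the uniform input distribution, and the value 0.37 for the true threshold are hypothetical choices for this sketch. It does not capture the paper's distinction between the traditional and the "true" sample complexity.

```python
import math
import random


def make_oracle(threshold):
    """Hypothetical noise-free labeling oracle for h(x) = 1 if x >= threshold else 0.

    Returns the labeling function and a one-element list that counts queries.
    """
    calls = [0]

    def label(x):
        calls[0] += 1  # every call is one paid label query
        return 1 if x >= threshold else 0

    return label, calls


def active_threshold(pool, label):
    """Binary-search a sorted unlabeled pool for the first positive point.

    Assumes the realizable case: every point below the true threshold is
    labeled 0 and every point at or above it is labeled 1.
    """
    xs = sorted(pool)
    lo, hi = 0, len(xs)  # first positive index lies in [lo, hi]; len(xs) means "none"
    while lo < hi:
        mid = (lo + hi) // 2
        if label(xs[mid]) == 1:
            hi = mid  # first positive is at mid or to its left
        else:
            lo = mid + 1  # first positive is strictly to the right of mid
    if lo == len(xs):
        return 1.0  # no positive point in the pool
    left = xs[lo - 1] if lo > 0 else 0.0
    # Any threshold between the last negative and the first positive labels
    # the whole pool correctly; take the midpoint.
    return (left + xs[lo]) / 2


def main(n=10_000, true_threshold=0.37, seed=0):
    rng = random.Random(seed)
    pool = [rng.random() for _ in range(n)]  # unlabeled draws from Uniform[0, 1]
    label, calls = make_oracle(true_threshold)
    t_hat = active_threshold(pool, label)
    # Under Uniform[0, 1], the error rate of threshold t_hat is |t_hat - true_threshold|.
    err = abs(t_hat - true_threshold)
    # A passive learner would request a label for all n points.
    print(f"passive labels: {n}, active labels: {calls[0]}, error: {err:.2e}")
    return n, calls[0], err


if __name__ == "__main__":
    main()
```

With the default pool of 10,000 points, the passive baseline labels all 10,000 of them, while the binary search issues at most ⌊log2 10,000⌋ + 1 = 14 label queries. Both typically reach an error on the order of 1/n, so in this toy case the label count drops from order 1/ε to order log(1/ε).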

Recommended

Article Engineering, Electrical & Electronic

Active Sampling of Multiple Sources for Sequential Estimation

Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das

Summary: This paper develops a sequential approach for estimating the shared and private parameters of K processes. The proposed active sampling algorithm makes data-driven sampling decisions and provides estimators for the parameters, achieving reliable estimates with the fewest samples.

IEEE TRANSACTIONS ON SIGNAL PROCESSING (2022)

Article Computer Science, Artificial Intelligence

Information theoretic perspective on sample complexity

Deborah Pereg

Summary: The statistical supervised learning framework assumes a joint probability distribution that can be accurately represented by the training dataset. This work investigates the relationship between sample complexity, empirical risk, and generalization error based on the asymptotic equipartition property. The study provides theoretical guarantees for reliable learning in different settings regarding generalization error and sample size.

NEURAL NETWORKS (2023)

Article Automation & Control Systems

On the Complexity of Sequential Incentive Design

Yagiz Savas, Vijay Gupta, Ufuk Topcu

Summary: This article discusses the problem of synthesizing incentives to induce desired agent behavior when the agent's intrinsic motivation is unknown. The agent's behavior is modeled as a Markov decision process, and linear programming is used to solve the problem.

IEEE TRANSACTIONS ON AUTOMATIC CONTROL (2022)

Article Engineering, Electrical & Electronic

Disagreement-Based Active Learning in Online Settings

Boshuang Huang, Sudeep Salgia, Qing Zhao

Summary: This article studies online active learning for classifying streaming instances within the framework of statistical learning theory. By developing a disagreement-based online learning algorithm and establishing the tradeoff between label complexity and regret, an optimized algorithm is proposed.

IEEE TRANSACTIONS ON SIGNAL PROCESSING (2022)

Article Computer Science, Artificial Intelligence

Switching: understanding the class-reversed sampling in tail sample memorization

Chi Zhang, Benyi Hu, Yuhang Liuzhang, Le Wang, Li Liu, Yuehu Liu

Summary: Long-tailed visual recognition poses challenges to traditional machine learning and deep networks. Existing methods lack theoretical grounding and fail to resolve the paradoxical effects of the long tail. This paper proposes a principled solution and a sampling strategy called Switching, which achieves more efficient performance in long-tailed learning.

MACHINE LEARNING (2022)

Article Computer Science, Information Systems

Identifying the Acoustic Source via MFF-ResNet with Low Sample Complexity

Min Cui, Yang Liu, Yanbo Wang, Pan Wang

Summary: Acoustic signal classification is crucial for acoustic source identification, but training data are often limited, so models with low sample complexity are needed. This study proposes a data fusion model, MFF-ResNet, that combines manually designed features with deep representations of log-Mel spectrogram features and uses prior human knowledge as implicit regularization, resulting in a low-sample-complexity model for accurate acoustic signal classification.

ELECTRONICS (2022)

Article Computer Science, Artificial Intelligence

Active learning for new-fault class sample recovery in electrical submersible pump fault diagnosis

Luciano Henrique Peixoto da Silva, Lucas Henrique Sousa Mello, Alexandre Rodrigues, Flavio Miguel Varejao, Marcos Pellegrini Ribeiro, Thiago Oliveira-Santos

Summary: This paper proposes an intelligent fault diagnosis method for Electrical Submersible Pump using uncertainty-based active learning to assist experts in labeling data and searching for samples of new fault types. By analyzing features from vibration signals, the proposed approach tests classical classification algorithms and deep learning methods, and introduces a new acquisition strategy for active learning to improve the performance of feature extractors.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Statistics & Probability

FINITE-SAMPLE COMPLEXITY OF SEQUENTIAL MONTE CARLO ESTIMATORS

Joe Marion, Joseph Mathews, Scott C. Schmidler

Summary: We present bounds for the finite-sample error of sequential Monte Carlo samplers on static spaces. Our approach explicitly relates the performance of the algorithm to properties of the chosen sequence of distributions and mixing properties of the associated Markov kernels. This allows us to give the first finite-sample comparison to other Monte Carlo schemes. We obtain bounds for the complexity of sequential Monte Carlo approximations for a variety of target distributions such as finite spaces, product measures, and log-concave distributions including Bayesian logistic regression. The bounds obtained are within a logarithmic factor of similar bounds obtainable for Markov chain Monte Carlo.

ANNALS OF STATISTICS (2023)

Article Engineering, Civil

Active learning-based structural reliability evaluation Kriging model and sequential importance sampling

Chengning Zhou, Lingjie Wang, Yuqi Chen

Summary: This paper proposes a novel method, PAK-SEIS, which combines the Parallel Active learning Kriging model and the Sequential Importance Sampling method to efficiently analyze structural system reliability with multiple failure modes and small failure probabilities. The method includes a new sequential importance sampling method that integrates sequential Monte Carlo simulation and kernel density estimation. The proposed parallel learning strategy allows multiple new training samples to be selected at once and reduces the number of Kriging model iterations.

STRUCTURES (2023)

Article Mathematics, Applied

SEQUENTIAL ACTIVE LEARNING OF LOW-DIMENSIONAL MODEL REPRESENTATIONS FOR RELIABILITY ANALYSIS

Max Ehre, Iason Papaioannou, Bruno Sudret, Daniel Straub

Summary: This study addresses the challenge of analyzing high-dimensional, computationally expensive engineering models in risk and reliability engineering using a combination of dimensionality reduction and surrogate modeling. The approach is extended with an active learning procedure to improve error control. The performance of this approach is demonstrated with various example problems featuring well-known caveats for reliability methods.

SIAM JOURNAL ON SCIENTIFIC COMPUTING (2022)

Article Computer Science, Information Systems

Uncertainty-Based Selective Clustering for Active Learning

Sekjin Hwang, Jinwoo Choi, Joonsoo Choi

Summary: The paper introduces the uncertainty-based Selective Clustering for Active Learning (SCAL) method, which selectively clusters data with high uncertainty to reduce redundancy, thus extending the area of the decision boundary represented by the sampled data. SCAL achieves cutting-edge performance for classification tasks on balanced and unbalanced image datasets as well as semantic segmentation tasks.

IEEE ACCESS (2022)

Article Computer Science, Information Systems

An overlapping oriented imbalanced ensemble learning algorithm with weighted projection clustering grouping and consistent fuzzy sample transformation

Fan Li, Bo Wang, Yinghua Shen, Pin Wang, Yongming Li

Summary: This paper proposes an imbalanced ensemble learning algorithm based on weighted projection clustering grouping and consistent fuzzy sample transformation. It utilizes a weighted projection clustering combination framework to obtain high-quality clusters and applies a stage-wise hybrid sampling algorithm for de-overlapping and balancing of subsets. Additionally, a local-global structure consistency mechanism is constructed to improve the quality of samples in subsets. Experimental results demonstrate the superiority of the proposed algorithm in terms of anti-overlapping, Recall, F1-M, G-M, AUC, and diversity.

INFORMATION SCIENCES (2023)

Article Computer Science, Information Systems

Noise Avoidance SMOTE in Ensemble Learning for Imbalanced Data

Kyoungok Kim

Summary: The study introduced new hybrid sampling/ensemble algorithms, NASBoost and NASBagging, based on a modification of SMOTE, which improved classification performance by preventing the generation of noise in the minority class while maintaining diversity among training sets.

IEEE ACCESS (2021)

Article Engineering, Multidisciplinary

GAN-Based Dual Active Learning for Nosocomial Infection Detection

Li Wang, Xin Ye, Jialin Li, Yu Wen, Wenbin Liao, Houbing Song, Jie Chen, Jianqiang Li

Summary: In this paper, an architecture combining generative adversarial networks and dual active learning modules was proposed to address imbalanced and scarce data in hospital-acquired infection detection. The results showed that this approach improved accuracy and F1-score, demonstrating its effectiveness and efficiency.

IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING (2022)
