ESTIMATING THE ALGORITHMIC VARIANCE OF RANDOMIZED ENSEMBLES VIA THE BOOTSTRAP

ANNALS OF STATISTICS (2019)

Journal

ANNALS OF STATISTICS

Volume 47, Issue 2, Pages 1088-1112

Publisher

INST MATHEMATICAL STATISTICS-IMS

DOI: 10.1214/18-AOS1707

Keywords

Bootstrap; random forests; bagging; randomized algorithms

Categories

Statistics & Probability

Funding

NSF [DMS-1613218]

Abstract

Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is large enough, so that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of algorithmic variance (i.e., the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable ERR_t denote the prediction error of a randomized ensemble of size t. Working under a first-order model for randomized ensembles, we prove that the centered law of ERR_t can be consistently approximated via the proposed method as t → ∞. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of ERR_t are negligible.
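To make the idea of algorithmic variance concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary classification task, a fixed test set, and a matrix of per-classifier 0/1 votes; the function names (`ensemble_error`, `bootstrap_error_std`, `extrapolate_std`) and the simulated votes are hypothetical. It resamples the ensemble members with replacement to gauge how much the error of a size-t majority vote fluctuates, and it assumes those fluctuations shrink roughly like 1/sqrt(t) when extrapolating to a larger ensemble.

```python
import numpy as np


def ensemble_error(votes, labels):
    """Error rate of a majority vote.

    votes: (t, n) array of 0/1 predictions, one row per ensemble member.
    labels: (n,) array of true 0/1 labels. Exact ties are assigned to class 0.
    """
    majority = (votes.mean(axis=0) > 0.5).astype(int)
    return float(np.mean(majority != labels))


def bootstrap_error_std(votes, labels, n_boot=200, seed=0):
    """Bootstrap estimate of the standard deviation of the ensemble error.

    Ensemble members (rows of votes) are resampled with replacement while the
    data stay fixed, so only the algorithmic randomness is resampled.
    """
    rng = np.random.default_rng(seed)
    t = votes.shape[0]
    errs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, t, size=t)  # resample t members with replacement
        errs[b] = ensemble_error(votes[idx], labels)
    return float(errs.std(ddof=1))


def extrapolate_std(std_at_t0, t0, t):
    """Rescale an estimate made at size t0 to size t.

    Assumption for this sketch: the fluctuations scale like 1/sqrt(t).
    """
    return float(std_at_t0 * np.sqrt(t0 / t))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, t0 = 500, 100
    labels = rng.integers(0, 2, size=n)
    # Simulated ensemble: each member is correct on each point with prob. 0.6.
    correct = rng.random((t0, n)) < 0.6
    votes = np.where(correct, labels, 1 - labels)

    std_t0 = bootstrap_error_std(votes, labels)
    print("error at t0:", ensemble_error(votes, labels))
    print("bootstrap std at t0:", std_t0)
    print("extrapolated std at t = 1000:", extrapolate_std(std_t0, t0, 1000))
```

In this sketch the bootstrap is run only on a small initial ensemble of size t0, and the rescaling step stands in for the extrapolation technique mentioned in the abstract; an ensemble would be judged large enough once the extrapolated standard deviation falls below a tolerance chosen by the user.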

Recommended

Article Computer Science, Artificial Intelligence

Evolutionary bagging for ensemble learning

Giang Ngo, Rodney Beard, Rohitash Chandra

Summary: In this paper, an evolutionary bagged ensemble learning method is proposed, which enhances the diversity of bags using evolutionary algorithms. The experimental results show that this method outperforms traditional ensemble learning methods on various benchmark datasets.

NEUROCOMPUTING (2022)

Article Automation & Control Systems

Random Shapley Forests: Cooperative Game-Based Random Forests With Consistency

Jianyuan Sun, Hui Yu, Guoqiang Zhong, Junyu Dong, Shu Zhang, Hongchuan Yu

Summary: In this article, a new random forests algorithm called random Shapley forests (RSFs) is proposed, which uses the Shapley value to evaluate the importance of each feature. The experiments conducted on benchmark and real-world datasets demonstrate that RSFs outperform or are at least comparable to existing consistent RFs, original RFs, and support vector machines.

IEEE TRANSACTIONS ON CYBERNETICS (2022)

Article Business, Finance

Examining the volatility of soybean market in the MIDAS framework: The importance of bagging-based weather information

Lu Wang, Rui Wu, WeiChun Ma, Weiju Xu

Summary: Based on the relationship between the global soybean market and weather, this study aims to fill the gap in soybean volatility forecasting under weather information. By using extended GARCH-MIDAS approaches and adding weather variables, we find that models incorporating bagging-based weather information outperform those with raw weather indicators or without weather information. Our conclusions are robust to further tests, and our novel bagging-related GARCH-MIDAS-W-MBB model provides fresh insights into soybean volatility forecasting.

INTERNATIONAL REVIEW OF FINANCIAL ANALYSIS (2023)

Article Statistics & Probability

Rates of convergence for random forests via generalized U-statistics

Wei Peng, Tim Coleman, Lucas Mentch

Summary: Random forests are a popular off-the-shelf supervised learning algorithm, and this research establishes convergence rates for random forests and other supervised learning ensembles, providing a quantitative measure for the speed of convergence.

ELECTRONIC JOURNAL OF STATISTICS (2022)

Article Health Care Sciences & Services

A Comparative Study on the Influence of Undersampling and Oversampling Techniques for the Classification of Physical Activities Using an Imbalanced Accelerometer Dataset

Dong-Hwa Jeong, Se-Eun Kim, Woo-Hyeok Choi, Seong-Ho Ahn

Summary: This study aims to classify physical activities in daily life using machine learning methods. By extracting features and applying sampling methods, the data imbalance issue was successfully addressed. The results showed that methods like random forest and adaptive boosting performed well in PA classification.

HEALTHCARE (2022)

Article Soil Science

Assessing agricultural salt-affected land using digital soil mapping and hybridized random forests

Kamal Nabiollahi, Ruhollah Taghizadeh-Mehrjardi, Aram Shahabi, Brandon Heung, Alireza Amirian-Chakan, Masoud Davari, Thomas Scholten

Summary: In a study conducted in Kurdistan Province, Iran, a combination of random forests and covariate data was used to assess the spatial variability of salinity and sodicity in agricultural salt-affected land. The results showed that optimization algorithms helped improve the accuracy of predictions, and identified groundwater table, categorical maps, salinity index, and multi-resolution ridge top flatness as important covariates for predicting soil salinity and sodicity.

GEODERMA (2021)

Article Computer Science, Interdisciplinary Applications

Software for Data-Based Stochastic Programming Using Estimation

Xiaotie Chen, David L. Woodruff

Summary: This software utilizes sampled data to obtain a consistent sample-average solution and estimate confidence intervals for the optimality gap using bootstrap and bagging, without the need for considering the underlying distribution of the samples.

INFORMS JOURNAL ON COMPUTING (2023)

Article Engineering, Multidisciplinary

EvoSeg: Automated Electron Microscopy Segmentation through Random Forests and Evolutionary Optimization

Manuel Zumbado-Corrales, Juan Esquivel-Rodriguez

Summary: Electron Microscopy Maps are crucial for studying bio-molecular structures, describing envelopes of proteins within cells. Segmentation and Evolutionary-Optimized Segmentation algorithms are used to improve the identification of protein regions, aiding in drug design and functional understanding.

BIOMIMETICS (2021)

Article Computer Science, Artificial Intelligence

Minimally overfitted learners: A general framework for ensemble learning

Victor Acena, Isaac Martin de Diego, Ruben R. Fernandez, Javier M. Moguerza

Summary: This study introduces a new ensemble framework called MOE, which effectively combines stable and unstable machine learning algorithms in constructing predictive models. By using resampling techniques and weighted random bootstrap sampling, the framework constructs slightly overfitted base learners, thereby improving the predictive ability.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Geography, Physical

Ensemble machine learning models based on Reduced Error Pruning Tree for prediction of rainfall-induced landslides

Binh Thai Pham, Abolfazl Jaafari, Trung Nguyen-Thoi, Tran Van Phong, Huu Duy Nguyen, Neelima Satyam, Md Masroor, Sufia Rehman, Haroon Sajjad, Mehebub Sahana, Hiep Van Le, Indra Prakash

Summary: This study developed highly accurate ensemble machine learning models for spatial prediction of rainfall-induced landslides in the Uttarkashi district, India. The D-REPT model was identified as the most accurate, providing insights for engineers and modelers to develop more advanced predictive models.

INTERNATIONAL JOURNAL OF DIGITAL EARTH (2021)

Article Automation & Control Systems

Random vector functional link forests and extreme learning forests applied to UAV automatic target recognition

Victor Henrique Alves Ribeiro, Roberto Santana, Gilberto Reynoso-Meza

Summary: This paper proposes two novel machine learning algorithms to improve the automatic target recognition system for unmanned aerial vehicles. These models make use of the stochastic procedure of Random Forests and employ the novel Random Vector Functional Link Tree or Extreme Learning Tree for decision split. Experimental results show that the proposed algorithms outperform other state-of-the-art ensemble learning techniques in terms of predictive performance and computational complexity.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2023)

Article Engineering, Environmental

Landslide susceptibility mapping using an ensemble model of Bagging scheme and random subspace-based naive Bayes tree in Zigui County of the Three Gorges Reservoir Area, China

Xudong Hu, Cheng Huang, Hongbo Mei, Han Zhang

Summary: A novel machine learning ensemble model, BRSNBtree, was proposed to predict landslide susceptibility in Zigui County of the Three Gorges Reservoir Area. The results showed that the distance to rivers was the most important factor in predicting landslide susceptibility, and BRSNBtree outperformed other methods in terms of prediction performance.

BULLETIN OF ENGINEERING GEOLOGY AND THE ENVIRONMENT (2021)

Article Plant Sciences

Including leaf trait information helps empirical estimation of jmax from vcmax in cool-temperate deciduous forests

Guangman Song, Quan Wang, Jia Jin

Summary: Understanding the uncertainty in parameterization of Vcmax and Jmax is crucial for predicting carbon fluxes. Recent studies have shown that the relationship between Vcmax and Jmax varies depending on species and leaf traits. Our analysis in cool-temperate forest stands in Japan revealed that leaf traits, particularly LMA, significantly influence the regression, leading to improved model predictions.

PLANT PHYSIOLOGY AND BIOCHEMISTRY (2021)

Article Computer Science, Artificial Intelligence

Evidential Random Forests

Arthur Hoarau, Arnaud Martin, Jean-Christophe Dubois, Yolande Le Gall

Summary: This paper proposes an Evidential Decision Tree and an Evidential Random Forest, which can handle uncertain and imprecise predictions and can predict rich labels. In experiments, the presented methods performed better than other evidential models and the recent Cautious Random Forests on noisy data and on datasets with uncertain and imprecise labels. The proposed models also offer better robustness, and their ability to predict rich labels can be used in other approaches such as active learning.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Information Systems

Random Kernel Forests

Dmitry A. Devyatkin, Oleg G. Grigoriev

Summary: This paper proposes an algorithm for training kernel decision trees and random forests, which overcomes the limitations of traditional methods in dealing with multidimensional sparse data. Experimental results show that the proposed algorithm outperforms other methods in various tasks, and the selected regularization technique helps reduce overfitting.

IEEE ACCESS (2022)
