Article
Computer Science, Artificial Intelligence
Zhigang Sun, Guotao Wang, Pengfei Li, Hui Wang, Min Zhang, Xiaowen Liang
Summary: This paper proposes an improved random forest algorithm based on the classification accuracy of decision trees and a measure of the correlation between them. The algorithm retains the decision trees with better classification performance and reduces the correlations between decision trees to improve the performance of the random forest. Experimental results demonstrate that the proposed improved random forest achieves higher average classification accuracy and outperforms traditional random forests in terms of G-means and other metrics.
EXPERT SYSTEMS WITH APPLICATIONS
(2024)
Article
Health Care Sciences & Services
Hannah Johns, Julie Bernhardt, Leonid Churilov
Summary: Predicting patient outcomes based on patient characteristics and care processes is common in medical research, but simplifying multifaceted features into scalar variables for statistical analysis may result in a loss of important clinical detail. The limited range of distance-based predictive methods poses a challenge for researchers, who must choose between simplifying features for analysis and using methods that may not fully meet the needs of the analysis problem.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2021)
Article
Computer Science, Artificial Intelligence
Wei Shen, Yilu Guo, Yan Wang, Kai Zhao, Bo Wang, Alan Yuille
Summary: This paper proposes two Deep Differentiable Random Forests methods, Deep Label Distribution Learning Forest (DLDLF) and Deep Regression Forest (DRF), for age estimation. They deal with inhomogeneous data by jointly learning input-dependent data partitions at the split nodes and age distributions at the leaf nodes. Experimental results show that DLDLF and DRF achieve state-of-the-art performance on three age estimation datasets.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2021)
Article
Mathematical & Computational Biology
Denise Rava, Ronghui Xu
Summary: This article discusses the conditional treatment effect for competing risks data in observational studies. It proposes efficient scores for the treatment effect and doubly robust scores considering different models. The estimators have rate double robustness, allowing the use of machine learning and nonparametric methods while maintaining asymptotic normality. Simulation studies and an application to real data are conducted to demonstrate the performance of the estimators. The implemented methods are available in the R package HazardDiff.
STATISTICS IN MEDICINE
(2023)
Article
Environmental Sciences
Aurora Ferrer Palomino, Patricia Sanchez Espino, Cristian Borrego Reyes, Jose Antonio Jimenez Rojas, Francisco Rodriguez y Silva
Summary: This study collects samples to obtain vegetation moisture content data for the Mediterranean region of Andalusia, Spain, and establishes a predictive model that can reduce uncertainty about fire behavior in forest fires.
JOURNAL OF ENVIRONMENTAL MANAGEMENT
(2022)
Article
Statistics & Probability
Jason M. Klusowski, Peter M. Tian
Summary: This article investigates the consistency of decision trees constructed with CART and C4.5 methodology for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size. The theory applies to a wide range of models, including additive regression models with continuous, bounded variation, or generally Borel measurable component functions. The study shows that these qualitative properties of individual trees are inherited by Breiman's random forests.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Chemistry, Multidisciplinary
Eduardo Martinez Garcia, Marcos Garcia Alberti, Antonio Alfonso Arcos Alvarez
Summary: Machine learning is a branch of AI that uses algorithms to extract information from large data sets. It is particularly useful in solving nonlinear problems in engineering. Geotechnical engineering, with its complex relationships between variables, is an ideal field for the application of machine learning techniques.
APPLIED SCIENCES-BASEL
(2022)
Article
Statistics & Probability
Juliana Schulz, Erica E. M. Moodie
Summary: This article introduces a method for estimating optimal dosing strategies for continuous treatments, which shows double robustness against model misspecification when implemented weights meet a particular balancing condition.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2021)
Article
Economics
Arthur Lewbel, Jin Young Choi, Zhuzhu Zhou
Summary: Consider two parametric models, where at least one is correctly specified but the correct one is unknown. Both models share a common vector of parameters. A consistent estimator for this common parameter vector, regardless of the correct model, is referred to as Doubly Robust (DR). We propose a general technique, called Over-identified Doubly Robust (ODR), for constructing DR estimators assuming the models are over-identified. ODR is a simple extension of the Generalized Method of Moments, and we demonstrate its application in various models, particularly in instrumental variables estimation where one of two instrument vectors may be invalid.
JOURNAL OF ECONOMETRICS
(2023)
Article
Automation & Control Systems
Anna C. Neufeld, Lucy L. Gao, Daniela M. Witten
Summary: In this article, we discuss methods for conducting inference on the output of the Classification and Regression Tree (CART) algorithm. We propose a selective inference framework that takes into account that the tree was estimated from the data, and we provide methods to control the selective Type 1 error rate and achieve nominal selective coverage. We also provide efficient algorithms for computing the necessary conditioning sets and apply these methods in simulation and to a real dataset.
JOURNAL OF MACHINE LEARNING RESEARCH
(2022)
Article
Physics, Multidisciplinary
Elzbieta Turska, Szymon Jurga, Jaroslaw Piskorski
Summary: This study applied tree-based classification algorithms to detect mood disorders in lower secondary school students, finding that the rpart algorithm was the most sensitive in detecting real cases. The most important factor in developing mood disorders was found to be the adolescents' relationships with their parents.
Article
Statistics & Probability
Mehdi Dagdoug, Camelia Goga, David Haziza
Summary: This article discusses methods for estimating finite population parameters in surveys by incorporating auxiliary information to improve estimation precision. It uses random forests to estimate the relationship between survey variables and auxiliary variables and explores a model-calibration procedure for handling multiple survey variables. The results of a simulation study show that the proposed methods perform well in terms of bias, efficiency, and coverage of confidence intervals in various settings.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Forestry
Gongqiao Zhang, Gangying Hui
Summary: This study explores the importance of random trees in natural forests and suggests that they are the cornerstones of natural forests. The research shows that the features of random trees are highly consistent with those of the communities, indicating the key role they play in natural forests.
Article
Plant Sciences
Denis A. Shah, Erick D. De Wolf, Pierce A. Paul, Laurence V. Madden
Summary: This article examines the potential of using random forest models for binary prediction of Fusarium head blight (FHB) epidemics, aiming to find a balance between model simplicity and complexity without sacrificing accuracy.
Article
Computer Science, Information Systems
Christian Luelf, Denis Mayr Lima Martins, Marcos Antonio Vaz Salles, Yongluan Zhou, Fabian Gieseke
Summary: The abundance of data in various domains poses challenges to data exploration and analysis. This work proposes a novel framework that allows users to search for target objects interactively by specifying queries through positive and negative examples. The framework utilizes an index-aware construction scheme and multidimensional indexing structures to process queries efficiently.
PROCEEDINGS OF THE VLDB ENDOWMENT
(2023)