Article
Computer Science, Interdisciplinary Applications
Wei Zhong, Jiping Wang, Xiaolin Chen
Summary: Feature screening is essential for ultrahigh-dimensional data analysis. This paper proposes a new model-free marginal feature screening approach for survival data with right censoring, based on a censored mean-variance index. The method is robust to model misspecification and can identify important covariates, both categorical and continuous. Simulations and a real data example demonstrate that the proposed approach has competitive performance.
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2021)
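Several of the screening papers collected here share a common recipe: compute a marginal utility for each covariate and keep the top-ranked ones. The sketch below illustrates that generic recipe only; it uses absolute Pearson correlation as a stand-in utility (the papers above replace it with model-free indices such as a censored mean-variance index), and the threshold choice `n / log(n)` is one common convention, not a prescription from any specific paper.

```python
import numpy as np

def marginal_screen(X, y, d=None):
    """Rank covariates by a marginal utility and keep the top d.

    Absolute Pearson correlation is a stand-in utility here; model-free
    screening methods substitute indices such as a censored mean-variance
    index or distance correlation.
    """
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # a common screening threshold
    # marginal utility of each covariate with the response
    utility = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    # indices of the d covariates with the largest utility
    return np.argsort(utility)[::-1][:d]

# toy example: only the first two of 1000 covariates matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
keep = marginal_screen(X, y)
```

The point of the "sure screening" theory in these papers is that, under regularity conditions, the retained set `keep` contains all truly active covariates with probability tending to one even when `p` grows exponentially in `n`.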
Article
Statistics & Probability
Shuiyun Lu, Xiaolin Chen, Hong Wang
Summary: This article introduces a new conditional feature screening procedure for ultra-high dimensional survival data using conditional distance correlation. It is model-free and robust to heavy tails or extreme values in both covariates and response. Simulation studies and analysis of real data illustrate the advantages of the proposed approach over existing methods.
COMMUNICATIONS IN STATISTICS-THEORY AND METHODS
(2021)
Article
Statistics & Probability
Daojiang He, Jinjiao Cheng, Kai Xu
Summary: This article proposes a kernel-based method for feature screening in ultrahigh-dimensional data. The method demonstrates sure screening and rank consistency properties under weak assumptions. Furthermore, it shows that statistics generated by kernels in the distance kernel family are more sensitive for feature screening in ultrahigh dimensions.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2023)
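Distance-correlation-type statistics, which underlie the kernel-based and conditional screening entries above, can be computed directly from pairwise distance matrices. The following is a minimal sketch of the standard sample distance correlation for univariate samples; it is an illustration of the general statistic, not the specific kernel family or conditional variant studied in these papers.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples.

    dCor is zero (in the population) iff the variables are independent,
    which is why covariates can be ranked by their distance correlation
    with the response in model-free screening.
    """
    x = np.asarray(x, float)[:, None]
    y = np.asarray(y, float)[:, None]
    a = np.abs(x - x.T)  # pairwise distance matrices
    b = np.abs(y - y.T)
    # double centering: subtract row/column means, add back the grand mean
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))
```

Unlike Pearson correlation, the statistic also detects monotone and non-monotone nonlinear dependence, which is what makes it attractive for the model-free screening procedures summarized here.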
Article
Automation & Control Systems
Sumit Mukherjee, Subhabrata Sen
Summary: We study high-dimensional Bayesian linear regression with product priors. Using the theory of non-linear large deviations, we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Assuming a true linear model for the observed data, we derive a limiting infinite-dimensional variational formula for the log-normalizing constant. Further, under an additional separation condition, we establish that the variational problem has a unique optimizer, which governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this separation condition and illustrate the results with concrete examples.
JOURNAL OF MACHINE LEARNING RESEARCH
(2022)
Article
Mathematics, Interdisciplinary Applications
Kai Xu, Xudong Huang
Summary: This paper proposes a new sure independence screening procedure for high-dimensional survival data based on censored quantile correlation, which is robust against outliers and capable of discovering the nonlinear relationship between variables. Simulation results demonstrate its competitive performance on survival datasets with high-dimensional predictors.
JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY
(2021)
Article
Statistics & Probability
Zhe Fei, Qi Zheng, Hyokyoung G. Hong, Yi Li
Summary: This study proposes a novel method within the framework of global censored quantile regression to draw inference on the effects of high-dimensional predictors. The method investigates covariate-response associations over an interval of quantile levels and properly quantifies the uncertainty of the estimates.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Automation & Control Systems
Yuqin Xu, Quan Qian
Summary: This study addresses the combinatorial explosion in the SISSO (sure independence screening and sparsifying operator) method by using the mRMR (minimum-redundancy maximum-relevance) algorithm, improving both efficiency and accuracy. Experimental results demonstrate that the mutual-information-based SISSO method significantly reduces time consumption while keeping the error close to that of the original SISSO model.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2022)
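The mRMR criterion referenced in the entry above greedily adds, at each step, the feature whose mutual information with the target is largest after subtracting its average mutual information with the already-selected features. Here is a minimal sketch for discrete features using plug-in mutual-information estimates; the data and variable names are purely illustrative, and this is the generic criterion, not the paper's SISSO integration.

```python
import numpy as np
from collections import Counter

def mutual_information(a, b):
    """Plug-in mutual information of two discrete sequences (in nats)."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    pab = Counter(zip(a, b))
    return sum(c / n * np.log((c / n) / (pa[x] / n * pb[y] / n))
               for (x, y), c in pab.items())

def mrmr(X, y, k):
    """Greedy mRMR: maximize relevance MI(f, y) minus mean redundancy
    with the features already chosen."""
    p = X.shape[1]
    selected, remaining = [], list(range(p))
    while len(selected) < k:
        def score(j):
            rel = mutual_information(X[:, j], y)
            red = (np.mean([mutual_information(X[:, j], X[:, s])
                            for s in selected])
                   if selected else 0.0)
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# toy example: feature 0 is a copy of the label, features 1-2 are noise
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=100)
X = np.column_stack([y, rng.integers(0, 2, 100), rng.integers(0, 3, 100)])
picked = mrmr(X, y, 2)
```

The redundancy penalty is what distinguishes mRMR from pure relevance ranking: a duplicate of an already-selected feature scores near zero even though its marginal relevance is maximal.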
Article
Statistics & Probability
Abhik Ghosh, Erica Ponzi, Torkjel Sandanger, Magne Thoresen
Summary: In this paper, we discuss a new robust variable screening procedure for ultrahigh-dimensional GLMs based on the minimum density power divergence estimator (MDPDE). Our proposed method performs well under both pure and contaminated data scenarios. We also provide the theoretical motivation and proofs for the use of marginal MDPDEs, as well as the derivation of a reliable conditional screening method for GLMs.
SCANDINAVIAN JOURNAL OF STATISTICS
(2023)
Article
Statistics & Probability
Runze Li, Kai Xu, Yeqing Zhou, Liping Zhu
Summary: In this article, we propose a novel test based on an aggregation of the marginal cumulative covariances to accommodate heteroscedasticity and high dimensionality in high-dimensional data. Our proposed test statistic is scale-invariant, tuning-free, and easy to implement, and its asymptotic normality under the null hypothesis is established. We find that our proposed test is much more powerful than existing competitors for covariates with heterogeneous variances, even under high-dimensional linear models, while maintaining high efficiency for homoscedastic covariates.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Mathematics, Applied
Peng Lai, Mingyue Wang, Fengli Song, Yanqiu Zhou
Summary: Linear discriminant analysis (LDA) is widely used in discriminant classification and pattern recognition. However, it fails when dealing with high- or ultrahigh-dimensional data. To address this, a feature screening procedure based on Fisher's linear projection and a marginal score test is proposed. The procedure ensures that important features are retained and irrelevant predictors are eliminated. Monte Carlo simulation studies and a real-life data example are used to assess its finite-sample properties.
Article
Statistics & Probability
Pierre C. Bellec, Cun-Hui Zhang
Summary: This paper introduces a second-order Stein formula to characterize the variance of random variables for functions with square integrable gradient, demonstrating its usefulness in various applications. Additionally, it presents statistical applications such as SURE estimation, confidence intervals, and upper bounds on model selection variance.
ANNALS OF STATISTICS
(2021)
Article
Statistics & Probability
Di He, Yong Zhou, Hui Zou
Summary: This article systematically studies variable screening methods for multi-response data, proposing a new model-free screening method called mRCC. The sure screening property of mRCC is established under weak regularity conditions, and extensive numerical experiments demonstrate its superior performance over other available alternatives.
Article
Mathematics
Hamza Daoudi, Zouaoui Chikr Elmezouar, Fatimah Alshahrani
Summary: This paper investigates the asymptotic properties of conditional functional parameters for an explanatory variable taking values in an infinite-dimensional Hilbert space and a response variable, within a quasi-associated dependency framework. The non-parametric estimation of the conditional distribution function is studied using the kernel method under quasi-associated dependence, and the almost complete convergence of the estimator in the associated case is established under general hypotheses. The conditional hazard function is then estimated from two components: the conditional distribution function and the conditional density. The asymptotic normality of the kernel estimator is established, and the asymptotic variance is given explicitly. Simulation studies examine the finite-sample behavior of these asymptotic properties.
Article
Economics
Qinqin Hu, Lu Lin
Summary: A new feature screening tool and a two-stage regularization framework are proposed to tackle high dimensionality and endogeneity, with ranking consistency established even when the number of predictors grows exponentially. Simulation studies support the effectiveness of the proposed method.
COMPUTATIONAL ECONOMICS
(2022)
Article
Computer Science, Artificial Intelligence
Kexuan Li, Fangfang Wang, Lingli Yang, Ruiqi Liu
Summary: In this paper, a novel two-step nonparametric approach called Deep Feature Screening (DeepFS) is proposed to address the challenges in applying traditional statistical feature selection methods to high-dimension, low-sample-size data. DeepFS combines the strengths of deep neural networks and feature screening, and it is model-free, distribution-free, and capable of recovering the original input data. Extensive simulation studies and real data analyses demonstrate the superiority of DeepFS.