Article
Statistics & Probability
Chang Liu, Yue Yang, Howard Bondell, Ryan Martin
Summary: In the context of high-dimensional linear regression models, this study introduces a new approach to address multicollinearity, which typically achieves optimal posterior concentration rates and outperforms existing methods on real and simulated data.
Article
Statistics & Probability
Jia Wang, Xizhen Cai, Runze Li
Summary: This paper proposes a new Bayesian variable selection approach for partially linear models that addresses estimation error and multicollinearity, maintains model selection consistency, and outperforms existing methods when predictors are highly correlated. The method is a one-step procedure: it uses a difference-based method to reduce the impact of estimating the nonparametric component, and Bayesian subset modeling with a diffusing prior (BSM-DP) to shrink the estimator of the linear component. Simulation studies support the theory and demonstrate the efficiency of the proposed method, and an application to supermarket data shows its advantages over existing methods.
JOURNAL OF MULTIVARIATE ANALYSIS
(2021)
Article
Biochemical Research Methods
Jacob Williams, Shuangshuang Xu, Marco A. R. Ferreira
Summary: In this study, a novel Bayesian variable selection method based on nonlocal priors is proposed for genome-wide association studies. The method, called BGWAS, effectively reduces false positive rates while maintaining the ability to detect true positive SNPs. It achieves this through a two-step process of screening and model selection.
BMC BIOINFORMATICS
(2023)
Article
Economics
Wenlu Tang, Jinhan Xie, Yuanyuan Lin, Niansheng Tang
Summary: This article focuses on identifying important features in high-dimensional data and introduces a multiple testing procedure based on quantile correlation. A stepwise procedure and a sure independence screening method based on quantile correlation are also developed. Numerical studies show that these methods perform well in practical settings.
JOURNAL OF BUSINESS & ECONOMIC STATISTICS
(2022)
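The screening step described in this summary can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes one common definition of sample quantile correlation, qcor_tau(Y, X) = mean(psi_tau(Y - Q_tau) (X - mean X)) / sqrt((tau - tau^2) var X) with psi_tau(w) = tau - 1{w < 0}, and simply keeps the d covariates with the largest absolute quantile correlation. The function names and the toy data are hypothetical.

```python
import numpy as np

def quantile_correlation(y, x, tau):
    """Sample quantile correlation between response y and a single covariate x at level tau."""
    q = np.quantile(y, tau)                  # sample tau-quantile of y
    psi = tau - (y < q).astype(float)        # psi_tau(w) = tau - 1{w < 0}, with w = y - q
    qcov = np.mean(psi * (x - x.mean()))     # sample quantile covariance
    return qcov / np.sqrt((tau - tau ** 2) * x.var())

def qc_screen(y, X, tau, d):
    """Indices of the d columns of X with the largest |quantile correlation| with y."""
    scores = np.array([abs(quantile_correlation(y, X[:, j], tau)) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]

# Toy example (hypothetical data): only columns 0 and 5 affect y.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 200))
y = 3 * X[:, 0] - 2 * X[:, 5] + rng.standard_normal(500)
selected = qc_screen(y, X, tau=0.5, d=10)
```

Screening only reduces the dimension; in a full workflow a testing or stepwise procedure such as those mentioned in the summary would then be applied to the retained covariates.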
Article
Social Sciences, Mathematical Methods
Edgar C. Merkle, Oludare Ariyo, Sonja D. Winter, Mauricio Garnier-Villarreal
Summary: This article reviews common situations in Bayesian latent variable models where the prior distribution specified by a researcher differs from the one used during estimation. It explores solutions and provides recommendations for practice.
METHODOLOGY-EUROPEAN JOURNAL OF RESEARCH METHODS FOR THE BEHAVIORAL AND SOCIAL SCIENCES
(2023)
Article
Automation & Control Systems
Takuo Matsubara, Chris J. Oates, Francois-Xavier Briol
Summary: Bayesian neural networks aim to combine the predictive performance of neural networks with uncertainty quantification. This paper proposes the ridgelet prior, a way to specify priors on the network parameters so that the network approximates a given Gaussian process. A non-asymptotic analysis with explicit error bounds shows that Bayesian neural networks can approximate any Gaussian process whose covariance function is sufficiently regular. Experiments demonstrate the advantages of the proposed ridgelet prior in regression tasks.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Mathematics, Interdisciplinary Applications
Jin Wang, Yunbo Ouyang, Yuan Ji, Feng Liang
Summary: In this study, we explore the Bayesian approach to variable selection in linear regression models. We propose an efficient EM algorithm that returns the MAP estimate of the set of relevant variables. The algorithm avoids inverting large matrices at each iteration, which makes it scalable to big data. We also introduce an ensemble EM algorithm to address the problem of local modes and obtain better variable selection results. Empirical studies show the superior performance of the ensemble EM algorithm.
Article
Computer Science, Artificial Intelligence
Gael Poux-Medard, Julien Velcin, Sabine Loudcher
Summary: This article introduces the powered Dirichlet-Hawkes process (PDHP), a flexible method for clustering textual documents based on both their content and their publication time, which relaxes the assumption that textual content and temporal dynamics are always perfectly correlated. Experimental results show that PDHP performs significantly better than existing models when the textual or temporal information is weakly informative.
KNOWLEDGE AND INFORMATION SYSTEMS
(2022)
Article
Statistics & Probability
Jia Wang, Xizhen Cai, Xiaoyue Niu, Runze Li
Summary: This article introduces a class of network models in which the probability of a connection depends on high-dimensional nodal covariates and node-specific popularity. A Bayesian method for feature selection is proposed and implemented via Gibbs sampling. To address the computational challenges of large sparse networks, a working model is developed that updates parameters based on dense subgraphs. Model selection consistency is proven for both models, even when the dimension grows exponentially. Monte Carlo studies and real-world examples illustrate the performance of the proposed models and estimation procedures.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Nutrition & Dietetics
Nicola Pesenti, Piero Quatto, Elena Colicino, Raffaella Cancello, Massimo Scacchi, Antonella Zambon
Summary: The study compares the performance of three supervised Bayesian variable selection methods in detecting important predictors in high-dimensional data. It provides practical guidelines for their use and identifies when one model should be preferred over the others. The results show that BKMR outperforms the other models on small datasets, BSR performs comparably to BKMR on large datasets, and BLASSO should be used when there are no synergies between predictors and the predictor-outcome relationship is monotonic. The models were also applied to a real case study on obesity in hospitalized women.
FRONTIERS IN NUTRITION
(2023)
Article
Economics
N. Packham, F. Woebbeking
Summary: We propose a general approach for stress testing correlations of financial asset portfolios. The method specifies the correlation matrix of asset returns parametrically, where correlations are represented as a function of risk factors, such as country and industry factors. Bayesian variable selection methods are used to build a sparse factor structure linking assets and risk factors. Regular calibration yields a joint distribution of economically meaningful stress scenarios of the factors. The approach also serves as a reverse stress testing framework, allowing the inference of worst-case correlation scenarios using the Mahalanobis distance or Highest Density Regions (HDR) on the joint risk factor distribution. We provide examples of stress tests on a large portfolio of European and North American stocks.
JOURNAL OF ECONOMIC BEHAVIOR & ORGANIZATION
(2023)
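The reverse stress testing idea in this summary, searching for worst-case factor scenarios within a Mahalanobis distance bound, can be illustrated in its simplest form. This sketch is an assumption-laden simplification, not the authors' framework: it takes a loss that is linear in the risk factors, L(x) = w'x, with factor mean mu and covariance Sigma, and maximizes it over the ellipsoid (x - mu)' Sigma^{-1} (x - mu) <= k^2. In that case the maximizer has the closed form x* = mu + k Sigma w / sqrt(w' Sigma w). All names and numbers are hypothetical.

```python
import numpy as np

def worst_case_scenario(w, mu, Sigma, k):
    """Maximize the linear loss w @ x over the ellipsoid (x - mu)' Sigma^{-1} (x - mu) <= k^2."""
    s = Sigma @ w
    x_star = mu + k * s / np.sqrt(w @ s)     # maximizer lies on the boundary, along Sigma w
    return x_star, float(w @ x_star)

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance of scenario x from the factor mean mu."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(Sigma, d)))

# Two hypothetical risk factors with equal exposure and a radius of k = 3.
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
w = np.array([1.0, 1.0])
x_star, loss = worst_case_scenario(w, mu, Sigma, k=3.0)
# x_star = [2.25, 3.75], loss = 6.0, and mahalanobis(x_star, mu, Sigma) = 3.0
```

The summary's approach works with correlations expressed as functions of the factors and can also use highest density regions, so its worst-case search is more general than this linear, elliptical case.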
Article
Statistics & Probability
Antonio R. Linero, Junliang Du
Summary: This article investigates the problem of high-dimensional Bayesian nonparametric variable selection using an aggregation of weak learners. The authors propose a solution by inducing sparsity in ensembles of weak learners through the use of Gibbs distributions and show the advantages of this approach.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2023)
Article
Statistics & Probability
Yi Liu, Veronika Rockova, Yuexi Wang
Summary: The study departs from the linear model framework and turns to tree-based methods for variable selection, proposing a Bayesian tree-based probabilistic method that is shown to be consistent under certain conditions. In addition, a new ABC sampling method based on data splitting is introduced to achieve higher acceptance rates, and it successfully identifies variables with high marginal inclusion probabilities. This research offers a new avenue for approximating the median probability model in nonparametric settings where the marginal likelihood is intractable.
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
(2021)
Article
Mathematics
Dario Ramos-Lopez, Ana D. Maldonado
Summary: Multi-class classification in imbalanced datasets presents a challenging problem where traditional validation metrics may not be suitable. A cost-sensitive variable selection procedure is proposed to build a Bayesian network classifier, optimizing a specified cost function. Fine-tuning the objective validation function can improve prediction quality in imbalanced data or when considering asymmetric misclassification costs.
Article
Chemistry, Physical
Yongshun Luo, Gang Li, Xu Chen, Ling Lin
Summary: This paper proposes a two-dimensional variable selection method that addresses spectral collinearity in complex solutions by reconstructing the component spectrum. The method is evaluated through quantitative analysis of four components in human blood, and the prediction accuracy obtained with two-dimensional variable selection is higher than that obtained with one-dimensional variable selection.
JOURNAL OF MOLECULAR STRUCTURE
(2022)
Article
Statistics & Probability
Yan Dora Zhang, Brian P. Naughton, Howard D. Bondell, Brian J. Reich
Summary: This article proposes a new class of shrinkage priors for high-dimensional linear regression, obtained by placing a prior on the model fit and distributing it among the coefficients in a novel way. Compared with previous approaches, the proposed prior has better concentration and tail behavior, which leads to improved posterior contraction and empirical performance.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2022)
Article
Environmental Sciences
David B. Huberman, Brian J. Reich, Howard D. Bondell
Summary: This paper introduces a conditional distribution estimation technique that estimates the entire conditional distribution while flexibly incorporating machine learning algorithms, with the aim of forecasting tropical cyclone intensity and providing additional insight for decision-making. Simulation studies and validation on real data demonstrate the effectiveness of the method, and further developments could extend it to more complex forecasting problems and other applications.
ENVIRONMENTAL AND ECOLOGICAL STATISTICS
(2022)
Article
Statistics & Probability
Francis K. C. Hui, Howard D. Bondell
Summary: Spatial confounding is a contentious research area in spatial statistics, primarily focused on spatial mixed models but also relevant in the context of generalized estimating equations (GEEs). To address spatial confounding, a restricted spatial working correlation matrix is proposed to estimate a partitioned covariate effect in GEEs.
AMERICAN STATISTICIAN
(2022)
Article
Mathematics, Applied
Nick James, Max Menzies, Howard Bondell
Summary: This paper applies various methods to study performance trends of elite athletes, revealing an Olympic effect, a leveling off of athlete scores, and similarities in performance trends between men's and women's categories; it also analyzes the geographic composition of top athletes.
Article
Statistics & Probability
Yiping Guo, Howard Bondell
Summary: This paper explores the use of multivariate t-distributions in probabilistic principal component analysis (PPCA) and re-examines some errors in the existing literature. In addition, a new Monte Carlo expectation-maximization (MCEM) algorithm is introduced to fit a general class of such models.
COMMUNICATIONS IN STATISTICS-THEORY AND METHODS
(2023)
Article
Materials Science, Textiles
Erin Roberts, Sujit Ghosh, Behnam Pourdeyhimi
Summary: This study developed a novel method to measure roping in meltblown nonwovens and analyzed its impact on pore size uniformity, filtration efficiency, and barrier properties. The study found that the interactions of capillary density with air flow and air flow with die-to-collector distance had the greatest influence on roping formation.
JOURNAL OF THE TEXTILE INSTITUTE
(2023)
Article
Statistics & Probability
Weichang Yu, Sara Wade, Howard D. Bondell, Lamiae Azizi
Summary: High-dimensional classification and feature selection tasks are common with the advancement of data acquisition technology. In fields such as biology, genomics, and proteomics, where data are often functional and exhibit roughness and nonstationarity, traditional methods face additional challenges. In this work, we propose a novel approach called Gaussian process discriminant analysis (GPDA) that combines variable selection and classification in a unified framework. By utilizing sparse inverse covariance matrices and variational methods, our approach achieves scalable inference and demonstrates good performance.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2023)
Article
Economics
Ali Zeytoon-Nejad, Barry K. Goodwin, Sujit Ghosh
Summary: This paper proposes a generalized variant of the constant elasticity of substitution (CES) production function that allows for minimum required levels of inputs. Empirical applications to irrigation and nitrogen use experimental datasets as well as datasets generated through Monte Carlo experiments.
Article
Computer Science, Interdisciplinary Applications
Rahul Ghosal, Sujit Ghosh, Jacek Urbanek, Jennifer A. Schrack, Vadim Zipunnikov
Summary: This study proposes a new estimation method for shape-constrained functional regression models using Bernstein polynomials. Theoretical results demonstrate the consistency of the constrained estimators, and numerical analysis shows improved efficiency and accuracy of the estimators under shape constraints.
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2023)
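The shape constraint idea behind this summary rests on a standard property of Bernstein polynomials: if the coefficients beta_0 <= beta_1 <= ... <= beta_N are nondecreasing, the polynomial sum_k beta_k C(N, k) t^k (1 - t)^(N - k) is nondecreasing on [0, 1]. The sketch below uses that property for a simple scalar least-squares fit. It is not the article's functional regression estimator; the reparametrization (a free intercept plus nonnegative increments, solved with SciPy's nonnegative least squares) and all names are illustrative choices.

```python
import numpy as np
from math import comb
from scipy.optimize import nnls

def bernstein_basis(t, N):
    """Matrix whose columns are C(N, k) t^k (1 - t)^(N - k), k = 0..N, for t in [0, 1]."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.arange(N + 1)[None, :]
    coef = np.array([comb(N, j) for j in range(N + 1)])[None, :]
    return coef * t ** k * (1 - t) ** (N - k)

def fit_monotone_bernstein(t, y, N):
    """Least-squares fit with nondecreasing Bernstein coefficients (hence a nondecreasing curve)."""
    B = bernstein_basis(t, N)
    # beta_k = c + sum_{j=1..k} delta_j with delta_j >= 0; the free intercept c = c_pos - c_neg.
    cum = np.tril(np.ones((N + 1, N + 1)))[:, 1:]
    A = np.column_stack([B.sum(axis=1), -B.sum(axis=1), B @ cum])
    theta, _ = nnls(A, y)
    return (theta[0] - theta[1]) + cum @ theta[2:]

# Noisy observations of a nondecreasing function (hypothetical data).
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
y = np.sqrt(t) + 0.1 * rng.standard_normal(200)
beta = fit_monotone_bernstein(t, y, N=8)
curve = bernstein_basis(np.linspace(0, 1, 101), 8) @ beta
```

Other shapes follow the same pattern: for example, constraints on second differences of the coefficients give convexity.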
Article
Ecology
Xun Lu, Yuyuan Che, Roderick M. Rejesus, Barry K. Goodwin, Sujit K. Ghosh, Jayash Paudel
Summary: Agricultural policies can indirectly impact the natural environment through their influence on farmer input behavior. This study examines the specific effects of crop insurance participation on nitrogen and phosphorus concentrations in waterways. The results suggest that higher crop insurance participation is associated with lower nitrogen concentrations but does not have a consistent effect on phosphorus concentrations.
ECOLOGICAL ECONOMICS
(2023)
Article
Statistics & Probability
Yiming Wang, Sujit K. Ghosh
Summary: This paper proposes a nonparametric model based on Bernstein polynomials to approximate arbitrary isotropic covariance functions. The approximation properties are investigated under the popular L-infinity and L-2 norms. A computationally efficient sieve maximum likelihood (sML) method is developed to estimate the unknown isotropic covariance function. Numerical results show that the proposed approach outperforms both parametric and nonparametric methods, with lower bias and smaller errors as measured by these norms.
JOURNAL OF NONPARAMETRIC STATISTICS
(2023)
Article
Statistics & Probability
Dasom Lee, Sujit Ghosh
Summary: Many clinical trials record binary patient outcomes that are measured asynchronously over time across different dose levels. To account for autocorrelation among these longitudinal outcomes, a first-order Markov model for binary data is developed. Nonhomogeneous models for the transition probabilities, built from B-spline basis functions, are proposed to handle asynchronously observed time points. With suitable prior distributions, the model also allows estimation of an underlying non-decreasing curve, and individual-specific random effects can be incorporated through a mixed effects model. The model is compared numerically with traditional models on simulated data sets and applied to real data sets.
JOURNAL OF STATISTICAL THEORY AND PRACTICE
(2023)
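As a baseline for the first-order Markov model described in this summary, the sketch below computes the maximum likelihood estimate of a homogeneous two-state transition matrix by pooling one-step transition counts over subjects. This is a simplification: the article's model is nonhomogeneous, with B-spline transition probabilities, dose effects, and random effects, none of which are shown here. The function name and the sequences are hypothetical.

```python
import numpy as np

def transition_mle(sequences):
    """MLE of a homogeneous first-order Markov chain on {0, 1}, pooling transitions over subjects."""
    counts = np.zeros((2, 2))
    for seq in sequences:
        seq = np.asarray(seq, dtype=int)
        for a, b in zip(seq[:-1], seq[1:]):    # consecutive (current, next) outcome pairs
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)   # row a gives P(next = b | current = a)

# Two hypothetical subjects with binary outcomes over time.
P = transition_mle([[0, 0, 1, 1, 1], [1, 0, 0, 1]])
# P = [[0.5, 0.5], [1/3, 2/3]]
```

With asynchronous visits, the time between observations differs across subjects, which is one reason a homogeneous matrix like this is too restrictive and time-varying transition probabilities are needed.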
Article
Astronomy & Astrophysics
Shubham Kanodia, Matthias Y. He, Eric B. Ford, Sujit K. Ghosh, Angie Wolfgang
Summary: This work extends the existing nonparametric and probabilistic framework to simultaneously model distributions beyond two dimensions. The potential of this multidimensional approach is showcased in several science cases relating to planetary mass, radius, insolation, stellar mass, and dust mass measurements. Bootstrap and Monte Carlo sampling are employed to quantify the impact of finite sample size and measurement uncertainties. The open-source MRExo Python package is updated to incorporate these changes and provides users with a flexible framework for various data sets.
ASTROPHYSICAL JOURNAL
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Dongting Hu, Liuhua Peng, Tingjin Chu, Xiaoxing Zhang, Yinian Mao, Howard Bondell, Mingming Gong
Summary: This paper presents an uncertainty quantification method for supervised monocular depth estimation (MDE) models. Uncertainty is captured through the predictive variance, whose error variance and estimation variance components are estimated using constrained ordinal regression and bootstrapping, respectively. Experimental results demonstrate the accuracy and effectiveness of the proposed method.
COMPUTER VISION - ECCV 2022, PT II
(2022)
Article
Mathematics, Interdisciplinary Applications
Edward Boone, Jan Hannig, Ryad Ghanam, Sujit Ghosh, Fabrizio Ruggeri, Serge Prudhomme
Summary: This paper investigates model validation for a single degree-of-freedom oscillator in order to assess its predictive capabilities. Model validation is the process of determining how accurately a model predicts observed physical events or system features. Virtual data are generated from a nonlinear oscillator, and a mathematical model is derived by neglecting the nonlinear term. Bayesian updating is used to identify the model parameters, including calibration of the normal probability density function that represents model error.
Article
Statistics & Probability
Omidali Aghababaei Jazi
Summary: In this paper, a pseudo-partial likelihood estimation method is proposed to estimate parameters in the Cox proportional hazards model with right-censored and biased sampling data by adjusting sample risk sets. The asymptotic properties of the resulting estimator are studied, and a simulation study is conducted to illustrate the finite sample performance. The proposed method is also applied to analyze a set of HIV/AIDS data.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
Liya Fu, Shuwen Hu, Jiaqi Li
Summary: Empirical likelihood (EL) is an effective nonparametric method that combines estimating equations flexibly and adaptively. A penalized EL method based on robust estimating functions is proposed for variable selection in a high-dimensional model, allowing the dimensions to grow exponentially with the sample size. The proposed method improves robustness and effectiveness in the presence of outliers or heavy-tailed data. Extensive simulation studies and a real data example demonstrate the enhanced variable selection accuracy when dealing with heavy-tailed data or outliers.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
Yifan Sun, Ziyi Liu, Wu Wang
Summary: This paper extends the classical functional linear regression model to allow for heterogeneous coefficient functions among different subgroups of subjects. A penalization-based approach is proposed to simultaneously determine the number and structure of subgroups and coefficient functions within each subgroup. The paper provides an effective computational algorithm and establishes the oracle properties and estimation consistency of the model. Extensive numerical simulations demonstrate its superiority compared to competing methods, and an analysis of an air quality dataset leads to interesting findings and improved predictions.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
Takemi Yanagimoto, Yoichi Miyata
Summary: A Bayesian estimator is proposed to improve on the conditional maximum likelihood estimator by introducing a pair of priors. The conditional maximum likelihood estimator is interpreted as the posterior mode under one prior, and the proposed estimator is defined as the posterior mean under a corresponding prior. Advantages of this approach include two distinct optimality properties of the induced estimator, ease of extension, and possible treatments for finite sample sizes. Existing approaches are also discussed and critiqued.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
Sameera Hewage, Yongli Sang
Summary: This paper introduces a new dependence measure, the categorical Gini correlation ρ_g, and proposes a jackknife empirical likelihood approach for constructing confidence intervals for it. Simulation studies and real data applications show that the proposed method performs competitively in terms of coverage accuracy and interval length.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
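The jackknife empirical likelihood (JEL) construction in this summary can be sketched in three steps: compute the statistic, turn it into jackknife pseudo-values V_i = n T_n - (n - 1) T_(n-1)^(-i), and apply the empirical likelihood ratio for a mean to those pseudo-values. The sketch assumes the categorical Gini correlation takes the form ρ_g = 1 - sum_k p_k Delta_k / Delta, where Delta estimates E||X - X'|| for the numerical variable X and Delta_k is the same quantity within category k of Y; this form is an assumption to check against the article before relying on it. Function names and data are hypothetical, and every category needs at least three observations so that leave-one-out estimates are defined.

```python
import numpy as np

def gini_mean_difference(x):
    """Unbiased estimate of E||X - X'|| for the rows of x (Euclidean norm)."""
    n = len(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return dist.sum() / (n * (n - 1))

def categorical_gini_corr(x, y):
    """Assumed form rho_g = 1 - sum_k p_k Delta_k / Delta; x is (n, d), y holds category labels."""
    within = sum(np.mean(y == k) * gini_mean_difference(x[y == k]) for k in np.unique(y))
    return 1.0 - within / gini_mean_difference(x)

def jackknife_pseudo_values(x, y):
    """V_i = n * T_n - (n - 1) * T_{n-1}^{(-i)} for the statistic T = rho_g."""
    n = len(x)
    full = categorical_gini_corr(x, y)
    loo = np.array([categorical_gini_corr(np.delete(x, i, axis=0), np.delete(y, i))
                    for i in range(n)])
    return n * full - (n - 1) * loo

def jel_statistic(v, rho, iters=200):
    """-2 log empirical likelihood ratio for the mean of the pseudo-values v at value rho."""
    z = v - rho
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                         # rho lies outside the range of the pseudo-values
    lo, hi = -1.0 / z.max(), -1.0 / z.min()   # keeps every 1 + lam * z_i positive
    for _ in range(iters):                    # bisection: sum z / (1 + lam z) decreases in lam
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1 + lam * z)) > 0:
            lo = lam
        else:
            hi = lam
    return float(2 * np.sum(np.log1p(lam * z)))

# Hypothetical data: a univariate X whose mean shifts with a binary category Y.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 50)
x = (3.0 * y + rng.standard_normal(100)).reshape(-1, 1)
rho = categorical_gini_corr(x, y)
v = jackknife_pseudo_values(x, y)
# A 95% JEL interval collects the values r with jel_statistic(v, r) <= 3.841 (chi-square, 1 df).
```

The appeal of the jackknife step is that the statistic, a ratio of U-statistics, becomes an approximate sample mean, so the standard empirical likelihood machinery for a mean applies.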
Article
Statistics & Probability
Isadora Antoniano-Villalobos, Cristiano Villa, Stephen G. Walker
Summary: Constructing objective priors for multidimensional parameter spaces is challenging, and a common approach assumes independence and uses standard objective methods to obtain the marginal distributions. In this paper, a novel objective prior is proposed by extending the objective method for the one-dimensional case, which allows for a dependence structure in multidimensional parameter spaces.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
Hui Li, Liuqing Yang, Kashinath Chatterjee, Min-Qian Liu
Summary: Supersaturated designs (SSDs) play a crucial role in factor screening, and the E(f_NOD) criterion is one of the most widely used criteria for evaluating multi-level and mixed-level SSDs. This paper provides methods to construct multi-level E(f_NOD)-optimal SSDs with general run sizes, which can also be extended to mixed-level SSDs. The main idea of these methods is to combine two processed generalized Hadamard matrices with the expansive replacement method. The proposed methods are easy to implement, and the non-orthogonality between any two columns of the resulting SSDs is well controlled by that of the source designs.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
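The E(f_NOD) criterion mentioned in this summary measures how far pairs of columns are from balance. As it is commonly defined, for columns i and j with q_i and q_j levels, f_NOD sums (n_ab - n / (q_i q_j))^2 over all level pairs (a, b), where n_ab counts runs with level a in column i and level b in column j; E(f_NOD) averages this over all column pairs, and an orthogonal pair contributes 0. The sketch below computes it and assumes every level of each column appears at least once. It does not implement the article's construction methods; names and designs are hypothetical.

```python
import numpy as np
from itertools import combinations

def f_nod(ci, cj):
    """Sum over level pairs (a, b) of (n_ab - n / (q_i q_j))^2 for two design columns."""
    n = len(ci)
    levels_i, levels_j = np.unique(ci), np.unique(cj)
    target = n / (len(levels_i) * len(levels_j))   # count of each level pair under perfect balance
    return sum((np.sum((ci == a) & (cj == b)) - target) ** 2
               for a in levels_i for b in levels_j)

def e_fnod(design):
    """Average f_NOD over all pairs of columns of an n x m design."""
    design = np.asarray(design)
    pairs = list(combinations(range(design.shape[1]), 2))
    return sum(f_nod(design[:, i], design[:, j]) for i, j in pairs) / len(pairs)

oa = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])   # 4-run orthogonal array
ssd = np.column_stack([oa, oa[:, 0]])                        # 4 runs, 4 two-level factors
oa_value, ssd_value = e_fnod(oa), e_fnod(ssd)
# oa_value = 0.0 (every pair is balanced); ssd_value = 4/6 (only the duplicated pair is unbalanced)
```

With 4 runs and 4 two-level factors, the second design has more factors than the n - 1 = 3 degrees of freedom available, so it is supersaturated and some non-orthogonality is unavoidable.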
Article
Statistics & Probability
Victoria L. Leaver, Robert G. Clark, Pavel N. Krivitsky, Carole L. Birrell
Summary: This article compares three likelihood approaches to estimation under informative sampling and examines their efficiency and asymptotic variance. The study shows that sample likelihood estimation approaches the efficiency of full maximum likelihood estimation when the sample size tends to infinity and the sampling fraction tends to zero. However, when the sample size tends to infinity and the sampling fraction is not negligible, maximum likelihood estimation is more efficient because it accounts for the possibility of duplicate samples. Pseudo-likelihood estimation can perform poorly in certain cases. For a special case where the superpopulation is exponential and the selection is probability proportional to size, the anticipated variance of pseudo-likelihood estimation is infinite.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
Fadoua Balabdaoui, Harald Besdziek
Summary: This paper studies the two-component mixture model with a known background density, an unknown signal density, and an unknown mixing proportion. The log-concave MLE of the signal density is computed using the estimator of Patra & Sen (2016), and its consistency and convergence are established. The performance of the method is evaluated through a simulation study.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
V. Girardin, R. Senoussi
Summary: This paper investigates different issues related to stationarity reduction in autoregressive models, including both continuous and discrete time cases. Necessary and sufficient conditions for autoregressive models to be weakly stationary are explored, with explicit formulas for the time changes. Furthermore, the issue of stationarity reduction for discrete sequences sampled from continuous time autoregressive processes is also considered.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
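For orientation, the classical discrete-time, constant-coefficient case of the stationarity question in this summary has a well-known answer: the AR(p) model X_t = phi_1 X_(t-1) + ... + phi_p X_(t-p) + e_t has a causal, weakly stationary solution exactly when every root of 1 - phi_1 z - ... - phi_p z^p lies strictly outside the unit circle. The sketch checks that condition numerically. It does not cover the article's time changes or continuous-time processes, and the function name is hypothetical.

```python
import numpy as np

def is_causal_stationary(phi):
    """AR(p) check: all roots of 1 - phi_1 z - ... - phi_p z^p lie strictly outside |z| = 1."""
    phi = np.asarray(phi, dtype=float)
    coeffs = np.concatenate([-phi[::-1], [1.0]])   # np.roots wants the highest degree first
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

checks = {
    "AR(1), phi = 0.5": is_causal_stationary([0.5]),        # root z = 2
    "AR(1), phi = 1.2": is_causal_stationary([1.2]),        # root z = 1/1.2, inside the unit circle
    "AR(2), (0.5, 0.3)": is_causal_stationary([0.5, 0.3]),  # roots near 1.17 and -2.84
    "AR(2), (0.5, 0.6)": is_causal_stationary([0.5, 0.6]),  # phi_1 + phi_2 > 1, root near 0.94
}
```

For AR(2) the same condition reduces to the familiar triangle phi_1 + phi_2 < 1, phi_2 - phi_1 < 1, |phi_2| < 1, which the last two checks illustrate.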
Article
Statistics & Probability
Juan Jose Fernandez-Duran, Maria Mercedes Gregorio-Dominguez
Summary: This paper presents the application of nonnegative trigonometric sums (NNTS) models in circular data analysis. Regression models for circular dependent variables are constructed by fitting great circles on the parameter hypersphere, which enables the identification of different regions along the circle. Transforming the original circular variable into a linear variable allows common linear regression methods to be applied in circular data analysis.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
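The NNTS family referred to in this summary builds a circular density as the squared modulus of a complex trigonometric sum, f(theta) = |sum_{k=0}^{M} c_k e^{i k theta}|^2 on [0, 2 pi), with the constraint sum_k |c_k|^2 = 1 / (2 pi) so that the density integrates to one. The sketch evaluates such a density and checks the normalization numerically. The coefficient values are illustrative only, and the great-circle regression construction from the article is not shown.

```python
import numpy as np

def nnts_density(theta, c):
    """NNTS density f(theta) = |sum_{k=0}^{M} c_k e^{i k theta}|^2 on [0, 2 pi).
    The coefficients are rescaled so that sum_k |c_k|^2 = 1 / (2 pi), making f integrate to 1."""
    c = np.asarray(c, dtype=complex)
    c = c / np.sqrt(2 * np.pi * np.sum(np.abs(c) ** 2))
    k = np.arange(len(c))
    s = np.exp(1j * np.outer(np.atleast_1d(theta), k)) @ c
    return np.abs(s) ** 2

grid = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
dens = nnts_density(grid, [1.0, 0.5 - 0.2j, 0.3j])   # M = 2, illustrative coefficients
total = dens.mean() * 2 * np.pi                       # Riemann sum, exact here up to rounding
```

Because the density is a squared modulus it is nonnegative by construction, and because the rescaled coefficient vector lies on a sphere, the parameter space is a hypersphere, which is where the great circles in the summary live.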
Article
Statistics & Probability
Miao Han, Yuanyuan Lin, Wenxin Liu, Zhanfeng Wang
Summary: The article proposes a method based on maximum rank correlation and concave fusion to automatically determine the number of subgroups, identify the subgroup structure, and estimate subgroup-specific covariate effects. The method does not require prior grouping information and can be applied to censored data.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
Article
Statistics & Probability
Qing He, Hsin-Hsiung Huang
Summary: This article introduces a method for analyzing spatiotemporal count data with excess zeros, which are common in epidemiology and public health. The method fits zero-inflated negative binomial models in a Bayesian framework and uses Pólya-Gamma latent variables to improve computational efficiency.
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2024)
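The zero-inflated negative binomial (ZINB) model in this summary mixes a point mass at zero with a negative binomial count: P(Y = 0) = pi + (1 - pi) NB(0) and P(Y = y) = (1 - pi) NB(y) for y > 0. The sketch evaluates this probability mass function with an NB parameterized by its mean mu and size r (variance mu + mu^2 / r). In a spatiotemporal model, pi and mu would typically depend on covariates and spatiotemporal effects through link functions; that structure and the Pólya-Gamma sampler are not shown, and the parameter values are illustrative.

```python
import math

def nb_logpmf(y, mu, r):
    """Negative binomial log-pmf with mean mu > 0 and size r > 0 (variance mu + mu^2 / r)."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def zinb_pmf(y, pi, mu, r):
    """Zero-inflated NB: a structural zero with probability pi, otherwise a NB(mu, r) draw."""
    nb = math.exp(nb_logpmf(y, mu, r))
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb

probs = [zinb_pmf(y, pi=0.3, mu=4.0, r=2.0) for y in range(500)]
total = sum(probs)                                   # close to 1
mean = sum(y * p for y, p in enumerate(probs))       # close to (1 - pi) * mu = 2.8
```

Working on the log scale with `math.lgamma` keeps the computation stable for large counts, where the gamma functions themselves would overflow.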