Article
Biochemical Research Methods
Baoshan Ma, Ge Yan, Bingjie Chai, Xiaoyu Hou
Summary: This study proposes XGBLC, an improved survival prediction model built on the XGBoost framework that uses Lasso-Cox to strengthen the analysis of high-dimensional genomic data. Tested on 20 cancer datasets, XGBLC outperforms five state-of-the-art survival methods in terms of C-index, Brier score, and AUC.
Article
Plant Sciences
Boby Mathew, Andreas Hauptmann, Jens Leon, Mikko J. Sillanpää
Summary: Prediction of complex traits based on genome-wide marker information is crucial for animal and plant breeding. Many models have been proposed and efforts are being made to improve their accuracy, considering factors such as additive, dominance, and epistasis effects. In this study, a new algorithm that combines neural networks with LASSO is proposed, which accounts for local epistasis in the prediction. The new method was compared with commonly used prediction methods and showed superior accuracy.
FRONTIERS IN PLANT SCIENCE
(2022)
Article
Multidisciplinary Sciences
Dexin Chen, Jianbo Lai, Jiaxin Cheng, Meiting Fu, Liyan Lin, Feng Chen, Rong Huang, Jun Chen, Jianping Lu, Yuning Chen, Guangyao Huang, Miaojia Yan, Xiaodan Ma, Guoxin Li, Gang Chen, Jun Yan
Summary: Peritoneal recurrence is the most common and lethal type of recurrence in gastric cancer with serosal invasion after surgery. Current evaluation methods are not sufficient for predicting peritoneal recurrence in this type of gastric cancer. Pathomics analyses, consisting of multiple pathomics features extracted from stained images, have shown potential for risk stratification and outcome prediction. A pathomics signature was found to be significantly associated with peritoneal recurrence, and a pathomics nomogram was developed for more accurate prediction.
Article
Energy & Fuels
Nicolas Koch, Lennard Naumann, Felix Pretis, Nolan Ritter, Moritz Schwarz
Summary: This study examines the effectiveness of decarbonization policies in the European transport sector by detecting structural breaks in CO2 emissions. The findings suggest that a combination of carbon or fuel taxes with green vehicle incentives is the most successful policy mix, capable of achieving emission reductions that align with the EU zero emission targets.
Article
Computer Science, Interdisciplinary Applications
Inseok Park
Summary: Kriging is widely used in engineering fields, with Penalized Blind Kriging (PBK) improving predictive performance by systematically selecting models and penalizing likelihood functions.
STRUCTURAL AND MULTIDISCIPLINARY OPTIMIZATION
(2021)
Article
Multidisciplinary Sciences
Anupreet Porwal, Adrian E. Raftery
Summary: Probability models are widely used in statistical tasks and it is important to choose an appropriate model and consider the uncertainty associated with this choice. This study focuses on variable selection in linear regression models and compares 21 popular methods through simulation studies. The results show that three adaptive Bayesian model averaging (BMA) methods perform the best across all statistical tasks.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2022)
Article
Computer Science, Hardware & Architecture
Yue Li, Jianfang Qi, Haibin Jin, Dong Tian, Weisong Mu, Jianying Feng
Summary: In this study, a new classifier for predicting customer consumption behavior is proposed. The classifier uses a feature selection method based on Lasso and PCA to efficiently select relevant features and eliminate correlations between variables. An improved genetic-XGBoost algorithm is also used to optimize prediction accuracy by tuning XGBoost parameters and preventing the model from falling into a local extremum. Experimental results demonstrate the superiority of the proposed methods over existing ones, providing a decision-making basis for enterprises to formulate better marketing strategies.
Article
Thermodynamics
Bo Sun, Ruilin Deng, Bin Ren, Minmin Teng, Siyuan Cheng, Fan Wang
Summary: The study applies the Lasso algorithm to improve model performance and accurately identifies market power abuse in the electricity spot market by constructing indicator systems and model identification methods.
Article
Computer Science, Artificial Intelligence
Junwen Yang, Yunmin Wang, Xiang Li
Summary: This article proposes a methodology that combines technical analysis and sentiment analysis to predict stock movement. Financial textual content and historical stock transaction data are crawled; transfer learning is used to recognize emotions, and the TTR package is used to calculate technical indicators. An improved LASSO-LSTM model is used for variable selection and shows a significant improvement in accuracy over the baseline LSTM model.
PEERJ COMPUTER SCIENCE
(2022)
Article
Mathematics
Juan C. Laria, M. Carmen Aguilera-Morillo, Enrique Alvarez, Rosa E. Lillo, Sara Lopez-Taruella, Maria del Monte-Millan, Antonio C. Picornell, Miguel Martin, Juan Romo
Summary: This paper introduces a methodology to deal with variable selection and model estimation problems in a high-dimensional set-up, which can be particularly useful in the whole genome context.
Article
Computer Science, Interdisciplinary Applications
Jasleen Kaur Sethi, Mamta Mittal
Summary: This research investigates the effectiveness of a feature selection method based on LASSO for predicting air quality in Delhi and surrounding cities, identifying meteorological factors and pollutant concentrations as the most important influencing factors, and suggesting preventive measures to improve air quality.
EARTH SCIENCE INFORMATICS
(2021)
Article
Biochemical Research Methods
Ayyuce Begum Bektas, Cigdem Ak, Mehmet Gonen
Summary: With the increasing sizes of computational biology datasets, previous kernel-based machine learning algorithms have failed to provide satisfactory interpretability. To address this issue, we propose a fast and efficient multiple kernel learning algorithm that can extract significant information from genomic data. Our experiments demonstrate that the algorithm outperforms baseline methods while using only a small fraction of input features, and it has the potential to discover new biomarkers and therapeutic guidelines.
Article
Business, Finance
Michael Ellington, Michalis P. Stamatogiannis, Yawen Zheng
Summary: This study investigates the predictability of cross-industry returns for the Shanghai and Shenzhen stock exchanges by constructing portfolios from different industries. The research findings show that the returns of the Oil, Telecommunications, and Finance industries are significant predictors for other industries. The machine learning methods used in the study outperform various benchmarks in the out-of-sample forecasting exercise, with an average annual excess return of 13%.
INTERNATIONAL REVIEW OF FINANCIAL ANALYSIS
(2022)
Article
Mathematical & Computational Biology
Juan C. Laria, David Delgado-Gomez, Inmaculada Penuelas-Calvo, Enrique Baca-Garcia, Rosa E. Lillo
Summary: The deep lasso algorithm, dlasso, is a neural version of the statistical linear lasso algorithm that combines feature selection and automatic parameter optimization, showing superior performance in small sample feature selection. It outperforms the traditional lasso in predictive error and variable selection. With dlasso, it is possible to predict the severity of symptoms in children with ADHD based on scales measuring family burden, family functioning, parental satisfaction, and parental mental health.
FRONTIERS IN COMPUTATIONAL NEUROSCIENCE
(2021)
Article
Mathematics
Zhongzheng Wang, Guangming Deng, Jianqi Yu
Summary: The proposed group screening procedure based on the information gain ratio for a classification model is shown to have better screening performance and classification accuracy.
JOURNAL OF MATHEMATICS
(2022)
Correction
Mathematical & Computational Biology
Ruilin Li, Christopher Chang, Johanne M. Justesen, Yosuke Tanigawa, Junyang Qian, Trevor Hastie, Manuel A. Rivas, Robert Tibshirani
Article
Statistics & Probability
Swarnadip Ghosh, Trevor Hastie, Art B. Owen
Summary: This paper presents a computationally efficient algorithm for regression models with crossed random effect errors. The proposed algorithm has lower cost and more flexible conditions compared to other methods, and it is validated through empirical analysis.
ANNALS OF STATISTICS
(2022)
Article
Statistics & Probability
Elena Tuzhilina, Leonardo Tozzi, Trevor Hastie
Summary: Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. Regularized modification of CCA (RCCA) is widely used for high-dimensional data but may disregard data structure. This article introduces several approaches to regularizing CCA that consider the underlying data structure and demonstrates strategies for avoiding excessive computations in high dimensions.
STATISTICAL MODELLING
(2023)
Article
Statistics & Probability
Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani
Summary: This paper studies minimum ℓ2-norm interpolating least squares regression in the high-dimensional regime, focusing on linear and nonlinear models. The study identifies the phenomenon of double descent in the prediction risk and the potential benefits of overparametrization.
ANNALS OF STATISTICS
(2022)
Article
Computer Science, Interdisciplinary Applications
Didier Nibbering, Trevor J. Hastie
Summary: This study introduces a multinomial logistic regression model that penalizes the number of class-specific parameters, showing improved performance in both in-sample and out-of-sample situations compared to a standard model. The model clusters parameters by penalizing differences between class-specific parameter vectors, providing interpretable parameter estimates.
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2022)
Article
Genetics & Heredity
Yosuke Tanigawa, Junyang Qian, Guhan Venkataraman, Johanne Marie Justesen, Ruilin Li, Robert Tibshirani, Trevor Hastie, Manuel A. Rivas
Summary: We conducted a systematic assessment of polygenic risk score (PRS) prediction for over 1,500 traits using genetic and phenotype data from the UK Biobank. We found that sparse PRS models showed significant incremental predictive performance and that the number of genetic variants selected in the model correlated with predictive performance. However, the transferability of sparse PRS models trained on European individuals to non-European individuals in the UK Biobank was limited.
Article
Statistics & Probability
J. Kenneth Tay, Nima Aghaeepour, Trevor Hastie, Robert Tibshirani
Summary: In some supervised learning settings, practitioners may have additional information on prediction features. Our proposed method, called the feature-weighted elastic net (fwelnet), uses this information to improve prediction by adjusting penalties on feature coefficients in the elastic net penalty. In simulations, fwelnet outperforms the lasso in terms of test mean squared error and often improves true positive or false positive rates for feature selection. Comparison with other methods reveals fwelnet's superiority, and its application to early prediction of preeclampsia shows improved performance compared to the lasso.
Article
Multidisciplinary Sciences
Aaron T. Mayer, Derek R. Holman, Anav Sood, Utkarsh Tandon, Salil S. Bhate, Sunil Bodapati, Graham L. Barlow, Jeff Chang, Sarah Black, Erica C. Crenshaw, Alexander N. Koron, Sarah E. Streett, Sanjiv S. Gambhir, William J. Sandborn, Brigid S. Boland, Trevor Hastie, Robert Tibshirani, John T. Chang, Garry P. Nolan, Christian M. Schuerch, Stephan Rogalla
Summary: This study uses CODEX technology to create a tissue atlas of inflammation in ulcerative colitis (UC) patients and healthy individuals. The analysis reveals the association between cellular functional states and cellular neighborhoods, as well as the presence of resistant niches in UC patients receiving TNFi treatment. Additionally, the study explores the use of CNNs in predicting patient clinical variables and provides guidelines for reporting predictions in similar datasets.
Article
Environmental Sciences
Shinnosuke Nakayama, WenXin Dong, Richard G. Correro, Elizabeth R. Selig, Colette C. Wabnitz, Trevor J. Hastie, Jim Leape, Serena Yeung, Fiorenza Micheli
Summary: Monitoring marine vessel activities is crucial but challenging, especially with limited capacity and resources. Satellite imagery offers a promising way to observe vessel activities not captured by publicly available tracking data. However, limited understanding of how it complements existing data hampers its broader use.
FRONTIERS IN MARINE SCIENCE
(2023)
Article
Statistics & Probability
Stephen Bates, Trevor Hastie, Robert Tibshirani
Summary: Cross-validation is a widely used technique for estimating prediction error, but its behavior is not fully understood. It does not estimate the prediction error of the model trained on the data used for cross-validation, but rather the average prediction error of models trained on unseen data from the same population. The standard confidence intervals derived from cross-validation may have lower coverage than desired, due to correlations among the measured accuracies within each fold. A nested cross-validation scheme is introduced to estimate variance more accurately and improve coverage of confidence intervals.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
(2023)
Article
Mathematics, Interdisciplinary Applications
Booil Jo, Trevor J. Hastie, Zetan Li, Eric A. Youngstrom, Robert L. Findling, Sarah McCue Horwitz
Summary: This study proposes a method for integrating latent variable (LV) modeling into supervised learning. By combining the traditions of LV modeling, psychometrics, and supervised learning, practical prediction targets can be generated and systematically validated based on clinical validators. The feasibility of this integrated approach is demonstrated using data from the LAMS Study.
MULTIVARIATE BEHAVIORAL RESEARCH
(2023)
Article
Statistics & Probability
Lukasz Kidzinski, Trevor Hastie
Summary: In clinical practice and biomedical research, it is common to collect sparse and irregular time-series data, because measurements can be costly and inconvenient. Traditional analysis methods, such as mixed-effect models, Gaussian processes, and functional data analysis, rely on probabilistic assumptions, require careful implementation, and tend to be slow. In this study, we propose a novel framework based on matrix completion for analyzing longitudinal data. By iteratively applying Singular Value Decomposition, our method can estimate progression curves efficiently and easily, and it can be extended to other settings. We applied this method to study the progression of motor impairment in children with Cerebral Palsy, and obtained good approximations of individual progression curves and were able to identify different progression trends in subtypes of Cerebral Palsy.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2023)
Article
Medicine, General & Internal
Catherine Ley, Frederik Heath, Trevor Hastie, Zijun Gao, Myroslava Protsiv, Julie Parsonnet
Summary: This cross-sectional study determines the normal oral temperature ranges based on age, sex, height, weight, and time of day by analyzing a large number of clinical visit records. The findings have important implications for temperature assessment and disease diagnosis in clinical medicine.
JAMA INTERNAL MEDICINE
(2023)
Article
Computer Science, Interdisciplinary Applications
J. Kenneth Tay, Balasubramanian Narasimhan, Trevor Hastie
Summary: The lasso and elastic net are popular regularized regression models for supervised learning. Friedman, Hastie, and Tibshirani (2010) introduced a computationally efficient algorithm for computing the elastic net regularization path for various regression models, while Simon, Friedman, Hastie, and Tibshirani (2011) extended this work to Cox models. In this paper, the authors further extend the reach of the elastic net-regularized regression to all generalized linear model families, Cox models with right-censored data, and a simplified version of the relaxed lasso, and also discuss convenient utility functions for measuring the performance of these fitted models.
JOURNAL OF STATISTICAL SOFTWARE
(2023)
Article
Automation & Control Systems
Zijun Gao, Trevor Hastie
Summary: In this paper, we propose a conditional density estimator (LinCDE) based on gradient boosting and Lindsey's method. LinCDE allows flexible modeling of the density family and captures distributional characteristics. It produces smooth and non-negative density estimates.
JOURNAL OF MACHINE LEARNING RESEARCH
(2022)