4.7 Article

Sparse partial least-squares regression and its applications to high-throughput data analysis

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.chemolab.2011.07.002

Keywords

Lasso; Modeling; Prediction; Regression analyses; Variable selection

Funding

  1. Swedish Research Council
  2. National Research Foundation of Korea(NRF)
  3. Ministry of Education, Science and Technology [2010-0011372]
  4. National Research Foundation of Korea [2010-0011372] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

The partial least-squares (PLS) method is designed for prediction problems where the number of predictors is larger than the number of training samples. PLS is based on latent components that are linear combinations of all of the original predictors, so it automatically employs all predictors regardless of their relevance. This can not only compromise its performance but also make the results difficult to interpret. In this paper, we propose a new formulation of the sparse PLS (SPLS) procedure to allow both sparse variable selection and dimension reduction. We use the standard L1 penalty and the unbounded penalty of [1]. We develop a computing algorithm for SPLS by modifying the nonlinear iterative partial least-squares (NIPALS) algorithm, and illustrate the method with an analysis of a cancer dataset. Our numerical studies show that the SPLS method generally performs better than standard PLS and other existing methods in variable selection and prediction. © 2011 Elsevier B.V. All rights reserved.
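
The abstract does not spell out the algorithm, so the following Python sketch is only a rough illustration of the general idea it describes: NIPALS-style component extraction in which the weight vector is soft-thresholded (the usual consequence of an L1 penalty) so that irrelevant predictors receive exactly zero weight. It is not the authors' SPLS formulation and it does not implement the unbounded penalty of [1]. The function names `soft_threshold` and `sparse_pls1`, and the tuning parameter `lam` (a fraction of the largest absolute weight), are assumptions introduced here for illustration.

```python
import numpy as np


def soft_threshold(w, lam):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def sparse_pls1(X, y, n_components=2, lam=0.5):
    """Illustrative sparse PLS for a single response (not the paper's algorithm).

    lam is a hypothetical tuning parameter in [0, 1): each weight vector is
    thresholded at lam times its largest absolute entry.
    Returns (B, W): coefficients for the centered predictors and the weights.
    """
    Xk = X - X.mean(axis=0)   # center predictors
    yk = y - y.mean()         # center response
    p = X.shape[1]
    W, P, q = [], [], []
    for _ in range(n_components):
        c = Xk.T @ yk                                  # covariance direction
        w = soft_threshold(c, lam * np.abs(c).max())   # sparsify the weights
        norm = np.linalg.norm(w)
        if norm == 0.0:                                # nothing left to extract
            break
        w = w / norm
        t = Xk @ w                                     # latent component (scores)
        tt = t @ t
        p_k = Xk.T @ t / tt                            # X loadings
        q_k = yk @ t / tt                              # y loading
        Xk = Xk - np.outer(t, p_k)                     # deflate X
        yk = yk - q_k * t                              # deflate y
        W.append(w)
        P.append(p_k)
        q.append(q_k)
    if not W:
        return np.zeros(p), np.zeros((p, 0))
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Map component-wise regression back to the original predictors.
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, W
```

Calling `sparse_pls1(X, y, n_components=2, lam=0.8)` on a matrix with more columns than rows returns a coefficient vector `B` whose entries are exactly zero for every predictor that was thresholded out of all components, which is what makes the fit easier to interpret than standard PLS. In practice `lam` and the number of components would be chosen by cross-validation.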

Recommended

Article Environmental Sciences

Investigation of Correlated Internet and Smartphone Addiction in Adolescents: Copula Regression Analysis

Minji Lee, Sun Ju Chung, Youngjo Lee, Sera Park, Jun-Gun Kwon, Dai Jin Kim, Donghwan Lee, Jung-Seok Choi

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2020)

Article Environmental Sciences

The Burden of Cervical Cancer in Korea: A Population-Based Study

Jinhee Kim, Donghwan Lee, Kyung-Bok Son, SeungJin Bae

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2020)

Article Environmental Sciences

A Meta-Regression Analysis of Utility Weights for Breast Cancer: The Power of Patients' Experience

Jiryoun Gong, Juhee Han, Donghwan Lee, Seungjin Bae

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2020)

Review Statistics & Probability

A review on recent advances and applications of h-likelihood method

Woojoo Lee, Il Do Ha, Maengseok Noh, Donghwan Lee, Youngjo Lee

Summary: This paper reviews the application of the h-likelihood method in various statistical areas since its introduction in 1994. It covers clustered survival data analysis, competing risk models, joint models, high-dimensional analysis, spatial analysis, and multiple testing.

JOURNAL OF THE KOREAN STATISTICAL SOCIETY (2021)

Article Environmental Sciences

Value Frameworks: Adaptation of Korean Versions of Value Frameworks for Oncology

Green Bae, SeungJin Bae, Donghwan Lee, Juhee Han, Dong-Hoe Koo, Do Yeun Kim, Hee-Jun Kim, Sung Young Oh, Hee Yeon Lee, Jong Hwan Lee, Hye Sook Han, Hyerim Ha, Jin Hyoung Kang

Summary: This study aimed to develop a reliable Korean oncology value framework, by translating and examining the frameworks of ASCO and ESMO, and collecting data using AHP and FGIs. The results showed good reliability for ASCO, with AHP indicating that clinical benefit has the highest priority, and FGIs suggesting that ESMO and ASCO should be used complementarily.

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2021)

Article Environmental Sciences

Medication Adherence and Persistence of Open-Angle Glaucoma Patients in Korea: A Retrospective Study Using National Health Insurance Claims Data

Yunjeong Jang, Donghyun Jee, Donghwan Lee, Nam-Kyong Choi, SeungJin Bae

Summary: This study aimed to analyze medication adherence and persistence among open-angle glaucoma patients in Korea. Results showed that older age, female gender, the use of prostaglandins as the index medication, and visits to secondary or tertiary hospitals were associated with higher rates of adherence and persistence during the study period.

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2021)

Article Environmental Sciences

In silico prediction of the full United Nations Globally Harmonized System eye irritation categories of liquid chemicals by IATA-like bottom-up approach of random forest method

Yeonsoo Kang, Boram Jeong, Doo-Hyeon Lim, Donghwan Lee, Kyung-Min Lim

Summary: This study aimed to construct an in silico model to predict the eye irritation category of liquid chemicals, and achieved high accuracy in ternary categorization of eye irritation potential with a two-stage random forest approach. The prediction model showed excellent performance in distinguishing Category 1 and Category 2 chemicals.

JOURNAL OF TOXICOLOGY AND ENVIRONMENTAL HEALTH-PART A-CURRENT ISSUES (2021)

Article Psychiatry

Identification of Major Psychiatric Disorders From Resting-State Electroencephalography Using a Machine Learning Approach

Su Mi Park, Boram Jeong, Da Young Oh, Chi-Hyun Choi, Hee Yeon Jung, Jun-Young Lee, Donghwan Lee, Jung-Seok Choi

Summary: The study successfully predicted major psychiatric disorders using machine learning techniques combined with EEG data, suggesting the potential value of electronic devices in identifying psychiatric patients.

FRONTIERS IN PSYCHIATRY (2021)

Review Statistics & Probability

Revisiting the analysis pipeline for overdispersed Poisson and binomial data

Woojoo Lee, Jeonghwan Kim, Donghwan Lee

Summary: This study aims to clarify the relationships among various statistical methods for detecting and handling overdispersion in categorical data analysis, compare their performances, and propose a method for correcting finite sample bias. It also aims to reconsider the current practice for handling overdispersed categorical data and provide graphical tools for model selection. Furthermore, it investigates the assumptions behind the score statistics and their applicability to analyzing overdispersed data.

JOURNAL OF APPLIED STATISTICS (2023)

Article Pharmacology & Pharmacy

Application of Machine Learning Classification to Improve the Performance of Vancomycin Therapeutic Drug Monitoring

Sooyoung Lee, Moonsik Song, Jongdae Han, Donghwan Lee, Bo-Hyung Kim

Summary: In this study, a classifier using machine learning was developed to select a suitable vancomycin pharmacokinetic model for therapeutic drug monitoring in patients. Through training and validation, the classifier showed stable accuracy and may contribute to the improvement of therapeutic drug monitoring.

PHARMACEUTICS (2022)

Article Neurosciences

Multiple-Kernel Support Vector Machine for Predicting Internet Gaming Disorder Using Multimodal Fusion of PET, EEG, and Clinical Features

Boram Jeong, Jiyoon Lee, Heejung Kim, Seungyeon Gwak, Yu Kyeong Kim, So Young Yoo, Donghwan Lee, Jung-Seok Choi

Summary: This study used machine learning methods to analyze multimodal neuroimaging data and improve the prediction accuracy of Internet gaming disorder (IGD). The results showed that the multiple-kernel support vector machine method had higher accuracy in predicting IGD, and clinical variables contributed the most to the prediction model.

FRONTIERS IN NEUROSCIENCE (2022)

Article Mathematical & Computational Biology

Overall assessment for selected markers from high-throughput data

Woojoo Lee, Donghwan Lee, Yudi Pawitan

Summary: This paper focuses on reproducibility assessment in high-throughput studies and proposes a selection-adjusted false-discovery rate (sFDR) as an overall assessment measure. By integrating information from both training and validation studies and considering the effects of non-random selection, sFDR provides a more accurate evaluation. Simulation studies and real metabolomic datasets are used to illustrate the application of sFDR in high-throughput data analysis.

STATISTICS IN MEDICINE (2022)

Article Mathematics, Interdisciplinary Applications

Bias reduction for semi-competing risks frailty model with rare events: application to a chronic kidney disease cohort study in South Korea

Jayoun Kim, Boram Jeong, Il Do Ha, Kook-Hwan Oh, Ji Yong Jung, Jong Cheol Jeong, Donghwan Lee

Summary: This study introduces a new method for handling semi-competing risk data. By incorporating penalized likelihood estimation and the gamma frailty model, the proposed method reduces bias caused by rare events in datasets with a small number of events.

LIFETIME DATA ANALYSIS (2023)

Article Pharmacology & Pharmacy

Pharmacokinetic comparison between a fixed-dose combination of fimasartan/amlodipine/hydrochlorothiazide 60/10/25 mg and a corresponding loose combination of fimasartan/amlodipine 60/25 mg and hydrochlorothiazide 25 mg in healthy subjects

Jihyun Jung, Soyoung Lee, Jaeseong Oh, SeungHwan Lee, In-Jin Jang, Donghwan Lee, Kyung-Sang Yu

Summary: The new FDC of fimasartan/amlodipine/hydrochlorothiazide 60/10/25 mg demonstrated similar PK profiles to the corresponding loose combination, and both treatments were well tolerated.

TRANSLATIONAL AND CLINICAL PHARMACOLOGY (2021)

Article Statistics & Probability

Comparison of graph clustering methods for analyzing the mathematical subject classification codes

Kwangju Choi, June-Yub Lee, Younjin Kim, Donghwan Lee

COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS (2020)

Article Automation & Control Systems

Multi-modal hybrid modeling strategy based on Gaussian Mixture Variational Autoencoder and spatial-temporal attention: Application to industrial process prediction

Haifei Peng, Jian Long, Cheng Huang, Shibo Wei, Zhencheng Ye

Summary: This paper proposes a novel multi-modal hybrid modeling strategy (GMVAE-STA) that can effectively extract deep multi-modal representations and complex spatial and temporal relationships, and applies it to industrial process prediction.

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2024)