Article

Evaluation of the effect of chance correlations on variable selection using Partial Least Squares-Discriminant Analysis

Journal

TALANTA
Volume 116, Pages 835-840

Publisher

ELSEVIER
DOI: 10.1016/j.talanta.2013.07.048

Keywords

Metabolomics; Chance correlations; Variable selection; Partial Least Squares-Discriminant Analysis (PLSDA)

Funding

  1. Instituto Carlos III (Ministry of Economy and Competitiveness) [CD11/00154, CD12/00667, FISPI11/0313]
  2. University of Valencia
  3. Spanish Ministry of Science and Innovation (MICINN) [DPI2011-28112-C04-02]
  4. Spanish Ministry of Economy and Competitiveness [SAF2012-39948]


Variable subset selection is often mandatory in high-throughput metabolomics and proteomics. However, depending on the variable-to-sample ratio, variable selection can be highly susceptible to chance correlations. When the predictive capability of a PLSDA model is estimated by cross-validation after feature selection, the results are overly optimistic if the selection is performed on the entire data set and no external validation set is available. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection, as estimated by cross-validation, is statistically higher than that attributable to chance correlations in the original data set. The statistical significance of the PLSDA cross-validation (CV) figures of merit obtained after variable selection is expressed as p-values calculated with a permutation test that includes the variable selection step. The reliability of the approach is evaluated using two variable selection methods on experimental and simulated data sets with and without induced class differences. The proposed approach is a useful tool when no external validation set is available, and it provides a straightforward way to evaluate differences between variable selection methods. © 2013 Elsevier B.V. All rights reserved.
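
The permutation test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free, a nearest-centroid classifier stands in for PLSDA, and a simple mean-difference score stands in for the two variable selection methods, neither of which the abstract names. The function names, the `make_data` helper, and all parameter values are hypothetical. The point it shows is that the selection step is repeated inside every label permutation, so the null distribution carries the same optimistic bias as the observed CV figure.

```python
import random
import statistics


def select_variables(X, y, k):
    """Return indices of the k variables with the largest class-mean
    difference scaled by within-class spread (assumed stand-in filter)."""
    n_vars = len(X[0])
    scores = []
    for j in range(n_vars):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]


def loo_cv_accuracy(X, y, cols):
    """Leave-one-out CV accuracy of a nearest-centroid classifier
    (assumed stand-in for PLSDA) restricted to the columns in cols."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for lab in (0, 1):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == lab]
            centroid = [statistics.fmean(row[j] for row in rows) for j in cols]
            dist[lab] = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
        pred = min(dist, key=dist.get)
        correct += int(pred == y[i])
    return correct / len(X)


def select_then_cv(X, y, k):
    """Selection on the entire set followed by CV: the optimistic
    procedure whose significance is being tested."""
    return loo_cv_accuracy(X, y, select_variables(X, y, k))


def permutation_p_value(X, y, k=3, n_perm=200, seed=0):
    """Permute the class labels and repeat selection plus CV each time.
    Returns the observed CV accuracy and its permutation p-value."""
    rng = random.Random(seed)
    observed = select_then_cv(X, y, k)
    exceed = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if select_then_cv(X, y_perm, k) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def make_data(n_per_class=12, n_vars=30, shift=0.0, n_informative=3, seed=1):
    """Hypothetical simulated data: Gaussian noise, with an optional
    mean shift in the first n_informative variables of class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            if lab == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(lab)
    return X, y


if __name__ == "__main__":
    for shift in (0.0, 4.0):
        X, y = make_data(shift=shift)
        acc, p = permutation_p_value(X, y)
        print(f"shift={shift}: CV accuracy={acc:.2f}, p-value={p:.3f}")
```

On data without induced class differences, the CV accuracy after selection can look well above chance, and the p-value is what indicates whether it exceeds what chance correlations alone would produce.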
