Article
Biochemical Research Methods
Rola Houhou, Petra Rösch, Jürgen Popp, Thomas Bocklitz
Summary: In this study, Raman spectral data were approximated with B-spline basis functions and then analyzed by functional principal component analysis followed by linear discriminant analysis (FPCA-LDA); the results were compared with classical PCA-LDA. FPCA-LDA achieved higher mean sensitivities than PCA-LDA, especially at low signal-to-noise ratios and with small peak shifts. Both methods performed equally at higher signal-to-noise ratios, and a slight improvement was observed when FPCA-LDA was applied to experimental Raman data.
ANALYTICAL AND BIOANALYTICAL CHEMISTRY
(2021)
Article
Computer Science, Artificial Intelligence
Avishek Chatterjee, Satyaki Mazumder, Koel Das
Summary: In this paper, a classification framework using functional data and classwise Principal Component Analysis (PCA) is presented. The proposed method solves the small sample size problem commonly encountered in high dimensional time series data. It converts time series data into functional data and uses classwise functional PCA for feature extraction followed by classification using a Bayesian linear classifier. The effectiveness of the proposed method is demonstrated through its application to synthetic and real time series data from various fields including neuroscience, food science, medical science, and chemometrics.
DATA MINING AND KNOWLEDGE DISCOVERY
(2023)
Article
Biochemistry & Molecular Biology
Hiroyuki Yamamoto, Yasumune Nakayama, Hiroshi Tsugawa
Summary: Researchers have developed a method called orthogonal smoothed PCA for statistical hypothesis testing of metabolomics data to select significant metabolites. This method was successfully applied to two real datasets with promising results, indicating that OS-PCA combined with statistical hypothesis testing is a useful approach for metabolome data analysis.
Article
Computer Science, Theory & Methods
Felipe L. Gewers, Gustavo R. Ferreira, Henrique F. De Arruda, Filipi N. Silva, Cesar H. Comin, Diego R. Amancio, Luciano Da F. Costa
Summary: PCA is commonly used for data analysis in various fields, and this work presents theoretical and practical aspects of PCA in an accessible manner. The basic principles, data standardization, visualizations, and outlier detection of PCA are discussed, along with its potential for dimensionality reduction. The work also summarizes PCA-related approaches and aims to assist researchers from diverse areas in utilizing and interpreting PCA effectively.
ACM COMPUTING SURVEYS
(2021)
Article
Automation & Control Systems
Jicong Fan, Tommy W. S. Chow, S. Joe Qin
Summary: In this article, a nonlinear method is proposed to handle the missing data problem in industrial processes. The proposed method, called fast incremental nonlinear matrix completion (FINLMC), allows for missing data imputation in both the offline modeling and online monitoring stages. The effectiveness of the method is supported by theoretical analysis and experiments, which demonstrate its ability to improve the fault detection rate and reduce false alarms in nonlinear process monitoring with missing data.
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
(2022)
Article
Engineering, Civil
Honghua Liu, Jing Yang, Ming Ye, Scott C. James, Zhonghua Tang, Jie Dong, Tongju Xing
Summary: This study introduced t-SNE as a graphical approach to assist cluster analysis of groundwater geochemistry data. Compared to PCA, t-SNE performed better in assisting cluster analysis, showing promise as a tool for determining the number of clusters and delineating spatial zones.
JOURNAL OF HYDROLOGY
(2021)
Article
Computer Science, Interdisciplinary Applications
David Yevick
Summary: This paper examines various applications of principal component analysis (PCA) in physical systems, showing that PCA can identify conserved quantities in the form of polynomials in system variables. By identifying principal components with the smallest explained variances, combinations of conserved features can be obtained.
COMPUTER PHYSICS COMMUNICATIONS
(2021)
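The idea in the summary above, that low-variance principal components point to conserved quantities, can be illustrated with a minimal sketch. It is not the paper's code: the toy system (a unit-mass, unit-frequency harmonic oscillator), the choice of polynomial features `x**2` and `p**2`, and the helper name `smallest_variance_component` are all assumptions made for illustration.

```python
import numpy as np

def smallest_variance_component(features):
    """Return (variance, direction) of the principal component with least variance."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(features) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvals[0], eigvecs[:, 0]

# Hypothetical toy trajectory: x(t) = A cos t, p(t) = -A sin t with A = 1.5,
# so x^2 + p^2 = A^2 is conserved along the trajectory.
t = np.linspace(0.0, 20.0, 500)
x, p = 1.5 * np.cos(t), -1.5 * np.sin(t)

# Polynomial features of the system variables (an assumed choice).
features = np.column_stack([x**2, p**2])
variance, direction = smallest_variance_component(features)

# A near-zero variance means the combination direction . (x^2, p^2) is
# (approximately) constant, i.e. a candidate conserved quantity.
print(variance, direction)
```

In this toy case the least-variance direction weights `x**2` and `p**2` equally (up to sign), which corresponds to the conserved combination `x**2 + p**2`.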
Article
Statistics & Probability
Piotr Kokoszka, Rafal Kulik
Summary: Principal Components Analysis is a widely used approach in multivariate analysis for dimension reduction or feature extraction. However, the behavior of the sample covariance operator in the context of infinite-variance multivariate or functional data is still unknown.
JOURNAL OF MULTIVARIATE ANALYSIS
(2023)
Article
Computer Science, Interdisciplinary Applications
Yeonjoo Park, Hyunsung Kim, Yaeji Lim
Summary: This paper presents robust principal component estimators for partially observed functional data with heavy-tail behaviors, where sample trajectories are collected over individual-specific subintervals. The method treats partially sampled trajectories as the elliptical process filtered by the missing indicator process and applies robust functional principal component analysis under this framework. The proposed method is computationally efficient and straightforward, estimating the robust correlation function through pair-wise covariance computation combined with M-estimation. The estimators are asymptotically consistent under general conditions. Simulation studies demonstrate the superior performance of the method in approximating the subspace of data and reconstructing full trajectories. The proposed method is then applied to hourly monitored air pollutant data containing anomaly trajectories with random missing segments.
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2023)
Article
Chemistry, Physical
Marc Duquesnoy, Teo Lombardo, Fernando Caro, Florent Haudiquez, Alain C. Ngandjong, Jiahui Xu, Hassan Oularbi, Alejandro A. Franco
Summary: This study proposes a functional data-driven framework to capture the influence of manufacturing parameters on the properties of lithium-ion battery composite electrodes, while ensuring a match with experimental data. The results demonstrate that this approach can significantly improve computational efficiency without sacrificing accuracy.
NPJ COMPUTATIONAL MATERIALS
(2022)
Article
Mathematical & Computational Biology
Jun Zhang, Greg J. Siegle, Tao Sun, Wendy D'Andrea, Robert T. Krafty
Summary: This article introduces an approach to principal components analysis for multilevel multivariate functional data that yields interpretable components, which can be both sparse among variates and localized in time.
Article
Statistics & Probability
William Consagra, Arun Venkataraman, Xing Qiu
Summary: In this article, a computational framework for learning continuous representations from multidimensional functional data is proposed. The framework utilizes separable basis functions, tensor decomposition, and roughness-based regularization to construct representations and solve the estimation problem. The advantages of the proposed method are demonstrated through simulations and a real data application.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2023)
Article
Computer Science, Theory & Methods
Rou Zhong, Shishi Liu, Haocheng Li, Jingxiao Zhang
Summary: In this paper, we propose a sparse logistic functional principal component analysis (SLFPCA) method to handle functional binary data. SLFPCA seeks local sparsity in the eigenfunctions to ease interpretation. The proposed method is accompanied by the R package SLFPCA for implementation. The theoretical results establish both consistency and sparsistency of the proposed method. We conduct a thorough numerical experiment to demonstrate the advantages of the SLFPCA approach. Our method is further applied to a physical activity dataset.
STATISTICS AND COMPUTING
(2023)
Article
Environmental Sciences
Moonil Kim, Park Chul, Wan Kim, Fenghao Cui
Summary: A parameter ranking system was developed using PCA and data smoothing to enhance the data interpretation of the anaerobic digestion process. This system helps identify core parameters for early detection of operating problems and addressing stabilization issues.
Article
Engineering, Multidisciplinary
Zhijiang Lou, Zedong Li, Youqing Wang, Shan Lu
Summary: This paper introduces an improved neural component analysis (INCA) method, which addresses the issue of NCA's inability to handle non-Gaussian features by proposing a new cost function based on kurtosis. It also improves the extraction of key information from process data by selecting principal components (PCs) in the original data space. Experimental results show that INCA outperforms other methods in fault detection.
Article
Genetics & Heredity
Chang Liu, Eric Kannisto, Guan Yu, Yunchen Yang, Mary E. Reid, Santosh K. Patnaik, Yun Wu
FRONTIERS IN GENETICS
(2020)
Article
Environmental Sciences
Leizhi Wang, Zhenduo Zhu, Lauren Sassoubre, Guan Yu, Chen Liao, Qingfang Hu, Yintang Wang
Summary: The study introduces an ensemble machine learning approach known as model stacking for predicting beach water quality reliably. In experiments conducted at three beaches along eastern Lake Erie, the stacking model consistently ranked 1st or 2nd in accuracy every year, with yearly-average accuracy of 78%, 81%, and 82.3% at the three studied beaches, respectively.
SCIENCE OF THE TOTAL ENVIRONMENT
(2021)
Article
Immunology
Jeremy Kiripolsky, Eileen M. Kasperek, Chengsong Zhu, Quan-Zhen Li, Jia Wang, Guan Yu, Jill M. Kramer
Summary: Myd88 activation in specific cell types plays a crucial role in the pathology of primary Sjogren's syndrome (pSS). Deleting Myd88 in hematopoietic cells mitigates inflammatory responses in salivary tissue and nephritis but increases pulmonary inflammation, whereas ablating Myd88 in stromal cells reduces pulmonary inflammation and lowers levels of anti-nuclear autoantibodies.
JOURNAL OF AUTOIMMUNITY
(2021)
Article
Environmental Sciences
Xiaoyan Yan, Xushen Chen, Xiaolin Tian, Yulan Qiu, Jie Wang, Guan Yu, Nisha Dong, Jing Feng, Jiaxin Xie, Morgan Nalesnik, Ruiyan Niu, Bo Xiao, Guohua Song, Sarah Quinones, Xuefeng Ren
Summary: Co-exposure to inorganic arsenic and fluoride results in more prominent adverse effects on cardiovascular systems and perturbation of gut microbiota, with certain bacterial genera strongly correlated with higher risk of cardiovascular events.
SCIENCE OF THE TOTAL ENVIRONMENT
(2021)
Article
Statistics & Probability
Guan Yu, Haoda Fu, Yufeng Liu
Summary: The article explores a new high-dimensional cost-constrained linear regression problem and proposes a new optimization method, demonstrating the convergence of the algorithm and the possibility of a global optimal solution. Experimental results show that the method performs well in high-dimensional problems.
Article
Immunology
Jeremy Kiripolsky, Eileen M. Kasperek, Chengsong Zhu, Quan-Zhen Li, Jia Wang, Guan Yu, Jill M. Kramer
Summary: Primary Sjogren's syndrome (pSS), predominantly seen in women, is characterized by exocrine gland dysfunction and serious systemic manifestations. Research suggests that ECM degradation may represent a novel source of chronic B cell activation in the context of pSS.
FRONTIERS IN IMMUNOLOGY
(2021)
Article
Statistics & Probability
Jialu Li, Guan Yu, Qizhai Li, Yufeng Liu
Summary: Modern high-dimensional statistical inference often faces the problem of missing data. In this article, we propose a new method called SCOM to deal with missing data occurring in predictors. SCOM makes full use of all available data and is robust with respect to various missing mechanisms.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2023)
Article
Health Care Sciences & Services
Guan Yu, Surui Hou
Summary: This paper addresses the classification problem in multi-modality datasets with missing data by developing a new weighted nearest neighbors classifier called the integrative nearest neighbor (INN) classifier. INN efficiently utilizes available information in the training data and feature vector of the test data point to predict its class label without deleting or imputing any missing data. Simulation and real application results demonstrate that INN outperforms other methods in terms of classification performance.
STATISTICAL METHODS IN MEDICAL RESEARCH
(2022)
Article
Engineering, Environmental
Lingbo Li, Jundong Qiao, Guan Yu, Leizhi Wang, Hong-Yi Li, Chen Liao, Zhenduo Zhu
Summary: Tree-based machine learning models offer low-cost and timely solutions for predicting microbial fecal contamination in beach water. However, many of these models are difficult to interpret. This study evaluates five tree-based models and employs the SHAP explanation method to improve interpretability. LightGBM and XGBoost achieve the highest precision and recall scores in predicting Escherichia coli concentration in beach water.
Article
Hematology
Nagaraja Rao Sridhar, Ziqiang Chen, Guan Yu, Judy Lambert, Mary Muscarella, Madan Nanjundegowda, Mandip Panesar
Summary: This study finds that adjusting the dialysate bicarbonate concentration based on pre-hemodialysis (pre-HD) serum bicarbonate is unnecessary, while higher bicarbonate and lower dialysate sodium are associated with post-HD alkalemia.
THERAPEUTIC APHERESIS AND DIALYSIS
(2023)
Article
Environmental Sciences
Seth Frndak, Gabriel Barg, Elena Queirolo, Nelly Manay, Craig Colder, Guan Yu, Zia Ahmed, Katarzyna Kordas
Summary: Lead exposure and neighborhood characteristics have an impact on children's behavior. However, the study found that the effects of lead on behavior are modified by the distance to greenspace. This suggests that interventions should consider both greenspace access and lead exposure prevention.
Article
Statistics & Probability
Haiyang Sheng, Guan Yu
Summary: Traditional weighted nearest neighbors classifiers are designed for independent and identically distributed supervised learning problems, but in real applications, it is often difficult to obtain the desired data distribution. Therefore, we propose a novel transfer learning weighted nearest neighbors classifier that can flexibly combine training samples from different distributions to improve prediction accuracy.
JOURNAL OF MULTIVARIATE ANALYSIS
(2023)
Article
Chemistry, Multidisciplinary
Chang-Chieh Hsu, Yunchen Yang, Eric Kannisto, Xie Zeng, Guan Yu, Santosh K. Patnaik, Grace K. Dy, Mary E. Reid, Qiaoqiang Gan, Yun Wu
Summary: Tumor-derived exosomes (TEXs) are considered promising biomarkers for cancer liquid biopsy. We developed an ExoPROS biosensor that selectively captures TEXs and enables simultaneous detection of TEX protein-microRNA pairs using a surface plasmon resonance mechanism. The assay demonstrated high accuracy for lung cancer and breast cancer diagnosis compared to conventional methods. The ExoPROS assay is a potent liquid biopsy assay for cancer diagnosis.
Article
Public, Environmental & Occupational Health
Seth Frndak, Guan Yu, Youssef Oulhote, Elena I. Queirolo, Gabriel Barg, Marie Vahter, Nelly Manay, Fabiana Peregalli, James R. Olson, Zia Ahmed, Katarzyna Kordas
Summary: This study presents a two-step approach for selecting exposures in high-dimensional environmental datasets while accounting for confounding. The approach uses the LASSO algorithm for feature selection, followed by linear regression models with confounder adjustment based on directed acyclic graphs (DAGs) for statistical inference. The results identified four variables associated with cognitive ability scores.
INTERNATIONAL JOURNAL OF HYGIENE AND ENVIRONMENTAL HEALTH
(2023)
Article
Dentistry, Oral Surgery & Medicine
Thikriat Al-Jewair, Simran Marwah, Charles Brian Preston, Yufei Wu, Guan Yu
Summary: This study evaluated the correlation between craniofacial structures and nasopharyngeal dimensions in African Black adolescents, finding significant associations between craniofacial structures and stature/upper body height, maxillary growth and bony nasopharyngeal variables, as well as mandibular growth and soft tissue nasopharyngeal variables. Sexual dimorphism in lower facial height was identified, suggesting the need for further research on managing craniofacial complexity and nasopharyngeal airway in this population.
INTERNATIONAL ORTHODONTICS
(2021)