Article
Computer Science, Artificial Intelligence
Junghye Lee, In Young Choi, Chi-Hyuck Jun
Summary: Classification of microarray data is crucial for cancer diagnosis and prediction, but the high dimensionality of the data poses challenges.
EXPERT SYSTEMS WITH APPLICATIONS
(2021)
Article
Computer Science, Artificial Intelligence
Gaoteng Yuan, Yi Zhai, Jiansong Tang, Xiaofeng Zhou
Summary: This paper proposes a feature selection algorithm based on the cosine similarity coefficient and an information measurement criterion (CSCIM_FS). The algorithm calculates the mutual information (MI) between features and labels and sorts the features by the calculated MI. It then constructs a feature matrix that transforms the one-dimensional feature sequence into a two-dimensional square matrix. The experimental results show that CSCIM_FS selects a feature subset with high accuracy and outperforms other algorithms.
Article
Engineering, Electrical & Electronic
Mengmeng Li, Qibin Zheng, Yi Liu, Gengsong Li, Wei Qin, Xiaoguang Ren
Summary: This paper proposes an evolutionary algorithm-based classification method, named HIMALO, for high-dimensional imbalanced multi-classification problems. HIMALO achieves superior classification performance and stability by introducing a new individual initialization strategy and a multi-classification strategy that combines one versus all and one-against-higher-order.
ELECTRONICS LETTERS
(2023)
Article
Automation & Control Systems
Mohammad Ahmadi Ganjei, Reza Boostani
Summary: In this paper, a new hybrid feature selection approach that combines filter and wrapper methods is proposed. By ranking, clustering, and searching the features, this method achieves better performance on high-dimensional datasets.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2022)
Article
Engineering, Electrical & Electronic
Mengmeng Li, Yi Liu, Qibin Zheng, Gengsong Li, Wei Qin
Summary: This paper introduces a novel data imputation algorithm, PSOHM, which uses particle swarm optimization to impute both continuous and discrete features in high-dimensional data with mixed missing variables. The algorithm outperforms traditional methods in terms of classification performance on various datasets.
ELECTRONICS LETTERS
(2023)
Article
Physics, Multidisciplinary
Michael C. Abbott, Benjamin B. Machta
Summary: Inference from limited data requires a measure on parameter space, which is most explicit in the Bayesian framework as a prior distribution. However, the well-known Jeffreys prior leads to significant bias in high-dimensional models because the effective dimensionality of models in science is usually smaller than the number of microscopic parameters. A principled choice of measure that focuses on relevant parameters can avoid this issue and lead to unbiased posteriors. This optimal prior depends on the quantity of data and approaches Jeffreys prior in the asymptotic limit, but justifying this limit requires an impractically large increase in data quantity for typical models.
Article
Automation & Control Systems
Abdul Wahid, Dost Muhammad Khan, Nadeem Iqbal, Hammad Tariq Janjuhah, Sajjad Ahmad Khan
Summary: Feature selection is crucial in high-dimensional regression and classification problems. This paper introduces a novel stability estimator to measure the internal and external stability of feature subsets chosen by different methods. Experimental results validate the usefulness of the proposed stability estimator.
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Rui Pan, Yingqiu Zhu, Baishan Guo, Xuening Zhu, Hansheng Wang
Summary: The emergence of massive data brings challenges to statistical inference, and new techniques are needed to sample data from a hard drive. In this paper, a sequential addressing subsampling (SAS) method is proposed that samples data directly from the hard drive. The SAS method saves time compared with the random addressing subsampling (RAS) method, and its estimators are studied and tested through simulation studies and a comparison with the RAS method.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)
Article
Computer Science, Artificial Intelligence
Harpreet Singh, Birmohan Singh, Manpreet Kaur
Summary: This study proposes an efficient feature selection and parameter optimization method for classifying high-dimensional biomedical datasets. By introducing the improved elephant herding optimization algorithm and data normalization techniques, the impact of noisy features can be reduced, and the optimal feature set can be obtained, thereby improving classification accuracy.
Article
Genetics & Heredity
Fei Zhou, Jie Ren, Yuwen Liu, Xiaoxi Li, Weiqun Wang, Cen Wu
Summary: We introduce interep, an R package for analyzing repeated measurement data with high-dimensional main and interaction effects. The package implements penalization methods based on generalized estimating equations (GEE) and provides alternative methods as well. This software article presents the statistical methodology, the usage of core and supporting functions, and a simulation example with R code. The interep package is available on the Comprehensive R Archive Network (CRAN).
Article
Computer Science, Artificial Intelligence
Elnaz Pashaei, Elham Pashaei
Summary: Microarray analysis of gene expression is helpful for disease and cancer diagnosis and prognosis. This paper proposes a new gene selection strategy based on the binary COOT optimization algorithm and compares it with other techniques. The experimental results show that the BCOOT-CSA approach outperforms other methods in terms of prediction accuracy and the number of selected genes.
NEURAL COMPUTING & APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Peipei Li, Haixiang Zhang, Xuegang Hu, Xindong Wu
Summary: Multi-label data streams, characterized by multiple labels, high dimensionality, high volume, high velocity, and concept drifts, have become common on the Web. However, the challenging task of classifying multi-label data streams with high-dimensional attributes and concept drifts has received limited research attention. In this study, we propose an algorithm adaptation approach that integrates max-relevance and min-redundancy to effectively classify multi-label data streams. We refine the feature selection criteria and introduce a concept drift detection approach, resulting in an incremental ensemble classification model with superior performance.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)
Article
Computer Science, Artificial Intelligence
Hasna Chamlal, Tayeb Ouaderhman, Basma El Mourtji
Summary: This study presents an algorithm for heterogeneous variable selection in discrimination problems. The algorithm utilizes both filter and wrapper approaches, and introduces a new feature discrimination power measure. Experimental results demonstrate the superiority of this algorithm over other methods.
KNOWLEDGE-BASED SYSTEMS
(2023)
Article
Genetics & Heredity
Brody Kutt, Rachel Burdorf, Travaughn Bain, Nicardo Cameron, Alexia Pearah, Ersoy Subasi, David J. Carroll, Lisa K. Moore, Munevver Mine Subasi
Summary: This study used data from the Cancer Cell Line Encyclopedia (CCLE) to analyze gene expression and copy number features in melanoma cell lines, identifying specific genes and combinations that can distinguish between cell lines. A feature selection approach for high-dimensional datasets was designed to identify a small subset of genes that can accurately classify melanoma cell lines, potentially leading to personalized treatment approaches.
FRONTIERS IN GENETICS
(2021)
Article
Mathematical & Computational Biology
Simon Klau, Marie-Laure Martin-Magniette, Anne-Laure Boulesteix, Sabine Hoffmann
BIOMETRICAL JOURNAL
(2020)
Review
Genetics & Heredity
Anne-Laure Boulesteix, Marvin N. Wright, Sabine Hoffmann, Inke R. Koenig
Article
Health Care Sciences & Services
Alexander Volkmann, Riccardo De Bin, Willi Sauerbrei, Anne-Laure Boulesteix
BMC MEDICAL RESEARCH METHODOLOGY
(2019)
Review
Biochemical Research Methods
Riccardo De Bin, Anne-Laure Boulesteix, Axel Benner, Natalia Becker, Willi Sauerbrei
BRIEFINGS IN BIOINFORMATICS
(2020)
Article
Obstetrics & Gynecology
D.-M. Burgmann, K. Foerster, M. Klemme, M. Delius, C. Huebener, R. Wiskott, A. L. Boulesteix, A. W. Flemmer
Summary: This study evaluated the frequency, duration, and severity of desaturations and bradycardia in the first hours of life in term neonates. The results showed that approximately 30% of infants experienced desaturations, with 25% of them being prolonged desaturations. Infants born by planned Cesarean section had a significantly higher occurrence of desaturations compared to other modes of delivery.
JOURNAL OF MATERNAL-FETAL & NEONATAL MEDICINE
(2022)
Article
Oncology
Daniel Samaga, Roman Hornung, Herbert Braselmann, Julia Hess, Horst Zitzelsberger, Claus Belka, Anne-Laure Boulesteix, Kristian Unger
RADIATION ONCOLOGY
(2020)
Article
Mathematics, Interdisciplinary Applications
Nicole Ellenbach, Anne-Laure Boulesteix, Bernd Bischl, Kristian Unger, Roman Hornung
Summary: The paper discusses the reasons why prediction rules trained on high-dimensional data do not generalize well across different sources, introduces a new method for tuning parameter selection, and concludes through a large-scale comparison study that tuning on external data and robust tuning with a tuned robustness parameter lead to better generalizing prediction rules.
JOURNAL OF CLASSIFICATION
(2021)
Article
Statistics & Probability
Mathias Fuchs, Roman Hornung, Anne-Laure Boulesteix, Riccardo De Bin
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
(2020)
Article
Statistics & Probability
Cornelia Fuetterer, Thomas Augustin, Christiane Fuchs
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
(2020)
Article
Medicine, General & Internal
Anne-Laure Boulesteix, Rolf H. H. Groenwold, Michal Abrahamowicz, Harald Binder, Matthias Briel, Roman Hornung, Tim P. Morris, Jorg Rahnenfuhrer, Willi Sauerbrei
Article
Multidisciplinary Sciences
Sabine Hoffmann, Felix Schoenbrodt, Ralf Elsas, Rory Wilson, Ulrich Strasser, Anne-Laure Boulesteix
Summary: This paper presents a general framework on sources of uncertainty in computational analyses that lead to multiplicity of analysis strategies, and applies it to various approaches proposed in different disciplines to address this issue.
ROYAL SOCIETY OPEN SCIENCE
(2021)
Article
Computer Science, Artificial Intelligence
C. Jansen, H. Blocher, T. Augustin, G. Schollmeyer
Summary: This paper proposes efficient methods for eliciting complex preferences and applies them to decision making problems. The methods enable decision makers to reveal their preference system through as few ranking questions as possible. The study presents two approaches, one utilizing ranking data to obtain ordinal preferences and the other explicitly eliciting an approximate version of the cardinal preferences. Conditions for obtaining the decision maker's true preference system and improving efficiency are discussed.
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING
(2022)
Article
Biotechnology & Applied Microbiology
Stefan Buchka, Alexander Hapfelmeier, Paul P. Gardner, Rory Wilson, Anne-Laure Boulesteix
Summary: Many research articles claim that new data analysis methods outperform existing ones, but the veracity of such claims is questionable. This manuscript discusses the consequences of optimistic bias in evaluating novel data analysis methods, and quantitatively investigates this bias using an example from epigenetic analysis.
Article
Public, Environmental & Occupational Health
Simon Klau, Sabine Hoffmann, Chirag J. Patel, John P. A. Ioannidis, Anne-Laure Boulesteix
Summary: The study highlights the significant impact of sampling, model, and measurement uncertainty on the stability of observational associations, potentially leading to large variability in results. Measurement error in observational studies can attenuate the true effect in most cases, but may also occasionally result in overestimation.
INTERNATIONAL JOURNAL OF EPIDEMIOLOGY
(2021)
Article
Cardiac & Cardiovascular Systems
Korbinian Lackermair, Stefan Brunner, Mathias Orban, Sven Peterss, Martin Orban, Hans D. Theiss, Bruno C. Huber, Gerd Juchem, Frank Born, Anne-Laure Boulesteix, Axel Bauer, Maximilian Pichlmaier, Joerg Hausleiter, Steffen Massberg, Christian Hagl, Sabina P. W. Guenther
Summary: This pilot study showed that randomized studies of extracorporeal life support (ECLS) in patients with cardiogenic shock (CS) are feasible and safe. The small number of included patients impedes meaningful conclusions about mortality and neurological outcome. The observed numerical differences in mortality and in survival with severe neurological impairment urgently call for larger multicenter randomized trials that assess the endpoint of all-cause mortality and also consider the effects on neurological outcome measures.
CLINICAL RESEARCH IN CARDIOLOGY
(2021)