Article
Multidisciplinary Sciences
Arley Camargo
Summary: Principal Component Analysis (PCA) is a widely used statistical method for ordination and dimensionality reduction of multivariate datasets. In this article, the author introduces the importance of PCA and presents the PCAtest package, which implements permutation-based statistical tests to evaluate the significance of PCA and the contributions of variables to the PC axes. The author encourages R users to routinely apply PCAtest for testing the significance of their PCA before interpreting PC axes and utilizing PC scores in subsequent analyses.
Article
Clinical Neurology
Najib E. El Tecle, Jorge F. Urquiaga, Samuel T. Griffin, Georgios Alexopoulos, Tarek Y. El Ahmadieh, Salah G. Aoun, Tobias A. Mattei
Summary: The study revealed that misinterpretations of null hypothesis significance testing results near the P-value threshold are present in at least 1% of neurosurgical literature. While most statistical errors may be unintentional, additional measures should be implemented to prevent the future adoption of such undesirable methodological practices among researchers.
WORLD NEUROSURGERY
(2022)
Article
Computer Science, Interdisciplinary Applications
Alan D. Hutson, Han Yu
Summary: In this paper, we extend the permutation test approach based on the Pearson correlation coefficient to ordinal measures of association, building upon the work of DiCiccio and Romano (2017). We investigate commonly used ordinal measures, such as the Spearman correlation, Kendall's tau-b, and gamma, and find that asymptotically correct tests perform well for moderate to large sample sizes. Our findings align with previous research, indicating that exact permutation tests based on ordinal measures of association are often not exact.
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE
(2023)
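As a rough illustration of the kind of permutation test discussed in the summary above, the following minimal sketch computes a two-sided permutation p-value for the Spearman correlation by enumerating every reordering of one sample. It is not the authors' procedure: the function names (`ranks`, `spearman`, `permutation_p_value`) and the toy data are hypothetical, ties are assumed absent, and the statistic is the plain (unstudentized) Spearman coefficient, which is the naive variant rather than an asymptotically corrected one.

```python
from itertools import permutations


def ranks(values):
    """Rank values from 1..n. Assumes no ties (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def permutation_p_value(x, y):
    """Two-sided p-value from enumerating all permutations of y.

    Only practical for small n, since there are n! permutations.
    """
    observed = abs(spearman(x, y))
    count = 0
    total = 0
    for perm in permutations(y):
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(spearman(x, perm)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical toy data: y increases monotonically with x.
x = [1.0, 2.5, 3.1, 4.8, 5.2]
y = [10, 20, 30, 40, 50]
rho = spearman(x, y)
p = permutation_p_value(x, y)
print(rho, p)
```

For larger samples, full enumeration is infeasible, and a random subset of permutations would typically be drawn instead.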
Review
Computer Science, Artificial Intelligence
Iztok Fister Jr, Iztok Fister, Dusan Fister, Vili Podgorelec, Sancho Salcedo-Sanz
Summary: Association rule mining aims to search for relationships between attributes in transaction databases. The process involves pre-processing techniques, rule mining, and post-processing with visualization. This review paper provides a literature review and analysis of techniques, applications, and future research directions in association rule mining and visualization.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
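To make the notion of a relationship between attributes in a transaction database concrete, here is a minimal sketch of the three standard rule-quality measures (support, confidence, and lift). It is not taken from the review: the function names and the toy transactions are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; 1.0 suggests independence."""
    cons_support = support(transactions, consequent)
    if cons_support == 0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / cons_support


# Hypothetical toy database of four transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(transactions, {"bread", "milk"})
c = confidence(transactions, {"bread"}, {"milk"})
l = lift(transactions, {"bread"}, {"milk"})
print(s, c, l)
```

A mining algorithm would typically enumerate candidate itemsets and keep only the rules whose support and confidence exceed user-chosen thresholds.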
Editorial Material
Oncology
Mary E. Putt
Summary: The statistical significance of a risk factor is influenced by sample size and the distributions of outcome and predictor variables. Paying closer attention to confidence intervals and visual displays can lead to a more comprehensive understanding of data analysis results.
Article
Computer Science, Information Systems
Aashara Shrestha, Dimitrios Zikos, Leonidas Fegaras
Summary: This work aims to derive interesting clinical events using association rule mining based on a user-annotated order of clinical features. The plugin algorithm scans the database to calculate the support of item sequences in line with the user-annotated feature order. It generates rules efficiently and organizes them into meaningful hierarchies to unfold interesting clinical events.
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS
(2021)
Article
Computer Science, Artificial Intelligence
S. Sharmila, S. Vijayarani
Summary: Association rule mining is a well-known data mining scheme used to discover commonly co-occurring itemsets, with frequent item identification and association rule generation being the key steps. Researchers have developed various algorithms to generate association rules, incorporating fuzzy logic to uncover recurrent itemsets and interesting fuzzy association rules. Dimensionality reduction techniques are proposed to effectively identify significant transactions and items from databases, and the efficiency of the proposed algorithm is compared with other optimization techniques for frequent item identification and rule generation.
Article
Automation & Control Systems
Sunita M. Dol, Pradip M. Jawandhiya
Summary: Educational data mining (EDM) applies data mining techniques in the field of education to classify, analyze, and predict students' academic performance, dropout rate, and instructors' performance. This review article analyzes 142 research articles from 2010 to 2020 and discusses the current developments in EDM in 2021 and 2022. It presents the use of classification techniques, clustering algorithms, association rule algorithms, regression techniques, and ensemble techniques in EDM. The article also compares different classification techniques and identifies research gaps for future improvement in the teaching-learning process.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2023)
Article
Medicine, General & Internal
Pranay R. Manda, Manish Kuchakulla, Gabrielle Hochu, Pranav Mudiam, Arjun Watane, Ali Syed, Armin Ghomeshi, Ranjith Ramasamy
Summary: This study evaluated abstracts from 15 urology journals published between 2000 and 2021 and found a common statistical mistake of misconstruing non-significant data as trending toward significance. The word "trend" was used 572 times to describe such non-statistically significant data. There was a statistically significant difference in the error rates between different journals, and there was a moderate correlation between the number of articles published and the frequency of misuses of the word "trend".
CUREUS JOURNAL OF MEDICAL SCIENCE
(2023)
Article
Engineering, Industrial
He Lan, Xiaoxue Ma, Laihao Ma, Weiliang Qiao
Summary: Total loss of a ship is the most serious consequence of maritime accidents, causing massive property losses, human casualties, and environmental pollution. This study investigates significant patterns in total loss accidents using association rule mining and finds that ship age and accident type are key indicators.
RELIABILITY ENGINEERING & SYSTEM SAFETY
(2023)
Article
Green & Sustainable Science & Technology
Yuyao Guo, Lei Wang, Zelin Zhang, Jianhua Cao, Xuhui Xia
Summary: Due to the inability to restore the original performance, retired mechanical products are often replaced and discarded or recycled, resulting in energy waste and decreased residual value. The generalized growth remanufacturing model (GGRM) offers a solution to enhance residual value by incorporating a wider range of growth modes. However, suitable methods for selecting growth modes in GGRM are limited. Therefore, we propose a growth mode selection method based on association rule mining and conduct a case study to demonstrate its feasibility, efficiency, and accuracy.
Article
Computer Science, Artificial Intelligence
Xiangyu Liu, Xinzheng Niu, Philippe Fournier-Viger
Summary: This study introduces a new algorithm called FTARM, which efficiently finds the top-k association rules using Rule Generation Property Pruning and a novel candidate pruning property, leading to significant reductions in association rule mining time and memory usage. FTARM exhibits good scalability and can benefit various applications.
APPLIED INTELLIGENCE
(2021)
Article
Meteorology & Atmospheric Sciences
Radan Huth, Martin Dubrovsky
Summary: This study focuses on the statistical significance of trends in climate elements defined at a regional scale, comparing different detection methods. The sign test and extended Mann-Kendall test perform slightly better under low autocorrelation conditions, while all tests show similar performance under high autocorrelation conditions.
JOURNAL OF CLIMATE
(2021)
Article
Engineering, Environmental
Vishal Singh, Vishal Mishra
Summary: Association rule mining was used in this study to identify specific conditions for enhancing microalgae growth in wastewater, including CO2 content, light intensity, initial inoculum level, and N/P ratio. The general rules derived from this mining process showed that optimizing these parameters can increase biomass productivity and nutrient removal efficiency. These findings are important for future experimental design and large-scale implementation of microalgae-based wastewater treatment processes.
JOURNAL OF ENVIRONMENTAL CHEMICAL ENGINEERING
(2022)
Article
Computer Science, Artificial Intelligence
Maidi Liu, Zhiwei Yang, Yong Guo, Jiang Jiang, Kewei Yang
Summary: Association rule mining (ARM) is an important research topic in data mining and knowledge discovery. This paper proposes a nonlinear ARM method called MICAR based on the maximal information coefficient (MIC), which can effectively extract high-quality positive and negative association rules, especially nonlinear association rules.
KNOWLEDGE AND INFORMATION SYSTEMS
(2022)
Article
Biochemistry & Molecular Biology
Xingjie Shi, Xiaoran Chai, Yi Yang, Qing Cheng, Yuling Jiao, Haoyue Chen, Jian Huang, Can Yang, Jin Liu
NUCLEIC ACIDS RESEARCH
(2020)
Article
Genetics & Heredity
Boran Gao, Can Yang, Jin Liu, Xiang Zhou
Summary: The new computational method GECKO improves the accuracy of estimating genetic and environmental covariances in GWAS, revealing shared genetic and environmental structures between traits and aiding in the investigation of causal relationships. Compared to traditional methods, GECKO provides more accurate estimates and identifies significant genetic and environmental covariances, demonstrating a twofold power gain in analyzing trait pairs.
Article
Biochemical Research Methods
Jiashun Xiao, Mingxuan Cai, Xianghong Hu, Xiang Wan, Gang Chen, Can Yang
Summary: This article presents a cross-population and cross-phenotype method for constructing accurate polygenic risk scores (PRSs) in under-represented populations. By leveraging datasets from European populations and genetically correlated phenotypes, this method improves the accuracy of PRSs in non-European populations and enhances disease prediction and prevention in personalized medicine.
Article
Biochemical Research Methods
Yan Liu, Hao Liang, Quan Zou, Zengyou He
Summary: The identification of essential proteins is an important problem in bioinformatics. Existing methods have limitations in providing context-free and easily interpretable quantifications of centrality values, specifying proper thresholds, and controlling the quality of reported essential proteins. To overcome these limitations, this study formulates the essential protein discovery problem as a multiple hypothesis testing problem and presents a significance-based method named SigEP. Experimental results demonstrate that SigEP outperforms competing algorithms.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2022)
Article
Biochemical Research Methods
Yan Liu, Wenfang Chen, Zengyou He
Summary: The study introduces a new significance-based essential protein recognition method named EPCS, which outperforms both current state-of-the-art essential protein identification methods and SigEP, the only other significance-based method.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2021)
Article
Multidisciplinary Sciences
Zengyou He, Wenfang Chen, Xiaoqi Wei, Yan Liu
Summary: Community detection is a fundamental procedure in analyzing network data, yet the definition of a community remains a topic of debate. This study presents a new formulation for testing the realness of communities in weighted networks by modeling edge weights as censored observations. By conducting log-rank tests on internal and external weight sets, the method outperforms existing evaluation metrics in individual community validation.
SCIENTIFIC REPORTS
(2021)
Article
Mathematical & Computational Biology
Jingsi Ming, Jia Zhao, Can Yang
Summary: The technique of single-cell RNA-sequencing has allowed researchers to explore the cellular heterogeneity of complex tissues. In this study, a scalable framework called scPI was proposed to analyze scRNA-seq data. The scPI framework utilizes amortized variational inference and a nonlinear neural network to infer the low-dimensional representations of the data. Through analysis of real datasets, it was demonstrated that scPI can effectively handle various probabilistic models for scRNA-seq data in terms of scalability, missing value imputation, and cell type clustering.
STATISTICS IN BIOSCIENCES
(2023)
Article
Multidisciplinary Sciences
Xianghong Hu, Jia Zhao, Zhixiang Lin, Yang Wang, Heng Peng, Hongyu Zhao, Xiang Wan, Can Yang
Summary: Mendelian randomization (MR) is a valuable tool for inferring causal relationships among traits using summary statistics from GWASs, but existing methods often rely on strong assumptions leading to false-positive findings. Research has shown that considering pleiotropy and sample structure is crucial for reducing confounding effects.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2022)
Letter
Biochemistry & Molecular Biology
Yiming Chao, Yang Xiang, Jiashun Xiao, Weizhong Zheng, Mo R. Ebrahimkhani, Can Yang, Angela Ruohao Wu, Pentao Liu, Yuanhua Huang, Ryohichi Sugimura
SIGNAL TRANSDUCTION AND TARGETED THERAPY
(2023)
Article
Biochemical Research Methods
Xinyi Yu, Jiashun Xiao, Mingxuan Cai, Yuling Jiao, Xiang Wan, Jin Liu, Can Yang
Summary: The findings from genome-wide association studies have greatly helped us understand the genetic basis of human complex traits and diseases. However, several major challenges still need to be addressed, including the unknown biological functions of most GWAS hits and the identification of genetic risk variants with weak effects. To overcome these challenges, we propose a powerful and adaptive latent model (PALM) that integrates functional annotations with GWAS summary statistics.
Article
Biochemical Research Methods
Chen Li, Ting-Fung Chan, Can Yang, Zhixiang Lin
Summary: The study introduces a method called stVAE, based on the variational autoencoder framework, to deconvolve the cell-type composition of cellular resolution spatial transcriptomic datasets. It accurately identifies spatial patterns of cell types and their relative proportions across spots.
Article
Materials Science, Multidisciplinary
Hongzhao Fan, Can Yang, Yanguang Zhou
Summary: Metal-organic frameworks (MOFs) have shown potential in energy storage and thermal management. By studying HKUST-1, a typical MOF, we found that its thermal conductivity is strongly size dependent, but decreases when water molecules are adsorbed. We also discovered two thermal energy exchange pathways in HKUST-1 with water molecules, and the thermal conductivity varies with the quantity of adsorbates due to the competition between these pathways.
Article
Computer Science, Artificial Intelligence
Zengyou He, Hao Liang, Zheng Chen, Can Zhao, Yan Liu
Summary: Community detection is a key data analysis problem, and many algorithms have been proposed. However, most work does not consider statistical significance. This article presents a tight upper bound on the p-value of a single community and a local search method for detecting statistically significant communities. Experimental results show that the method is comparable to other methods.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2022)
Article
Computer Science, Artificial Intelligence
Yan Liu, Xiaoqi Wei, Wenfang Chen, Lianyu Hu, Zengyou He
Summary: This method utilizes a breadth-first search tree to generate a curve for calculating the influence score of nodes, demonstrating superiority over widely used centrality measures in various network domains.
Article
Computer Science, Information Systems
Zengyou He, Chaohua Sheng, Yan Liu, Quan Zou
Summary: This paper presents a generic framework that formulates the binary classification problem as a two-sample testing problem, which is based on instances and hypothesis testing. Experimental results show that the method achieves performance comparable to classic classifiers and outperforms existing testing-based classifiers.
Article
Computer Science, Information Systems
Xia Liang, Jie Guo, Peide Liu
Summary: This paper investigates a novel consensus model based on social networks to manage manipulative and overconfident behaviors in large-scale group decision-making. A novel clustering model and improved methods effectively facilitate consensus reaching. A feedback mechanism and a management approach are employed to handle decision makers' behaviors. Simulation experiments and comparative analysis demonstrate the effectiveness of the model.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Xiang Li, Haiwang Guo, Xinyang Deng, Wen Jiang
Summary: This paper proposes a method based on class gradient networks for generating high-quality adversarial samples. By introducing a high-level class gradient matrix and combining classification loss and perturbation loss, the method demonstrates superiority in the transferability of adversarial samples on targeted attacks.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Lingyun Lu, Bang Wang, Zizhuo Zhang, Shenghao Liu
Summary: Many recommendation algorithms rely only on implicit feedback due to privacy concerns. However, the encoding of interaction types is often ignored. This paper proposes a relation-aware neural model that classifies implicit feedback by encoding edges, thereby enhancing recommendation performance.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Jaehong Yu, Hyungrok Do
Summary: This study discusses unsupervised anomaly detection using one-class classification, which determines whether a new instance belongs to the target class by constructing a decision boundary. The proposed method uses a proximity-based density description and a regularized reconstruction algorithm to overcome the limitations of existing one-class classification methods. Experimental results demonstrate the superior performance of the proposed algorithm.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Hui Tu, Shifei Ding, Xiao Xu, Haiwei Hou, Chao Li, Ling Ding
Summary: The Border-Peeling algorithm is a density-based clustering algorithm, but its complexity and its issues with unbalanced datasets restrict its application. This paper proposes a non-iterative border-peeling clustering algorithm, which improves clustering performance by distinguishing and associating core points and border points.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Long Tang, Pan Zhao, Zhigeng Pan, Xingxing Duan, Panos M. Pardalos
Summary: In this work, a two-stage denoising framework (TSDF) is proposed for zero-shot learning (ZSL) to address the issue of noisy labels. The framework includes a tailored loss function to remove suspected noisy-label instances and a ramp-style loss function to reduce the negative impact of remaining noisy labels. In addition, a dynamic screening strategy (DSS) is developed to efficiently handle the nonconvexity of the ramp-style loss.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Raghunathan Krishankumar, Sundararajan Dhruva, Kattur S. Ravichandran, Samarjit Kar
Summary: Health 4.0 is gaining global attention for better healthcare through digital technologies. This study proposes a new decision-making framework for selecting viable blockchain service providers (BSPs) in the Internet of Medical Things (IoMT). The framework addresses the limitations of previous studies and demonstrates its applicability in the Indian healthcare sector. The results show the top-ranking BSPs, the importance of various criteria, and the effectiveness of the developed model.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Tao Tan, Hong Xie, Liang Feng
Summary: This paper proposes a heterogeneous update idea and designs the HetUp Q-learning algorithm, which enlarges the normalized gap by overestimating the Q-value corresponding to the optimal action and underestimating the Q-values corresponding to the other actions. To address the limitation, a softmax strategy is applied to estimate the optimal action, resulting in HetUpSoft Q-learning and HetUpSoft DQN. Extensive experimental results show significant improvements over state-of-the-art baselines.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Chao Yang, Xianzhi Wang, Lina Yao, Guodong Long, Guandong Xu
Summary: This paper proposes a dynamic transformer-based architecture called Dyformer for multivariate time series classification. Dyformer captures multi-scale features through hierarchical pooling and adaptive learning strategies, and improves model performance by introducing feature-map-wise attention mechanisms and a joint loss function.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Xiguang Li, Baolu Feng, Yunhe Sun, Ammar Hawbani, Saeed Hammod Alsamhi, Liang Zhao
Summary: This paper proposes an enhanced scatter search strategy, using opposition-based learning, to solve the problem of automated test case generation based on path coverage (ATCG-PC). The proposed ESSENT algorithm selects the path with the lowest path entropy among the uncovered paths as the target path and generates new test cases to cover the target path by modifying the dimensions of existing test cases. Experimental results show that the ESSENT algorithm outperforms other state-of-the-art algorithms, achieving maximum path coverage with fewer test cases.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Shirin Dabbaghi Varnosfaderani, Piotr Kasprzak, Aytaj Badirova, Ralph Krimmel, Christof Pohl, Ramin Yahyapour
Summary: Linking digital accounts belonging to the same user is crucial for security, user satisfaction, and next-generation service development. However, research on account linkage is mainly focused on social networks, and there is a lack of studies in other domains. To address this, we propose SmartSSO, a framework that automates the account linkage process by analyzing user routines and behavior during login processes. Our experiments on a large dataset show that SmartSSO achieves over 98% accuracy in hit-precision.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Renchao Wu, Jianjun He, Xin Li, Zuguo Chen
Summary: This paper proposes a memetic algorithm with fuzzy-based population control (MA-FPC) to solve the joint order batching and picker routing problem (JOBPRP). The algorithm incorporates batch exchange crossover and a two-level local improvement procedure. Experimental results show that MA-FPC outperforms existing algorithms in terms of solution quality.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Guoxiang Zhong, Fagui Liu, Jun Jiang, Bin Wang, C. L. Philip Chen
Summary: In this study, we propose the AMFormer framework to address the problem of mixed normal and anomaly samples in deep unsupervised time-series anomaly detection. By refining the one-class representation and introducing a masked operation mechanism and cost-sensitive learning theory, our approach significantly improves anomaly detection performance.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Jin Zhou, Kang Zhou, Gexiang Zhang, Ferrante Neri, Wangyang Shen, Weiping Jin
Summary: In this paper, the authors focus on the issue of multi-objective optimisation problems with redundant variables and indefinite objective functions (MOPRVIF) in practical problem-solving. They propose a dual data-driven method for solving this problem, which consists of eliminating redundant variables, constructing objective functions, selecting evolution operators, and using a multi-objective evolutionary algorithm. The experiments conducted on two different problem domains demonstrate the effectiveness, practicality, and scalability of the proposed method.
INFORMATION SCIENCES
(2024)
Article
Computer Science, Information Systems
Georgios Charizanos, Haydar Demirhan, Duygu Icen
Summary: This article proposes a new fuzzy logistic regression framework that addresses the problems of separation and imbalance while maintaining the interpretability of classical logistic regression. By fuzzifying binary variables and classifying subjects based on a fuzzy threshold, the framework demonstrates superior performance on imbalanced datasets.
INFORMATION SCIENCES
(2024)