Article
Computer Science, Artificial Intelligence
Vo Nguyen Le Duy, Takuto Sakuma, Taiju Ishiyama, Hiroki Toda, Kazuya Arai, Masayuki Karasuyama, Yuta Okubo, Masayuki Sunaga, Hiroyuki Hanada, Yasuo Tabei, Ichiro Takeuchi
Summary: This study proposes a novel statistical approach, called Stat-DSM, to evaluate the statistical significance of discriminative sub-trajectory mining results. The proposed method properly controls the statistical significance of the extracted sub-trajectories and addresses the computational and statistical challenges of massive trajectory datasets.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2022)
Article
Computer Science, Artificial Intelligence
Razieh Davashi
Summary: In this paper, a fast method called ITUFP is proposed for interactive mining of Top-K UFPs. The method efficiently stores and extracts pattern information by creating UP-Lists and IMCUP-Lists, and only updates the IMCUP-Lists when the K value changes. Experimental results demonstrate that the proposed method is very efficient for interactive mining of Top-K UFPs.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Xin Wang, Zhuo Lan, Yu-Ang He, Yang Wang, Zhi-Gui Liu, Wen-Bo Xie
Summary: This article introduces a cost-effective approach for frequent pattern mining on large graphs. The approach applies a level-wise strategy to incrementally detect frequent patterns and can terminate the mining process once the top-k patterns are discovered. It also uses a smart traversal strategy and compact data structures to compute the lower bound of support.
EXPERT SYSTEMS WITH APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Ham Nguyen, Tuong Le
Summary: This study presents a robust method for mining top-rank-k erasable closed patterns (ECPs) and combines the mining and ranking phases into a single step to improve efficiency. Experimental results confirm that this method outperforms other approaches in mining top-rank-k ECPs.
CMC-COMPUTERS MATERIALS & CONTINUA
(2022)
Article
Computer Science, Information Systems
Chunkai Zhang, Zilin Du, Wensheng Gan, Philip S. Yu
Summary: High-utility sequential pattern mining (HUSPM) has attracted significant research interest recently, with the main task of finding subsequences with high utility in a quantitative sequential database. The top-k HUSPM concept was introduced to address the challenge of specifying a minimum utility threshold. Existing strategies for top-k HUSPM require improvement in terms of efficiency and scalability.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Artificial Intelligence
Maryam Abuissa, Alexander Lee, Matteo Riondato
Summary: We introduce a new class of null models for statistical validation of binary transactional and sequence datasets. Our null models are Row-Order Agnostic (ROA), in contrast to previous Row-Order Enforcing (ROE) models. We propose the ROhAN algorithmic framework for efficient sampling of datasets from ROA models, and our experimental evaluation demonstrates the differences between ROA and ROE models, as well as the efficiency and scalability of ROhAN.
DATA MINING AND KNOWLEDGE DISCOVERY
(2023)
Article
Computer Science, Information Systems
Mohamed Ashraf, Tamer Abdelkader, Sherine Rady, Tarek F. Gharib
Summary: In this paper, a method called TKN is proposed to efficiently mine Top-K HUIs with positive or negative profits. The method uses generalized and adaptive techniques to reduce the dataset traversal cost and to narrow the exploration space through pruning and threshold raising. Experimental results demonstrate the superiority of TKN in finding the required number of patterns compared to other competing algorithms.
INFORMATION SCIENCES
(2022)
Article
Computer Science, Information Systems
Palla Likhitha, Penugonda Ravikumar, Deepika Saxena, Rage Uday Kiran, Yutaka Watanobe
Summary: Finding periodic-frequent patterns in temporal databases is a significant data mining problem. This paper proposes a solution to discover the top-k periodic-frequent patterns in a database.
Article
Chemistry, Multidisciplinary
Kai Cao, Yucong Duan
Summary: High-utility sequential pattern mining (HUSPM) is a method used to find high-utility subsequences in a quantitative sequential database. However, the utility of a high-utility sequential pattern (HUSP) tends to increase with its length, which makes it difficult to obtain diverse patterns. To address this issue, we propose a top-k high average-utility sequential pattern mining (HAUSPM) algorithm based on average utility, which uses a projection mechanism and a sequence average-utility-raising strategy to improve efficiency and raise the threshold. Experimental results demonstrate that the proposed algorithm achieves good performance.
APPLIED SCIENCES-BASEL
(2023)
Article
Computer Science, Information Systems
Longlong Lin, Pingpeng Yuan, Ronghua Li, Hai Jin
Summary: This paper investigates the problem of finding diversified lasting cohesive subgraphs from temporal networks and proposes a new model and solution. Empirical studies demonstrate that the proposed solutions perform efficiently and accurately, surpassing existing methods.
IEEE TRANSACTIONS ON BIG DATA
(2022)
Article
Nursing
Courtney Keeler, Alexa Colgrove Curtis
Summary: This article is part of a series that aims to provide nurses with a comprehensive understanding of the concepts and principles essential to clinical research. It covers a wide range of topics from research design to data interpretation.
AMERICAN JOURNAL OF NURSING
(2023)
Article
Meteorology & Atmospheric Sciences
D. James Fulton, Gabriele C. Hegerl
Summary: This study develops a Monte Carlo method to compare PCA, DMD, and SFA in extracting additive space-time modes present in climate data, showing that the alternative methods outperform PCA significantly in synthetic data and that PCA's extracted modes are not significantly better than random guesses in simple cases.
JOURNAL OF CLIMATE
(2021)
Editorial Material
Obstetrics & Gynecology
Philip M. Sedgwick, Anne Hammer, Ulrik Schioler Kesmodel, Lars Henning Pedersen
Summary: Traditional null hypothesis significance testing (NHST) is widely used in obstetric and gynecological research, but its application in inferring clinical significance is controversial. Misinterpretation of statistical significance and ignorance of NHST limitations may lead to false claims and dismissal of important factors.
ACTA OBSTETRICIA ET GYNECOLOGICA SCANDINAVICA
(2022)
Article
Computer Science, Information Systems
Judith Santos-Pereira, Le Gruenwald, Jorge Bernardino
Summary: This paper presents a survey of popular open-source data mining tools and proposes tool selection criteria based on healthcare application requirements. KNIME and RapidMiner are identified as the best tools for healthcare data mining.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2022)
Article
Computer Science, Artificial Intelligence
Yusuke Kawamoto, Tetsuya Sato, Kohei Suenaga
Summary: This paper proposes a new approach for formally describing the requirement for statistical inference and checking the appropriate use of statistical methods in programs. The authors define a belief Hoare logic (BHL) for formalizing and reasoning about statistical beliefs acquired through hypothesis testing. Examples demonstrate the usefulness of BHL in reasoning about practical issues in hypothesis testing, while also discussing the importance of prior beliefs in acquiring statistical beliefs.
ARTIFICIAL INTELLIGENCE
(2024)
Article
Biochemical Research Methods
Diego Santoro, Leonardo Pellegrina, Matteo Comin, Fabio Vandin
Summary: SPRISS is an efficient algorithm for approximating frequent k-mers and their frequencies in next-generation sequencing data. It uses a simple yet powerful reads sampling scheme to obtain comparable results in a shorter amount of time. Experimental results demonstrate its efficiency and accuracy.
Article
Computer Science, Information Systems
Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato
Summary: This paper presents MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for functions with poset structure. MCRapper allows finding statistically-significant functions and approximations of high-expectation functions. It achieves this by using upper bounds to efficiently explore and prune the search space. The paper also introduces TFP-R, an algorithm developed using MCRapper for True Frequent Pattern mining, which outperforms existing methods.
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA
(2022)
Article
Biochemical Research Methods
Leonardo Pellegrina, Fabio Vandin
Summary: The study presents a new algorithm, MASTRO, for discovering significantly conserved evolutionary trajectories in cancer. The algorithm is applied to lung cancer and acute myeloid leukemia data, confirming and extending previous findings.
Article
Biochemical Research Methods
Leonardo Pellegrina, Cinzia Pizzi, Fabio Vandin
JOURNAL OF COMPUTATIONAL BIOLOGY
(2020)
Proceedings Paper
Computer Science, Information Systems
Leonardo Pellegrina, Matteo Riondato, Fabio Vandin
KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
(2019)
Proceedings Paper
Computer Science, Information Systems
Leonardo Pellegrina, Matteo Riondato, Fabio Vandin
KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
(2019)
Proceedings Paper
Computer Science, Artificial Intelligence
Leonardo Pellegrina, Fabio Vandin
KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING
(2018)
Proceedings Paper
Engineering, Aerospace
Gilberto Grassi, Mattia Pezzato, Alessia Gloder, Riccardo Mantellato, Alessandro Francesconi, Enrico Lorenzini, Alvise Rossi, Leonardo Pellegrina
2017 IEEE INTERNATIONAL WORKSHOP ON METROLOGY FOR AEROSPACE (METROAEROSPACE)
(2017)