Article
Computer Science, Information Systems
Chunkai Zhang, Zilin Du, Wensheng Gan, Philip S. Yu
Summary: High-utility sequential pattern mining (HUSPM) has recently attracted significant research interest. Its main task is to find subsequences with high utility in a quantitative sequential database. The top-k HUSPM concept was introduced to address the difficulty of specifying a minimum utility threshold. Existing strategies for top-k HUSPM still need improvement in efficiency and scalability.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Artificial Intelligence
Md Ashraful Islam, Mahfuzur Rahman Rafi, Al-amin Azad, Jesan Ahammed Ovi
Summary: Data mining is the study of extracting useful information from massive amounts of data, and sequential pattern mining is one of its major branches. Weighted sequential pattern mining is better suited to today's datasets because items have different levels of importance in real-life scenarios. This research introduces a new pruning technique and framework that generate a small number of candidate sequences faster without compromising completeness, significantly outperforming existing approaches.
APPLIED INTELLIGENCE
(2022)
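As background for the record above, sequential pattern mining usually treats a sequence as an ordered list of itemsets and defines the support of a pattern as the number of database sequences that contain it. The Python sketch below shows containment and support under that standard definition. The `weighted_support` function uses one simple weighting scheme (support multiplied by the mean weight of the pattern's items); this scheme is an assumption for illustration only and is not the weight definition or pruning technique proposed in the paper.

```python
from typing import Dict, List, Set

ItemsetSequence = List[Set[str]]  # an ordered list of itemsets

def contains(sequence: ItemsetSequence, pattern: ItemsetSequence) -> bool:
    """True if every itemset of `pattern` is a subset of a distinct itemset of
    `sequence`, with the matched positions strictly increasing."""
    pos = 0
    for itemset in pattern:
        # Greedy earliest match; matching as early as possible never loses a solution.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must be matched strictly later
    return True

def support(database: List[ItemsetSequence], pattern: ItemsetSequence) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)

def weighted_support(database: List[ItemsetSequence], pattern: ItemsetSequence,
                     weights: Dict[str, float]) -> float:
    """Hypothetical weighting scheme: support times the mean item weight of the pattern."""
    items = [item for itemset in pattern for item in itemset]
    if not items:
        raise ValueError("pattern must contain at least one item")
    return support(database, pattern) * sum(weights[i] for i in items) / len(items)

if __name__ == "__main__":
    db = [[{"a"}, {"b", "c"}, {"d"}], [{"a", "b"}, {"c"}], [{"b"}, {"a"}]]
    print(support(db, [{"a"}, {"c"}]))                                  # 2
    print(weighted_support(db, [{"a"}, {"c"}], {"a": 1.0, "c": 0.5}))  # 1.5
```

Weighted approaches differ mainly in how item weights enter the support; swapping in another definition only changes `weighted_support`.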
Article
Computer Science, Artificial Intelligence
Gengsen Huang, Wensheng Gan, Shan Huang, Jiahui Chen
Summary: The discovery of negative sequential patterns (NSPs) is crucial in data science, as NSPs often provide more enlightening information than positive sequential patterns (PSPs). However, discovering NSPs is more challenging because of the computational complexity and the large search space involved. This paper proposes a novel algorithm called Negative Sequential Patterns with Individual Support (NSPIS) to solve this problem with better efficiency.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Operations Research & Management Science
Rolf Fare, Valentin Zelenyuk
Summary: Sequential data envelopment analysis (DEA) is a new class of DEA models for analyzing the efficiency of decision-making units (DMUs) that consist of a series of sub-DMUs. Because it is embedded in the Hilbert sequence space, it can accommodate different numbers of sub-DMUs and of inputs/outputs.
ANNALS OF OPERATIONS RESEARCH
(2021)
Article
Computer Science, Interdisciplinary Applications
Pei-Hsi Lee, Chau-Chen Torng, Chi-Hsuan Lin, Chao-Yu Chou
Summary: This study proposes a method that combines spectral clustering technique with support vector machine (SVM) for recognizing control chart patterns (CCPs) under a gamma distribution. Comparative studies show that this method outperforms other methods in terms of recognition efficiency for most CCP types.
COMPUTERS & INDUSTRIAL ENGINEERING
(2022)
Article
Computer Science, Information Systems
Ioannis Mavroudopoulos, Anastasios Gounaris
Summary: Sequential pattern analysis is a mature topic with various techniques for mining related problems. However, advanced techniques for efficiently detecting arbitrary sequences in large activity log collections are lacking. In this work, the SIESTA solution is introduced, which employs a novel architecture with inverted indices and advanced query processor, optimizes both preprocessing and querying phases, and achieves superior performance compared to state-of-the-art solutions for Big Data.
IEEE TRANSACTIONS ON BIG DATA
(2023)
Article
Education & Educational Research
Fred Zenker, Kristopher Kyle
Summary: The study reveals that MATTR and two versions of MTLD are the most stable lexical diversity (LD) indices for L2 argumentative essays: they resist the effects of text length and provide reliable results. Comparisons based on essay prompt and proficiency level also shed light on the characteristics of LD indices.
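MATTR (moving-average type-token ratio) is usually computed by sliding a fixed-length window through the text one token at a time, taking the type-token ratio of each window, and averaging those ratios. A minimal Python sketch of that definition follows; the function name, the default window of 50 tokens, and the fallback to the plain type-token ratio for texts shorter than the window are illustrative assumptions, not the settings used in the study.

```python
from typing import List

def mattr(tokens: List[str], window: int = 50) -> float:
    """Moving-average type-token ratio: the mean type-token ratio (distinct tokens
    divided by window length) over every contiguous window of `window` tokens."""
    n = len(tokens)
    if n == 0 or window <= 0:
        raise ValueError("need a non-empty text and a positive window length")
    if n <= window:
        # Assumed fallback: a single window covering the whole text (plain TTR).
        return len(set(tokens)) / n
    ratios = [len(set(tokens[start:start + window])) / window
              for start in range(n - window + 1)]
    return sum(ratios) / len(ratios)

if __name__ == "__main__":
    # Windows of 2: [a, a] -> 0.5, [a, b] -> 1.0, [b, b] -> 0.5; mean = 2/3.
    print(round(mattr("a a b b".split(), window=2), 4))  # 0.6667
```

Because every window has the same length, the average does not drift with overall text length the way a plain type-token ratio does.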
Article
Acoustics
Wei-Cheng Lin, Carlos Busso
Summary: In this study, a new framework is proposed to capture local emotional changes within a sentence by splitting it into chunks and generating chunk-level emotional patterns. The sentence-level speech emotion recognition (SER) model is trained with a sequence-to-sequence formulation using the retrieved emotional curves. The results show that this approach effectively captures emotional trends within a sentence and improves the accuracy of sentence-level predictions of emotional attributes.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2023)
Article
Green & Sustainable Science & Technology
Junghee Kim, Haemin Jung, Wooju Kim
Summary: This paper proposes a personalized alarm model to detect fraud in online banking transactions by analyzing sequence patterns in each user's normal transaction log. The model divides user logs into transactions, extracts sequence patterns, and uses them to determine if a new transaction is fraudulent.
Article
Construction & Building Technology
Chau Le, K. Joseph Shrestha, H. David Jeong, Ivan Damnjanovic
Summary: Determining a reliable and reasonable construction time for a project is a vital task for the project owner. The current practice of activity sequencing is challenging and heavily dependent upon agency schedulers' knowledge and experience. This study proposes a data-driven approach to enhance the current practice by leveraging past project data to develop a knowledge base of construction sequence patterns.
AUTOMATION IN CONSTRUCTION
(2021)
Article
Mathematics
Giuseppe Alessio D'Inverno, Sara Brunetti, Maria Lucia Sampoli, Dafin Fior Muresanu, Alessandra Rufa, Monica Bianchini
Summary: This study presents an algorithmic approach to analyze the Visual Sequential Search Test (VSST) using the episode matching method, showing different behaviors among patients with different pathologies under specific tasks.
Article
Computer Science, Information Systems
Wei Wang, Longbing Cao
Summary: Negative sequential patterns (NSPs) are more informative than classic positive sequential patterns (PSPs) due to the involvement of both occurring and nonoccurring behaviors and events. Loosening the negative element constraint (LNEC) can lead to more flexible pattern discovery but also introduces new learning challenges. VM-NSP and bM-NSP form the first vertically-based approach for complete NSP mining with LNEC, optimizing discovery performance.
ACM TRANSACTIONS ON INFORMATION SYSTEMS
(2021)
Article
Materials Science, Textiles
Pengfei Zhang, Zining Huang, Qiantong Zhou, Lei Wang, Ruru Pan, Yanna Fei, Weidong Gao
Summary: This paper presents a computer vision-based method to analyze sequential images of a deformed fabric and extract features to characterize its shape retention. The experiment showed that the proposed new indexes can effectively distinguish the shape retention of fabric samples after deformation.
TEXTILE RESEARCH JOURNAL
(2023)
Article
Computer Science, Artificial Intelligence
Clement Gautrais, Peggy Cellier, Thomas Guyet, Rene Quiniou, Alexandre Termier
Summary: This paper introduces the sky-signature model, an extension of the signature model, for multi-objective optimization. The model allows analysis of data at different levels of granularity and provides compact results.
DATA MINING AND KNOWLEDGE DISCOVERY
(2023)
Article
Computer Science, Information Systems
Yuki Noyori, Hironori Washizaki, Yoshiaki Fukazawa, Hideyuki Kanuka, Keishi Ooshima, Shuhei Nojiri, Ryosuke Tsuchiya
Summary: The study emphasizes the importance of comments in bug reports, finding that mixed topics do not affect bug-fixing time, whereas bug-fixing time tends to be shorter when the discussion of the phenomenon is short.
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
(2021)
Article
Statistics & Probability
Christian H. Weiss, Boris Aleksandrov
Summary: This article considers the (bivariate) Poisson distribution for count random variables and demonstrates how to compute moment expressions and joint moments using the Stein-Chen identity. The extension to the multivariate case is also explained.
AMERICAN STATISTICIAN
(2022)
Article
Statistics & Probability
Christian H. Weiss, Annika Homburg, Layth C. Alwan, Gabriel Frahm, Rainer Goeb
Summary: A computationally efficient resampling scheme is proposed to express uncertainty in coherent forecasts for count processes. The scheme is investigated through a simulation study and demonstrated with a real-data example, which shows that ensembles of forecast values can be presented visually for intuitive interpretation.
JOURNAL OF APPLIED STATISTICS
(2022)
Article
Statistics & Probability
Boris Aleksandrov, Christian H. Weiss, Carsten Jentsch
Summary: The study introduces a testing method for the null hypothesis of a Poisson marginal distribution based on the Stein-Chen identity, and derives the asymptotic distribution of various Stein-Chen statistics for a broad class of Poisson count time series. The performance of the tests is analyzed through simulations, along with a discussion on the choice of Stein-Chen functions for different alternative hypotheses.
STATISTICA NEERLANDICA
(2022)
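For reference, the Stein-Chen identity that such tests build on characterizes the Poisson distribution: a count random variable X is Poisson with parameter λ > 0 if and only if the first line below holds for every bounded function f. When X is Poisson, the identity also holds for any f with finite expectations, so the choices f ≡ 1 and f(x) = x give the worked example in the second line, which recovers the Poisson equidispersion property. The specific Stein-Chen functions studied in the paper are not reproduced here.

$$
X \sim \mathrm{Poi}(\lambda) \iff E\bigl[X\,f(X)\bigr] = \lambda\,E\bigl[f(X+1)\bigr] \quad \text{for all bounded } f:\mathbb{N}_0 \to \mathbb{R}
$$

$$
f \equiv 1:\; E[X] = \lambda, \qquad f(x) = x:\; E[X^2] = \lambda\,E[X+1] = \lambda(\lambda+1) \;\Rightarrow\; \operatorname{Var}(X) = E[X^2] - \lambda^2 = \lambda.
$$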
Article
Physics, Multidisciplinary
Christian H. Weiss
Summary: The paper investigates the family of cumulative paired phi-entropies and their sample versions, deriving their asymptotic distributions for stationary ordinal time series data, and proposes a family of signed serial dependence measures related to Cohen's kappa. The practical relevance of these dispersion and dependence measures is explored through numerical computations and simulations, with an example application to ordinal time series data on air quality.
Article
Computer Science, Interdisciplinary Applications
Christian H. Weiss, Manuel Ruiz Marin, Karsten Keller, Mariano Matilla-Garcia
Summary: The new tests based on ordinal patterns are stable and robust, invariant under monotone transformations of the time series, and resistant to disturbances. They are applicable to both linear and non-linear situations and can be used as misspecification tests under nuisance-free conditions.
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2022)
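The ordinal pattern of order m of a window of m consecutive observations records only the ranking of the values, not their size, so it is unchanged by any strictly increasing transformation of the series. The Python sketch below encodes each pattern as the permutation of positions that sorts the window and computes the relative frequencies of all m! patterns. Under serial independence with a continuous marginal distribution, each pattern has probability 1/m!, and deviations from that uniform distribution are what ordinal-pattern tests look for. Breaking ties by position is an assumption here, and the sketch does not implement the specific test statistics of the paper.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Sequence, Tuple

def ordinal_pattern(window: Sequence[float]) -> Tuple[int, ...]:
    """Permutation of positions that sorts the window ascending (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def pattern_frequencies(series: Sequence[float], m: int = 3) -> Dict[Tuple[int, ...], float]:
    """Relative frequency of each of the m! ordinal patterns over all windows of m consecutive values."""
    n_windows = len(series) - m + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than the pattern order m")
    counts = Counter(ordinal_pattern(series[t:t + m]) for t in range(n_windows))
    return {p: counts[p] / n_windows for p in permutations(range(m))}

if __name__ == "__main__":
    x = [3.1, 0.2, 5.0, 4.4, 1.7, 2.9]
    # A strictly increasing transformation leaves every pattern, hence every frequency, unchanged.
    print(pattern_frequencies(x) == pattern_frequencies([v ** 3 + 2 for v in x]))  # True
```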
Article
Mathematical & Computational Biology
Christian H. Weiss, Pedro Puig, Boris Aleksandrov
Summary: This study derives the asymptotics of the Poisson and binomial Stein-type goodness-of-fit (GoF) statistics for general count distributions and investigates their performance and their application to medical data.
BIOMETRICAL JOURNAL
(2023)
Article
Mathematics, Applied
Christian H. Weiss
Summary: This study demonstrates that ordinal patterns can be used to construct hypothesis tests to detect possible serial dependence in time series, and the performance and power properties of these tests are examined through simulations. The application and interpretation of the tests are illustrated using an environmental data example.
Article
Computer Science, Interdisciplinary Applications
Kaizhi Yu, Huiqiao Wang, Christian H. Weiss
Summary: The paper proposes an empirical likelihood ratio (ELR) test for uncovering structural changes in integer-valued autoregressive (INAR) processes. The authors derive the limiting distribution under the null hypothesis of no parameter change at the anticipated change points. The finite-sample performance of the ELR test is evaluated through simulation studies, and its application to real data on infectious disease and crime counts is also demonstrated.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
(2023)
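The simplest INAR process is the Poisson INAR(1) model, X_t = α ∘ X_{t-1} + ε_t, where α ∘ X denotes binomial thinning (each of the X counts survives independently with probability α) and the innovations ε_t are i.i.d. Poisson(λ); its stationary marginal distribution is Poisson with mean λ/(1 - α). The Python sketch below simulates such a path using only the standard library, which can be handy for generating data under a null hypothesis of no parameter change. It does not implement the ELR test itself, and all function names are hypothetical.

```python
import math
import random
from typing import List

def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def binomial_thinning(x: int, alpha: float, rng: random.Random) -> int:
    """alpha o x: each of the x counts survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate_inar1(n: int, alpha: float, lam: float, seed: int = 0) -> List[int]:
    """Poisson INAR(1) path X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam)."""
    if not 0.0 <= alpha < 1.0 or lam <= 0.0:
        raise ValueError("need 0 <= alpha < 1 and lam > 0")
    rng = random.Random(seed)
    x = poisson(lam / (1.0 - alpha), rng)  # start in the stationary Poisson(lam/(1-alpha)) marginal
    path = []
    for _ in range(n):
        x = binomial_thinning(x, alpha, rng) + poisson(lam, rng)
        path.append(x)
    return path

if __name__ == "__main__":
    path = simulate_inar1(10_000, alpha=0.5, lam=1.0, seed=1)
    print(sum(path) / len(path))  # should be close to lam / (1 - alpha) = 2
```

A structural change at a known time point can be mimicked by concatenating two simulated segments with different (alpha, lam).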
Article
Physics, Multidisciplinary
Christian H. Weiss, Boris Aleksandrov, Maxime Faymonville, Carsten Jentsch
Summary: In the context of time series, the partial autocorrelation function (PACF) is used for model identification, particularly in autoregressive (AR) models. The use of AR-type count processes, which have the same PACF characterization as AR models, has increased in recent decades. However, the conditions for asymptotic results are not met in AR-type count processes, leading to poor performance of the PACF test. Therefore, we propose different implementations of the PACF test for AR-type count processes using bootstrap schemes and compare them with asymptotic results in simulations.
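For an AR(p) process, the PACF vanishes beyond lag p, and the summary notes that AR-type count processes share this characterization. The Python sketch below computes the sample PACF from the sample autocorrelations via the Durbin-Levinson recursion. It covers only the point estimates: the familiar ±1.96/√n bounds rest on the asymptotic results that, according to the summary, can perform poorly for count processes, and the paper's bootstrap implementations are not reproduced.

```python
from typing import List, Sequence

def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r(0), ..., r(max_lag): lagged cross-products of the
    centred series divided by its total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    if denom == 0:
        raise ValueError("constant series has no autocorrelation")
    return [sum(d[t] * d[t + h] for t in range(n - h)) / denom for h in range(max_lag + 1)]

def sample_pacf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample PACF at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf: List[float] = []
    phi_prev: List[float] = []  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j} for j < k, then append phi_{k,k}.
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

if __name__ == "__main__":
    counts = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
    print(sample_pacf(counts, 3))
```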
Article
Statistics & Probability
Christian H. Weiss, Murat Caner Testik
Summary: This study addresses the problem of monitoring serial dependence in real-valued continuously distributed processes. A new control chart method based on ordinal patterns is proposed, which is nonparametric and distribution-free, and can be used almost instantly at the start of process monitoring.
Article
Engineering, Industrial
Sebastian Ottenstreuer, Christian H. Weiss, Murat Caner Testik
Summary: This study provides a survey of control charts for the sample-based monitoring of independent and identically distributed ordinal data, with critical comparisons of control statistics for different types of control charts. New results and proposals for process monitoring are also presented. A simulation study shows that demerit-type charts combined with EWMA smoothing generally outperform other charts that rely on sophisticated derivations. A real-world example of monitoring flashes in electric toothbrush manufacturing is discussed to illustrate the application and interpretation of the control charts in the study.
JOURNAL OF QUALITY TECHNOLOGY
(2023)
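EWMA smoothing replaces each newly plotted statistic X_t by Z_t = λX_t + (1 - λ)Z_{t-1}, starting from the in-control mean, and signals when Z_t leaves control limits that widen towards their asymptotic value. The Python sketch below applies this recursion to a demerit-type statistic, here assumed to be a weighted sum of the counts observed in each ordinal category of a sample. The demerit weights, λ = 0.1, L = 3, and the in-control mean and standard deviation are all illustrative assumptions, not the chart designs compared in the study.

```python
import math
from typing import List, Sequence, Tuple

def demerit(counts: Sequence[int], weights: Sequence[float]) -> float:
    """Hypothetical demerit statistic: weighted sum of the sample counts per ordinal category."""
    return sum(w * c for w, c in zip(weights, counts))

def ewma_chart(stats: Sequence[float], mu0: float, sigma0: float,
               lam: float = 0.1, L: float = 3.0) -> List[Tuple[float, float, float, bool]]:
    """EWMA Z_t = lam*X_t + (1-lam)*Z_{t-1}, Z_0 = mu0, with time-varying limits
    mu0 +/- L*sigma0*sqrt(lam/(2-lam)*(1-(1-lam)^(2t))) for independent X_t with
    known in-control mean mu0 and standard deviation sigma0."""
    z = mu0
    rows = []
    for t, x in enumerate(stats, start=1):
        z = lam * x + (1.0 - lam) * z
        half = L * sigma0 * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        rows.append((z, mu0 - half, mu0 + half, abs(z - mu0) > half))
    return rows

if __name__ == "__main__":
    weights = [0.0, 1.0, 3.0]  # assumed demerits for categories "good", "minor", "major"
    samples = [[18, 2, 0], [17, 2, 1], [15, 3, 2], [12, 5, 3]]
    stats = [demerit(c, weights) for c in samples]
    for z, lcl, ucl, alarm in ewma_chart(stats, mu0=2.5, sigma0=1.5):
        print(f"{z:.3f} [{lcl:.3f}, {ucl:.3f}] alarm={alarm}")
```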
Article
Statistics & Probability
Malte Jahn, Christian H. Weiss, Hee-Young Kim
Summary: This study proposes (B)INGARCH models for unbounded (bounded) counts that allow for negative parameter and autocorrelation values. These models combine negative dependencies with long memory and can be easily adapted to special marginal features or cross-dependencies.
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS
(2023)
Article
Statistics & Probability
Christian H. Weiss, Alexander Schnurr
Summary: We introduce a novel testing procedure for identifying serial dependence in time series data. Our method is based on the ordinal structure of the data and considers ties in the data windows. To overcome the non-uniform pattern distribution under the null hypothesis, we use Cayley permutations and employ a bootstrap procedure. Simulation and real-world data examples demonstrate the effectiveness of our approach.
JOURNAL OF NONPARAMETRIC STATISTICS
(2023)
Article
Computer Science, Interdisciplinary Applications
Shaochen Wang, Christian H. Weiss
Summary: A Stein-type characterization of the Lindley distribution is derived, which extends some known recent results. Furthermore, a new characterization based on another independent exponential random variable is provided. Moment formulas related to the Lindley distribution are obtained, and generalized method-of-moments estimators for both the discrete and continuous Lindley distribution are proposed.
MATHEMATICS AND COMPUTERS IN SIMULATION
(2023)
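For reference, the continuous Lindley distribution with parameter θ > 0 has density f(x) = θ²/(1 + θ) · (1 + x)e^(-θx) for x > 0 and mean (θ + 2)/(θ(θ + 1)). Equating this mean with the sample mean x̄ gives the quadratic x̄θ² + (x̄ - 1)θ - 2 = 0, which has exactly one positive root; that root is the classical method-of-moments estimator. The Python sketch below implements only this classical estimator for the continuous case; the generalized, Stein-type estimators and the discrete Lindley case from the paper are not reproduced.

```python
import math
from typing import Sequence

def lindley_mean(theta: float) -> float:
    """Mean of the continuous Lindley distribution with density
    theta^2/(1+theta) * (1+x) * exp(-theta*x), x > 0."""
    return (theta + 2.0) / (theta * (theta + 1.0))

def lindley_mm(sample: Sequence[float]) -> float:
    """Classical method-of-moments estimate of theta: the positive root of
    xbar*theta^2 + (xbar - 1)*theta - 2 = 0, i.e. of lindley_mean(theta) = xbar."""
    xbar = sum(sample) / len(sample)
    if xbar <= 0:
        raise ValueError("sample mean must be positive")
    return (-(xbar - 1.0) + math.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)

if __name__ == "__main__":
    print(lindley_mm([1.0, 2.0]))  # sample mean 1.5 equals lindley_mean(1.0), so this prints 1.0
```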
Article
Statistics & Probability
Boris Aleksandrov, Christian H. Weiss, Simon Nik, Maxime Faymonville, Carsten Jentsch
Summary: This article proposes goodness-of-fit (GoF) tests based on statistics that rely on certain moment properties, in order to test whether unbounded counts follow a Poisson or a negative binomial marginal distribution. Unlike most existing approaches, the proposed tests consider a flexible class of functions of generalized moments and cover both higher-order factorial moments and Stein's identity. The performance of the tests is investigated through simulations, and a data example is provided to illustrate their application.