Article

Sequential Pattern Analysis: A Statistical Investigation of Sequence Length and Support

Journal

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/03610918.2011.654026

Keywords

Good distribution; Sequence length distribution; Sequential pattern analysis; Support; Text corpus

In sequential pattern analysis, the frequency of patterns is evaluated by the support. Although the support can be computed efficiently from large databases, we show that it cannot be compared between different databases, since it is influenced by the actual sequence length distribution. Models for this sequence length distribution are surveyed. One of these models, the Good distribution, appears to be sufficiently flexible for practice. It is used to exemplify an approach for adjusting the relative support such that the resulting adjusted support values are more comparable between different databases. We illustrate our findings with texts from the bilingual FinDe corpus.
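
The dependence of the support on sequence length can be illustrated with a minimal sketch. It is not taken from the paper. It assumes that a pattern "occurs" in a sequence when it appears as an ordered, possibly gapped subsequence, and that the relative support is the fraction of sequences in the database containing the pattern. The function names `contains_pattern` and `relative_support` and the toy databases are hypothetical.

```python
from typing import Hashable, Sequence


def contains_pattern(sequence: Sequence[Hashable], pattern: Sequence[Hashable]) -> bool:
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence.

    Assumption: gaps between matched items are allowed.
    """
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, so order is respected.
    return all(item in it for item in pattern)


def relative_support(database: Sequence[Sequence[Hashable]], pattern: Sequence[Hashable]) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        raise ValueError("database must contain at least one sequence")
    hits = sum(contains_pattern(seq, pattern) for seq in database)
    return hits / len(database)


# Two toy databases over the same alphabet; they differ mainly in sequence length.
short_db = [list("ab"), list("ba"), list("ab"), list("bb")]
long_db = [list("abab"), list("baba"), list("abba"), list("bbab")]

pattern = list("ab")
support_short = relative_support(short_db, pattern)
support_long = relative_support(long_db, pattern)

print(f"short sequences: {support_short:.2f}")
print(f"long sequences:  {support_long:.2f}")
```

In this toy setting the longer sequences give the pattern more opportunities to occur, so its relative support is higher in `long_db` even though both databases use the same symbols. This is the kind of length effect that makes raw support values hard to compare across databases.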

Recommended

Article Statistics & Probability

Computing (Bivariate) Poisson Moments Using Stein-Chen Identities

Christian H. Weiss, Boris Aleksandrov

Summary: This article introduces the method for extracting the Poisson distribution from bivariate count random variables and demonstrates how to compute moment expressions and joint moments using the Stein-Chen identity. Additionally, the extension to the multivariate case is explained.

AMERICAN STATISTICIAN (2022)

Article Statistics & Probability

Efficient accounting for estimation uncertainty in coherent forecasting of count processes

Christian H. Weiss, Annika Homburg, Layth C. Alwan, Gabriel Frahm, Rainer Goeb

Summary: A computationally efficient resampling scheme is proposed to express uncertainty in coherent forecasts for count processes. The scheme is investigated through a simulation study and demonstrated with a real-data example, showing that ensembles of forecast values can be visually presented for intuitive interpretation.

JOURNAL OF APPLIED STATISTICS (2022)

Article Statistics & Probability

Goodness-of-fit tests for Poisson count time series based on the Stein-Chen identity

Boris Aleksandrov, Christian H. Weiss, Carsten Jentsch

Summary: The study introduces a testing method for the null hypothesis of a Poisson marginal distribution based on the Stein-Chen identity, and derives the asymptotic distribution of various Stein-Chen statistics for a broad class of Poisson count time series. The performance of the tests is analyzed through simulations, along with a discussion on the choice of Stein-Chen functions for different alternative hypotheses.

STATISTICA NEERLANDICA (2022)

Article Physics, Multidisciplinary

Measuring Dispersion and Serial Dependence in Ordinal Time Series Based on the Cumulative Paired φ-Entropy

Christian H. Weiss

Summary: The paper investigates the family of cumulative paired phi-entropies and their sample versions, deriving their asymptotic distributions for stationary ordinal time series data, and proposes a family of signed serial dependence measures related to Cohen's kappa. The practical relevance of these dispersion and dependence measures is explored through numerical computations and simulations, with an example application to ordinal time series data on air quality.

ENTROPY (2022)

Article Computer Science, Interdisciplinary Applications

Non-parametric analysis of serial dependence in time series using ordinal patterns

Christian H. Weiss, Manuel Ruiz Marin, Karsten Keller, Mariano Matilla-Garcia

Summary: The new tests based on ordinal patterns are stable and robust, adaptable to monotone transformations of time series, and resistant to disturbances. These tests are applicable to linear and non-linear situations and can be used as misspecification tests under nuisance-free conditions.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2022)

Article Mathematical & Computational Biology

Optimal Stein-type goodness-of-fit tests for count data

Christian H. Weiss, Pedro Puig, Boris Aleksandrov

Summary: This study derives the asymptotics of the Poisson and binomial Stein-type GoF statistics for general count distributions and investigates their performance and application in medical data.

BIOMETRICAL JOURNAL (2023)

Article Mathematics, Applied

Non-parametric tests for serial dependence in time series based on asymptotic implementations of ordinal-pattern statistics

Christian H. Weiss

Summary: This study demonstrates that ordinal patterns can be used to construct hypothesis tests to detect possible serial dependence in time series, and the performance and power properties of these tests are examined through simulations. The application and interpretation of the tests are illustrated using an environmental data example.

CHAOS (2022)

Article Computer Science, Interdisciplinary Applications

An empirical-likelihood-based structural-change test for INAR processes

Kaizhi Yu, Huiqiao Wang, Christian H. Weiss

Summary: The paper proposes an empirical likelihood ratio (ELR) test for uncovering structural changes in integer-valued autoregressive (INAR) processes. The authors derive the limiting distribution under the null hypothesis of no parameter change at the anticipated change points. The finite-sample performance of the ELR test is evaluated through simulation studies, and its application to real data on infectious disease and crime counts is also demonstrated.

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION (2023)

Article Physics, Multidisciplinary

Partial Autocorrelation Diagnostics for Count Time Series

Christian H. Weiss, Boris Aleksandrov, Maxime Faymonville, Carsten Jentsch

Summary: In the context of time series, the partial autocorrelation function (PACF) is used for model identification, particularly in autoregressive (AR) models. The use of AR-type count processes, which have the same PACF characterization as AR models, has increased in recent decades. However, the conditions for asymptotic results are not met in AR-type count processes, leading to poor performance of the PACF test. Therefore, we propose different implementations of the PACF test for AR-type count processes using bootstrap schemes and compare them with asymptotic results in simulations.

ENTROPY (2023)

Article Statistics & Probability

Nonparametric Control Charts for Monitoring Serial Dependence based on Ordinal Patterns

Christian H. Weiss, Murat Caner Testik

Summary: This study addresses the problem of monitoring serial dependence in real-valued continuously distributed processes. A new control chart method based on ordinal patterns is proposed, which is nonparametric and distribution-free, and can be used almost instantly at the start of process monitoring.

TECHNOMETRICS (2023)

Article Engineering, Industrial

A review and comparison of control charts for ordinal samples

Sebastian Ottenstreuer, Christian H. Weiss, Murat Caner Testik

Summary: This study provides a survey of control charts for the sample-based monitoring of independent and identically distributed ordinal data, with critical comparisons of control statistics for different types of control charts. New results and proposals for process monitoring are also presented. A simulation study shows that demerit-type charts combined with EWMA smoothing generally outperform other charts that rely on sophisticated derivations. A real-world example of monitoring flashes in electric toothbrush manufacturing is discussed to illustrate the application and interpretation of the control charts in the study.

JOURNAL OF QUALITY TECHNOLOGY (2023)

Article Statistics & Probability

Approximately linear INGARCH models for spatio-temporal counts

Malte Jahn, Christian H. Weiss, Hee-Young Kim

Summary: This study proposes a model called (B)INGARCH for modeling unbounded (bounded) counts, allowing for negative parameter and autocorrelation values. These models combine negative dependencies with long memory and can be easily adapted to special marginal features or cross-dependencies.

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS (2023)

Article Statistics & Probability

Generalized ordinal patterns in discrete-valued time series: nonparametric testing for serial dependence

Christian H. Weiss, Alexander Schnurr

Summary: We introduce a novel testing procedure for identifying serial dependence in time series data. Our method is based on the ordinal structure of the data and considers ties in the data windows. To overcome the non-uniform pattern distribution under the null hypothesis, we use Cayley permutations and employ a bootstrap procedure. Simulation and real-world data examples demonstrate the effectiveness of our approach.

JOURNAL OF NONPARAMETRIC STATISTICS (2023)

Article Computer Science, Interdisciplinary Applications

New characterizations of the (discrete) Lindley distribution and their applications

Shaochen Wang, Christian H. Weiss

Summary: A Stein-type characterization of the Lindley distribution is derived, which extends some known recent results. Furthermore, a new characterization based on another independent exponential random variable is provided. Moment formulas related to the Lindley distribution are obtained, and generalized method-of-moments estimators for both the discrete and continuous Lindley distribution are proposed.

MATHEMATICS AND COMPUTERS IN SIMULATION (2023)

Article Statistics & Probability

Modelling and diagnostic tests for Poisson and negative-binomial count time series

Boris Aleksandrov, Christian H. Weiss, Simon Nik, Maxime Faymonville, Carsten Jentsch

Summary: This article proposes goodness-of-fit (GoF) tests based on statistics relying on certain moment properties to test the marginal distributions of unbounded counts, either following Poisson or negative binomial distributions. Unlike most existing approaches, the proposed tests consider a flexible class of functions of generalized moments and cover both higher-order factorial moments and Stein's identity. The performance of the tests is investigated through simulations and a data example is provided to illustrate their application.

METRIKA (2023)