Article

Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package

JOURNAL OF STATISTICAL SOFTWARE (2016)

Journal

JOURNAL OF STATISTICAL SOFTWARE

Volume 72, Issue 3, Pages 1-39

Publisher

JOURNAL STATISTICAL SOFTWARE

DOI: 10.18637/jss.v072.i03

Keywords

state sequences; categorical sequences; sequence visualization; sequence data mining; variable-length Markov chains; probabilistic suffix trees; R

Categories

Computer Science, Interdisciplinary Applications; Statistics & Probability

Funding

Swiss National Centre of Competence in Research LIVES - Overcoming vulnerability
Swiss National Science Foundation

Abstract

This article presents the PST R package for categorical sequence analysis with probabilistic suffix trees (PSTs), i.e., structures that store variable-length Markov chains (VLMCs). VLMCs can model high-order dependencies in categorical sequences with parsimonious models based on simple estimation procedures. The package is specifically adapted to the social sciences: it allows VLMC models to be learned from sets of individual sequences that may contain missing values, and it also supports case weights. This article describes how a VLMC model is learned from one or more categorical sequences and stored in a PST. The PST can then be used for sequence prediction, i.e., to assign a probability to whole observed or artificial sequences. This feature supports data mining applications such as the extraction of typical patterns and outliers. This article also introduces original visualization tools for both the model and the outcomes of sequence prediction. Other features, such as functions for pattern mining and artificial sequence generation, are described as well. Finally, the PST package allows for the computation of the probabilistic divergence between two models and for the fitting of segmented VLMCs, where sub-models fitted to distinct strata of the learning sample are stored in a single PST.
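To make the learning and prediction steps described above more concrete, here is a minimal Python sketch of a VLMC stored as a dictionary keyed by contexts (suffixes of the history), under simplifying assumptions. It is not the PST package's implementation, which is written in R: the function names `learn_pst` and `sequence_probability` are hypothetical, the pruning step that makes real VLMC models parsimonious is omitted (every observed context up to `max_depth` is kept), the smoothing rule with floor `ymin` is an assumed form, and missing values are not handled. Optional case weights simply scale each sequence's contribution to the counts.

```python
from collections import defaultdict
from math import exp, log


def learn_pst(sequences, max_depth=2, ymin=0.01, weights=None):
    """Hypothetical sketch: next-state distributions for all contexts of length 0..max_depth.

    No pruning is done, so every observed context is kept (unlike a real PST).
    Requires ymin < 1 / alphabet size.
    """
    if weights is None:
        weights = [1.0] * len(sequences)
    alphabet = sorted({state for seq in sequences for state in seq})
    counts = defaultdict(lambda: defaultdict(float))
    for seq, w in zip(sequences, weights):
        for i, state in enumerate(seq):
            # Position i updates every preceding suffix of length 0..max_depth
            for k in range(min(max_depth, i) + 1):
                counts[tuple(seq[i - k:i])][state] += w
    tree = {}
    for context, nxt in counts.items():
        total = sum(nxt.values())
        # Assumed smoothing: shrink the empirical estimate and add a floor ymin,
        # so unseen states get a small nonzero probability and each row sums to 1
        tree[context] = {
            a: (1 - len(alphabet) * ymin) * nxt.get(a, 0.0) / total + ymin
            for a in alphabet
        }
    return tree


def sequence_probability(tree, seq, max_depth=2):
    """Probability of a whole sequence under the (hypothetical) tree."""
    logp = 0.0
    for i, state in enumerate(seq):
        # Use the longest suffix of the history stored in the tree;
        # the empty context () at the root is always present
        for k in range(min(max_depth, i), -1, -1):
            context = tuple(seq[i - k:i])
            if context in tree:
                break
        logp += log(tree[context][state])
    return exp(logp)


seqs = ["AAAB", "AABB", "ABBB", "AAAA"]
tree = learn_pst(seqs, max_depth=2)
print(sequence_probability(tree, "AAAB"))  # frequent pattern: relatively high
print(sequence_probability(tree, "BABA"))  # atypical pattern: very low
```

Because each step draws from a full distribution over the alphabet, the probabilities of all sequences of a given length sum to one. A sequence with an unusually low probability is therefore atypical relative to the learning sample, which is the intuition behind using sequence prediction to flag outliers.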
