Journal
INFORMATION SCIENCES
Volume 507, Issue -, Pages 715-732
Publisher
ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2018.04.013
Keywords
Pattern discovery; Sequence; Tri-partition; Tri-pattern
Funding
- National Science Foundation of China [61379089, 41604114]
- Open Research Fund of Sichuan Key Laboratory for Nature Gas and Geology [2015trqdz04]
Abstract
The concept of patterns is the basis of sequence analysis. There are various pattern definitions for biological data, texts, and time series. Inspired by the methodology of three-way decisions and protein tri-partition, this paper proposes a frequent pattern discovery algorithm for a new type of pattern by dividing the alphabet into strong, medium, and weak parts. The new type, called a tri-pattern, is more general and flexible than existing ones and is therefore more interesting in applications. Experiments were undertaken on data in various fields to reveal the universality of this new pattern. These include protein sequence mining, petroleum production time series analysis, and forged Chinese text keyword mining. The results show that tri-patterns are more meaningful and desirable than the existing four types of patterns. This study enriches the semantics of sequential pattern discovery and the application fields of three-way decisions. © 2018 Elsevier Inc. All rights reserved.