Article

Incremental Evolution of Fuzzy Grammar Fragments to Enhance Instance Matching and Text Mining

Journal

IEEE TRANSACTIONS ON FUZZY SYSTEMS
Volume 16, Issue 6, Pages 1425-1438

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TFUZZ.2008.925920

Keywords

Entity extraction; evolving system; extensible markup language (XML); fuzzy sets; grammar fragments; incremental learning; instance matching; tagging; text mining

Funding

  1. British Telecommunications (BT)
  2. U.K. Defense Technology Centre for Data and Information Fusion
  3. EPSRC [EP/E058388/1] Funding Source: UKRI
  4. Engineering and Physical Sciences Research Council [EP/E058388/1] Funding Source: researchfish

In many applications, it is useful to extract structured data from sections of unstructured text. A common approach is to use pattern matching (e.g., regular expressions) or more general grammar-based techniques. In cases where exact templates or grammar fragments are not known, it is possible to use machine learning approaches, based on words or n-grams, to identify the structured data. This is generally a two-stage (train/use) process that cannot easily cope with incremental extensions of the training set. In this paper, we combine a fuzzy grammar-based approach with incremental learning. This enables a set of grammar fragments to evolve incrementally, each time a new example is given, while guaranteeing that it can parse previously seen examples. We propose a novel measure of overlap between fuzzy grammar fragments that can also be used to determine the degree to which a string is parsed by a grammar fragment. This measure of overlap allows us to compare the range of two fuzzy grammar fragments (i.e., to estimate and compare the sets of strings that fuzzily conform to each grammar) without explicitly parsing any strings. A simple application shows the method's validity.
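
The incremental idea in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm or its overlap measure, which the abstract does not specify. Everything here is an assumption made for illustration: the two token classes (`number`, `word`), their character-fraction memberships, the use of `min` to combine token memberships into a parse degree, and the helper names `parse_degree`, `coverage`, and `learn`. The one property it is meant to show is that a grow-only set of fragments never lowers the degree to which an earlier example is parsed.

```python
from typing import Callable, Dict, List

# Hypothetical fuzzy token classes: each maps a token to a membership in [0, 1].
# The character-fraction definition is an illustrative assumption.
def number_membership(tok: str) -> float:
    return sum(c.isdigit() for c in tok) / len(tok) if tok else 0.0

def word_membership(tok: str) -> float:
    return sum(c.isalpha() for c in tok) / len(tok) if tok else 0.0

CLASSES: Dict[str, Callable[[str], float]] = {
    "number": number_membership,
    "word": word_membership,
}

Fragment = List[str]  # a fragment is a sequence of class names

def parse_degree(fragment: Fragment, tokens: List[str]) -> float:
    """Degree to which a token sequence conforms to one fragment (min over tokens)."""
    if len(fragment) != len(tokens):
        return 0.0
    return min((CLASSES[c](t) for c, t in zip(fragment, tokens)), default=0.0)

def coverage(fragments: List[Fragment], text: str) -> float:
    """Best parse degree of the text over all stored fragments."""
    tokens = text.split()
    return max((parse_degree(f, tokens) for f in fragments), default=0.0)

def learn(fragments: List[Fragment], example: str) -> List[Fragment]:
    """Return a fragment set extended, if needed, to cover the new example.

    Fragments are only ever added, so coverage of earlier examples cannot drop.
    """
    if coverage(fragments, example) >= 1.0:
        return fragments  # already fully parsed, nothing to add
    new_fragment = [max(CLASSES, key=lambda name: CLASSES[name](t))
                    for t in example.split()]
    if new_fragment in fragments:
        return fragments  # best available fragment is already stored
    return fragments + [new_fragment]

fragments = learn([], "12 High Street")
fragments = learn(fragments, "7 Elm Road")   # same shape, no new fragment
fragments = learn(fragments, "Flat 3")       # new shape, one fragment added
print(fragments)
print(coverage(fragments, "12a High Street"))  # partial (fuzzy) match
```

In this sketch a string such as "12a High Street" is parsed to a degree strictly between 0 and 1, which is the kind of graded conformance the abstract refers to; a real system would use richer grammar fragments and a principled overlap measure.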
