Journal
IEEE TRANSACTIONS ON FUZZY SYSTEMS
Volume 16, Issue 6, Pages 1425-1438
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TFUZZ.2008.925920
Keywords
Entity extraction; evolving system; extensible markup language (XML); fuzzy sets; grammar fragments; incremental learning; instance matching; tagging; text mining
Funding
- British Telecommunications (BT)
- U.K. Defense Technology Centre for Data and Information Fusion
- Engineering and Physical Sciences Research Council (EPSRC) [EP/E058388/1]
In many applications, it is useful to extract structured data from sections of unstructured text. A common approach is to use pattern matching (e.g., regular expressions) or more general grammar-based techniques. When exact templates or grammar fragments are not known, machine learning approaches based on words or n-grams can be used to identify the structured data instead. These are generally two-stage (train/use) processes that cannot easily cope with incremental extensions of the training set. In this paper, we combine a fuzzy grammar-based approach with incremental learning. This enables a set of grammar fragments to evolve incrementally each time a new example is given, while guaranteeing that the set can still parse all previously seen examples. We propose a novel measure of overlap between fuzzy grammar fragments that can also be used to determine the degree to which a string is parsed by a grammar fragment. This measure allows us to compare the range of two fuzzy grammar fragments (i.e., to estimate and compare the sets of strings that fuzzily conform to each grammar) without explicitly parsing any strings. A simple application demonstrates the validity of the method.
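The abstract combines three ideas: a degree to which a string is parsed by a fuzzy grammar fragment, an overlap measure between fragments that needs no string parsing, and incremental learning that never loses previously seen examples. The Python sketch below is a deliberately simplified toy model under stated assumptions, not the authors' formulation. It assumes each fragment is a fixed-length sequence of fuzzy sets over hypothetical token tags (`NUM`, `SLASH`, `DASH`, `WORD`), scores a tagged string by the minimum membership along its positions, and defines overlap as the height of the fuzzy intersection computed position by position. Under these assumptions the overlap equals the best joint parse degree over all possible strings, which the hypothetical helper `brute_force_overlap` checks by enumeration. The hypothetical `learn` function only ever raises memberships (a fuzzy union), which is one simple way to keep every earlier example parsable.

```python
from itertools import product

# Toy assumption: a fragment is a fixed-length list of fuzzy sets, each mapping
# a token tag (hypothetical tags such as "NUM") to a membership in [0, 1].
Fragment = list[dict[str, float]]


def parse_degree(fragment: Fragment, tags: list[str]) -> float:
    """Degree to which a tagged string conforms to the fragment (min over positions)."""
    if len(fragment) != len(tags):
        return 0.0
    return min((elem.get(tag, 0.0) for elem, tag in zip(fragment, tags)), default=1.0)


def overlap(a: Fragment, b: Fragment) -> float:
    """Height of the fuzzy intersection of the string sets of a and b,
    computed position by position without enumerating any strings."""
    if len(a) != len(b):
        return 0.0
    return min(
        (max((min(ea[t], eb.get(t, 0.0)) for t in ea), default=0.0)
         for ea, eb in zip(a, b)),
        default=1.0,
    )


def brute_force_overlap(a: Fragment, b: Fragment, alphabet: list[str]) -> float:
    """Reference check: max over all tag strings of min(parse_a, parse_b)."""
    if len(a) != len(b):
        return 0.0
    return max(
        (min(parse_degree(a, list(s)), parse_degree(b, list(s)))
         for s in product(alphabet, repeat=len(a))),
        default=0.0,
    )


def learn(fragments: list[Fragment], tags: list[str]) -> None:
    """Incrementally add one example (hypothetical strategy).

    If no fragment fully parses it, widen the best same-length fragment by
    fuzzy union with the example; otherwise start a new fragment. Memberships
    only increase, so earlier examples keep their parse degree.
    """
    if any(parse_degree(f, tags) == 1.0 for f in fragments):
        return
    candidates = [f for f in fragments if len(f) == len(tags)]
    if candidates:
        best = max(candidates, key=lambda f: parse_degree(f, tags))
        for elem, tag in zip(best, tags):
            elem[tag] = 1.0  # union with the crisp singleton {tag}
    else:
        fragments.append([{tag: 1.0} for tag in tags])


if __name__ == "__main__":
    date_a = [{"NUM": 1.0}, {"SLASH": 1.0}, {"NUM": 1.0}]
    date_b = [{"NUM": 1.0, "WORD": 0.6}, {"SLASH": 0.5, "DASH": 1.0}, {"NUM": 1.0}]
    print(overlap(date_a, date_b))  # 0.5
    print(brute_force_overlap(date_a, date_b, ["NUM", "SLASH", "DASH", "WORD"]))  # 0.5

    seen: list[Fragment] = []
    examples = [["NUM", "SLASH", "NUM"], ["NUM", "DASH", "NUM"], ["WORD", "NUM"]]
    for ex in examples:
        learn(seen, ex)
    print(len(seen))  # 2
    print(all(max(parse_degree(f, ex) for f in seen) == 1.0 for ex in examples))  # True
```

In this toy model the position-wise overlap is exact because positions are independent, so the maximum over strings of a minimum splits into a minimum over positions of per-position maxima; the paper's fragments are more general, so treat this only as an intuition for why comparing fragments without parsing strings is possible.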