Article

Automatic Stochastic Arabic Spelling Correction With Emphasis on Space Insertions and Deletions

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASL.2012.2197612

Keywords

A* lattice search; Arabic language processing; space deletion errors; space insertion errors; spelling correction; statistical disambiguation; word distance

This paper presents a stochastic approach to correcting misspellings in Arabic text. A context-based, two-layer system automatically corrects misspelled words in large datasets. The first layer produces, for each misspelled word, a list of candidate alternatives ranked by Damerau-Levenshtein edit distance. The same layer also handles merged and split words that result from deleting or inserting the space character. The right alternative for each misspelled word is then selected stochastically, based on the maximum marginal probability, via A* lattice search and m-gram probability estimation. A large dataset was used to build and test the system. Test results show that performance improves as the training set grows, reaching an F1 score of 97.9% for detection and 92.3% for correction.
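The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, a toy in-memory vocabulary, and hypothetical function names (`osa_distance`, `rank_candidates`, `split_candidates`). It covers only the ranking of alternatives and the repair of merged words, not the A* lattice search or the m-gram model. The examples use Latin-script strings for readability; the functions operate on any Unicode strings.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def rank_candidates(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance, closest first (ties alphabetical)."""
    scored = sorted((osa_distance(word, v), v) for v in vocabulary)
    return [v for dist, v in scored if dist <= max_distance]


def split_candidates(word, vocabulary):
    """Space-deletion repair: split a merged token into two vocabulary words."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in vocabulary and word[i:] in vocabulary
    ]


if __name__ == "__main__":
    vocab = {"the", "ten", "tea", "in", "apple"}
    print(rank_candidates("teh", vocab))
    print(split_candidates("inthe", vocab))
```

In a full system of the kind the abstract describes, the ranked lists and split or merge alternatives would feed a context-based selection stage rather than being used directly; the distance threshold of 2 here is an arbitrary illustrative choice.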
