المستخلص: |
This paper describes a technique for spell-checking and correcting Arabic text. The technique exposes several variables that can be tuned to give customized results based on the properties of the processed text. It relies on dynamic dictionaries that are selected and customized according to the category of the input text. We employ a statistical, corpus-based approach with data obtained from the Arabic Wikipedia and local Palestinian newspapers. From corpus statistics we constructed databases of words and their frequencies as single-, double-, and triple-word expressions, and used these as the infrastructure for our spelling and text correction technique. Our spelling technique builds on earlier work [7], but uses new spelling variables and dynamic dictionaries based on categorized texts. We briefly report the results of preliminary testing and analysis. While these results are promising, they must be viewed as work in progress, still in need of more testing, refinement, integration, and deployment in real-life settings.
|