LEADER |
02857nam a22002417a 4500 |
001 |
0013059 |
041 |
|
|
|a eng
|
044 |
|
|
|b Morocco
|
100 |
|
|
|9 50321
|a T, Rachid
|e Author
|
245 |
|
|
|a The Effect of Full and Partial Diacritization on Arabic Root Extraction
|
260 |
|
|
|b Institute for Studies and Research on Arabization
|c 2007
|g June
|
300 |
|
|
|a 189 - 200
|
336 |
|
|
|a Conference Papers
|b Article
|
520 |
|
|
|b This paper presents a novel approach to extracting the roots of vocalized Arabic words. The Vocalized Arabic Word Root Extraction (VAWRE) algorithm continues previous research conducted at the Arabic Computing research laboratory at Al Akhawayn University on the development of an Arabic root extractor [1], which has been integrated into the Barq search engine [2]. The approach takes into account both the non-concatenative morphology and the complex orthography of the Arabic language. The VAWRE algorithm uses a manually constructed dictionary of 8,950 Arabic roots and a maintained list of vocalized morphological templates organized into 45 sets [3]. Together, the root dictionary and the vocalized template sets cover all of the most frequent words that appear in modern Arabic text. The algorithm extracts the most precise root (or the set of all possible roots in case of ambiguity) rather than a stem. The approach uses diacritic marks, which in Arabic serve mainly as short vowels, to reduce root ambiguity and hence improve root extraction precision. Moreover, it is flexible enough to handle fully vocalized, partially vocalized, and non-vocalized words, so as to cope with the recognized lack of a standardized punctuation model in modern Arabic texts. The implementation was tested on an evaluation corpus of 258 Arabic text documents collected from the Web. The results show that the VAWRE algorithm achieved an overall performance of 85% and an average root extraction correctness of 77%. The results also show that using vocalization in root extraction reduces root ambiguity by 33% on average.
|
653 |
|
|
|a Conferences and Seminars
|a Research Abstracts
|a Arabic Language
|a Grammar and Morphology
|
700 |
|
|
|9 25190
|a Chekayri, Abd Allah
|e Co-Author
|
700 |
|
|
|9 43184
|a Chhoul, O.
|e Co-Author
|
700 |
|
|
|9 40594
|a Mahamdi, M.
|e Co-Author
|
773 |
|
|
|c 007
|d Rabat
|i Publications of the Institute for Studies and Research on Arabization, Mohammed V University
|l 000
|o 6904
|s International Symposium: Automatic Processing of the Arabic Language, CITALA'07
|v 000
|
856 |
|
|
|u 6904-000-000-007.pdf
|
930 |
|
|
|d y
|p y
|
995 |
|
|
|a AraBase
|
999 |
|
|
|c 600197
|d 600197
|