The Effect of Full and Partial Diacritization on Arabic Root Extraction

T, Rachid; Chekayri, Abd Allah; Chhoul, O.; Mahamdi, M.

Source:	International Symposium: Automatic Processing of the Arabic Language, CITALA'07
Publisher:	Institute of Studies and Research for Arabization
Main author:	T, Rachid (Author)
Other authors:	Chekayri, Abd Allah (Co-Author), Chhoul, O. (Co-Author), Mahamdi, M. (Co-Author)
Peer-reviewed:	Yes
Country:	Morocco
Date (Gregorian):	2007
Conference venue:	Rabat
Responsible body:	Publications of the Institute of Studies and Research for Arabization, Mohammed V University
Month:	June
Pages:	189-200
MD number:	600197
Content type:	Conference papers
Language:	English
Databases:	AraBase
Subjects:	Conferences and Symposia | Research Abstracts | Arabic Language | Grammar and Morphology
Content link:	PDF (image)

Abstract:

This paper presents a novel approach for extracting the roots of vocalized Arabic words. The developed Vocalized Arabic Word Root Extraction (VAWRE) algorithm continues previous research conducted at the Arabic Computing research laboratory at Al Akhawayn University on an Arabic root extractor [1], which has been integrated into the Barq search engine [2]. The approach takes into account both the non-concatenative morphology and the complex orthography of the Arabic language. The VAWRE algorithm uses a manually constructed dictionary of 8,950 Arabic roots and a maintained list of vocalized morphological templates organized into 45 sets [3]. Together, the root dictionary and the template sets cover the most frequent words that appear in modern Arabic text. The algorithm extracts the most precise root (or the set of all possible roots in case of ambiguity) rather than stems. The approach uses diacritic marks, which serve in Arabic mainly as short vowels, to reduce root ambiguity and hence improve root extraction precision. It is also flexible enough to handle fully vocalized, partially vocalized and non-vocalized words, in order to cope with the noticeable lack of a standardized punctuation model in modern Arabic texts. The implemented approach has been tested on evaluation corpora consisting of 258 Arabic text documents collected from the Web. The results show that the VAWRE algorithm achieved an overall performance of 85% and an average root extraction correctness of 77%. The results also show that using vocalization in root extraction reduces root ambiguity by 33% on average.
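The abstract does not spell out how diacritics prune candidate roots. The following is a minimal sketch of one way this could work, assuming each vocalized template is stored as a sequence of root slots and affix letters with an expected mark per letter, and that every mark actually written in the input must agree with the template while an unmarked letter matches anything. It is not the VAWRE algorithm: the two templates, the two-entry root dictionary (instead of 8,950 roots and 45 template sets) and the helper names `segment`, `match` and `extract_roots` are hypothetical. The test word ترجم is genuinely ambiguous when unvocalized: تَرْجَمَ "he translated" (root ترجم, pattern فَعْلَلَ) versus تَرْجُمُ "you stone / she stones" (root رجم, pattern تَفْعُلُ).

```python
"""Toy sketch of diacritic-aware template matching for Arabic root extraction.

Hypothetical illustration only, not the VAWRE implementation: the templates
and root dictionary below are tiny assumed samples.
"""
from dataclasses import dataclass
from typing import Optional, Union

# Arabic marks U+064B..U+0652: tanween (3), fatha, damma, kasra, shadda, sukun.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}
FATHA, DAMMA, SUKUN = "\u064e", "\u064f", "\u0652"

Slot = Union[int, str]  # int = radical index (0-based), str = literal affix letter


@dataclass(frozen=True)
class Template:
    name: str                # conventional pattern, for display only
    slots: tuple[Slot, ...]  # one entry per base letter
    marks: tuple[str, ...]   # expected diacritic(s) per letter ("" = none)
    n_radicals: int


# Assumed mini template list (hypothetical).
TEMPLATES = [
    # فَعْلَلَ: quadriliteral past tense, e.g. تَرْجَمَ "he translated"
    Template("فَعْلَلَ", (0, 1, 2, 3), (FATHA, SUKUN, FATHA, FATHA), 4),
    # تَفْعُلُ: form I imperfect with prefix ت, e.g. تَرْجُمُ "you stone"
    Template("تَفْعُلُ", ("ت", 0, 1, 2), (FATHA, SUKUN, DAMMA, DAMMA), 3),
]

ROOTS = {"ترجم", "رجم"}  # assumed mini root dictionary


def segment(word: str) -> list[tuple[str, frozenset[str]]]:
    """Split a word into (base letter, set of attached diacritics) pairs."""
    units: list[tuple[str, frozenset[str]]] = []
    for ch in word:
        if ch in DIACRITICS:
            if not units:
                continue  # stray mark with no base letter: ignore it
            letter, marks = units[-1]
            units[-1] = (letter, marks | {ch})
        else:
            units.append((ch, frozenset()))
    return units


def match(units: list[tuple[str, frozenset[str]]], t: Template) -> Optional[str]:
    """Return the candidate root if the word fits template t, else None."""
    if len(units) != len(t.slots):
        return None
    radicals = [""] * t.n_radicals
    for (letter, marks), slot, expected in zip(units, t.slots, t.marks):
        # Marks written in the input must agree with the template; an
        # unmarked letter acts as a wildcard (partial or no vocalization).
        if not marks <= set(expected):
            return None
        if isinstance(slot, int):
            radicals[slot] = letter
        elif letter != slot:
            return None  # affix letter required by the template is absent
    return "".join(radicals)


def extract_roots(word: str) -> set[str]:
    """All dictionary roots reachable from any matching template."""
    units = segment(word)
    candidates = {match(units, t) for t in TEMPLATES}
    return {root for root in candidates if root in ROOTS}


if __name__ == "__main__":
    bare = "ترجم"
    partial = "ترج" + FATHA + "م"  # only the third letter vocalized
    full = "ت" + FATHA + "ر" + SUKUN + "ج" + DAMMA + "م" + DAMMA
    for w in (bare, partial, full):
        print(sorted(extract_roots(w)))
```

With no marks both roots survive; a single fatha on ج already rules out تَفْعُلُ, halving the candidate set in this toy case. A practical extractor would also have to deal with shadda ordering, weak and geminate roots, and affix combinations, none of which this sketch models.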

Similar items

Towards Measure for Arabic Corpora Quality
By: Benajibe, Yassine. Published: (2007)
Design and computer multilingualism : case of diacritical marks
By: Lazrek, Azz Aldine. Published: (2009)
Morphological, Syntactic And Diacritics Rules For Automatic Diacritization Of Arabic Sentences
By: Chennoufi, Amine. Published: (2017)
Extraction des formes derivees des mots arabes par des automates deterministes
By: Jait, Jamal. Published: (2009)
Analyseur Morphologique Pour L’arabe
By: Belgacem, Mohamed. Published: (2007)