The Effect of Full and Partial Diacritization on Arabic Root Extraction

Source: International Symposium: Automatic Processing of the Arabic Language, CITALA'07
Publisher: Institute for Studies and Research on Arabization
Main author: T, Rachid (Author)
Other authors: Chekayri, Abd Allah (Co-Author), Chhoul, O. (Co-Author), Mahamdi, M. (Co-Author)
Peer-reviewed: Yes
Country: Morocco
Gregorian date: 2007
Conference location: Rabat
Responsible body: Publications of the Institute for Studies and Research on Arabization, Mohammed V University
Month: June
Pages: 189 - 200
MD number: 600197
Content type: Conference papers
Language: English
Databases: AraBase
Abstract: This paper presents a novel approach to extracting the roots of vocalized Arabic words. The Vocalized Arabic Word Root Extraction (VAWRE) algorithm continues previous research conducted at the Arabic Computing research laboratory at Al Akhawayn University on the development of an Arabic root extractor [1], which has been integrated into the Barq search engine [2]. The approach takes into account both the non-concatenative morphology and the complex orthography of the Arabic language. The VAWRE algorithm uses a manually constructed dictionary of 8,950 Arabic roots and a maintained list of vocalized morphological templates organized into 45 sets [3]. Together, the root dictionary and the vocalized template sets cover the most frequent words that appear in Modern Arabic text. The algorithm extracts the most precise root (or the set of all possible roots in case of ambiguity) rather than a stem. The approach uses diacritic marks, which in Arabic serve mainly as short vowels, to reduce root ambiguity and hence improve root extraction precision. It is also flexible enough to handle fully vocalized, partially vocalized, and non-vocalized words, so as to cope with the recognizable lack of a standardized punctuation model in modern Arabic texts. The implementation was tested on an evaluation corpus of 258 Arabic text documents collected from the Web. The results show that the VAWRE algorithm achieved an overall performance of 85% and an average root extraction correctness of 77%. They also show that using vocalization in root extraction reduces root ambiguity by 33% on average.
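The idea of matching words against vocalized templates, with diacritics used to prune candidate roots, can be illustrated with a minimal Python sketch. This is not the VAWRE implementation: the three templates, the tiny root set, the slot notation ("1", "2", "3" for root radicals), and the matching rule are all assumptions made for illustration, standing in for the paper's 45 template sets and 8,950-root dictionary.

```python
# Illustrative sketch only; templates, roots, and matching rule are assumed.
FATHA, DAMMA, KASRA, SHADDA = "\u064E", "\u064F", "\u0650", "\u0651"
# Tanwin, short vowels, shadda, and sukun (U+064B..U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}

def split_units(text):
    """Split a string into (base character, set of diacritics) pairs."""
    units = []
    for ch in text:
        if ch in DIACRITICS:
            if units:
                units[-1][1].add(ch)
        else:
            units.append((ch, set()))
    return units

# Hypothetical templates: (vocalized pattern, how to assemble the root).
# Digits in the pattern are root slots; other letters are literal affixes.
TEMPLATES = [
    ("1" + FATHA + "ا" + "2" + KASRA + "3", ("1", "2", "3")),   # active participle
    ("ي" + FATHA + "2" + KASRA + "3", ("و", "2", "3")),         # initial waw dropped
    ("ي" + FATHA + "1" + DAMMA + "2" + SHADDA, ("1", "2", "2")),  # geminate root
]

# Hypothetical root dictionary (the paper's has 8,950 entries).
ROOTS = {"كتب", "وعد", "عدد"}

def extract_roots(word):
    """Return every dictionary root whose template is compatible with word."""
    w = split_units(word)
    found = set()
    for pattern, root_spec in TEMPLATES:
        t = split_units(pattern)
        if len(t) != len(w):
            continue
        slots, ok = {}, True
        for (wc, wm), (tc, tm) in zip(w, t):
            if tc in "123":
                slots[tc] = wc          # root radical: capture the letter
            elif tc != wc:
                ok = False              # literal affix letter mismatch
                break
            # Diacritics written on the word must agree with the template;
            # missing diacritics (partial or no vocalization) match anything.
            if wm and tm and not wm <= tm:
                ok = False
                break
        if ok:
            root = "".join(slots.get(c, c) for c in root_spec)
            if root in ROOTS:
                found.add(root)
    return found
```

In this sketch, the unvocalized word يعد is compatible with two templates and so yields two candidate roots, while adding a single kasra or damma on the middle letter leaves only one. That is the kind of ambiguity reduction the abstract attributes to vocalization, shown here on toy data only.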
