Morphological, Syntactic And Diacritics Rules For Automatic Diacritization Of Arabic Sentences

Source: Journal of King Saud University (Computer and Information Sciences)
Publisher: King Saud University
Main Author: Chennoufi, Amine (Author)
Other Authors: Mazroui, Azzeddine (Co-Author)
Volume/Issue: Vol. 29, No. 2
Peer-reviewed: Yes
Country: Saudi Arabia
Gregorian Date: 2017
Pages: 156 - 163
DOI: 10.33948/0584-029-002-003
ISSN: 1319-1578
MD Number: 974085
Content Type: Research and Articles
Language: English
Databases: science
Subjects: Arabic Language | Morphology | Linguistic Analysis | Computational Linguistics
Author Keywords:
Arabic Language | Automatic Diacritization | Arabic Diacritical Marks | Morphological Analysis | Smoothing Techniques | Hidden Markov Model
LEADER 02311nam a22002417a 4500
001 1716919
024 |3 10.33948/0584-029-002-003 
041 |a eng 
044 |b السعودية 
100 |9 525330  |a Chennoufi, Amine  |e Author 
245 |a Morphological, Syntactic And Diacritics Rules For Automatic Diacritization Of Arabic Sentences 
260 |b جامعة الملك سعود  |c 2017 
300 |a 156 - 163 
336 |a بحوث ومقالات  |b Article 
520 |b  Arabic diacritical marks are characters other than letters and are absent from most Arabic writing. This paper presents a hybrid system for the automatic diacritization of Arabic sentences that combines linguistic rules with statistical processing. The approach has four stages. The first stage performs a morphological analysis using the second version of the Alkhalil Morpho Sys morphological analyzer. The second stage uses the morphosyntactic output of this analysis to eliminate invalid word transitions according to syntactic rules. The third stage then uses a discrete hidden Markov model and the Viterbi algorithm to determine the most probable diacritized sentence; transitions unseen in the training corpus are handled with smoothing techniques. Finally, the last stage handles words that the Alkhalil analyzer could not analyze, using letter-based statistical processing. The system's word error rate is about 2.58% when the diacritic of the last letter of each word is ignored and about 6.28% when this diacritic is taken into account. 
653 |a اللغة العربية  |a علم الصرف  |a التحليل اللغوي  |a اللسانيات الحاسوبية 
692 |b Arabic Language  |b Automatic Diacritization  |b Arabic Diacritical Marks  |b Morphological Analysis  |b Smoothing Techniques  |b Hidden Markov Model 
700 |9 525321  |a Mazroui, Azzeddine  |e Co-Author 
773 |c 003  |e Journal of King Saud University (Computer and Information Sciences)  |f Maǧalaẗ ǧamʼaẗ al-malīk Saud : ùlm al-ḥasib wa al-maʼlumat  |l 002  |m مج29, ع2  |o 0584  |s مجلة جامعة الملك سعود - علوم الحاسب والمعلومات  |v 029  |x 1319-1578 
856 |u 0584-029-002-003.pdf 
930 |d y  |p y 
995 |a science 
999 |c 974085  |d 974085 
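
The third stage described in the abstract (a discrete hidden Markov model decoded with the Viterbi algorithm, with smoothing for unseen transitions) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each undiacritized word already comes with a list of candidate diacritized forms (as a morphological analyzer might supply), that transitions are estimated from bigram counts, and that add-one (Laplace) smoothing stands in for the unspecified smoothing techniques. The function names and the transliterated toy words are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(diacritized_sentences):
    """Count context words and word bigrams, using "<s>" as a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in diacritized_sentences:
        prev = "<s>"
        unigrams[prev] += 1
        for word in sentence:
            bigrams[(prev, word)] += 1
            unigrams[word] += 1
            prev = word
    return unigrams, bigrams


def log_transition(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed log P(word | prev) (assumed smoothing choice).

    Unseen transitions receive a small nonzero probability instead of zero.
    """
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def viterbi(candidates, unigrams, bigrams):
    """Return the most probable sequence of diacritized forms.

    candidates: one list of candidate diacritized forms per input word.
    Emission probabilities are taken as 1, because each candidate maps to
    exactly one undiacritized word.
    """
    vocab_size = len(unigrams)
    # best[state] = (log-probability of the best path ending in state, path)
    best = {"<s>": (0.0, [])}
    for options in candidates:
        step = {}
        for word in options:
            score, path = max(
                (prev_score + log_transition(prev, word, unigrams, bigrams, vocab_size),
                 prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[word] = (score, path + [word])
        best = step
    return max(best.values())[1]


# Toy training corpus and candidates (hypothetical transliterations).
training = [
    ["kataba", "alwaladu"],
    ["kataba", "alrajulu"],
    ["kutubun", "kathiratun"],
]
candidates = [["kataba", "kutubun"], ["alwaladu", "alwalada"]]
unigrams, bigrams = train_bigrams(training)
best_path = viterbi(candidates, unigrams, bigrams)
```

In this sketch the second stage of the abstract would correspond to pruning entries from `candidates` (or forbidding certain transitions) before decoding; that filtering is omitted here.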
