Challenges in Building Corpora for Algerian Arabic from CMC Content

Source: Al-Haqiqa Journal of Social and Human Sciences (مجلة الحقيقة للعلوم الاجتماعية والإنسانية)
Publisher: Ahmed Draia University of Adrar
Main author: Omari, Mohammed (Author)
Other authors: Bouhania, Bachir (Co-Author)
Volume/Issue: Vol. 21, No. 4
Peer-reviewed: Yes
Country: Algeria
Gregorian date: 2022
Hijri date: 1444
Month: December
Pages: 594 - 617
ISSN: 1112-4210
MD number: 1348986
Content type: Research and articles
Language: English
Databases: AraBase, HumanIndex
Author keywords:
Algerian Arabic | Computer-Mediated Communication | Corpus Linguistics | Facebook | Wattpad
Abstract: Algerian Arabic is an under-resourced Arabic dialect: few corpora and natural language processing tools have been developed for it. This is due to several factors, including the scarcity of written content in the dialect, the absence of a standard orthography, and the frequent code-switching and script switching of its speakers. These factors make building homogeneous corpora for Algerian Arabic more challenging than for other Arabic dialects in which they are less pronounced. The objective of this work is to examine the challenges and issues encountered in developing a corpus of Algerian Arabic extracted from computer-mediated communication (CMC) content, primarily from the social media platform Facebook and the story-publishing website Wattpad.

