Linguistic Resources and Analysis for Unvowelled Arabic Text Processing in Information Retrieval

Semmar, Nasr Aldine; Fluhr, Christian; Gara, Faiza Al Khateb

Title in another language:	دور الموارد اللسانية وتحليلها في معالجة النص العربي غير المشكول وتوظيفها في استرجاع المعلومات (The Role of Linguistic Resources and Their Analysis in Processing Unvowelled Arabic Text and Their Use in Information Retrieval)
Source:	مجلة التواصل اللساني (Journal of Linguistic Communication)
Publisher:	مؤسسة العرفان للإستشارات التربوية والتطوير المهني (Al-Irfan Foundation for Educational Consulting and Professional Development)
Main author:	Semmar, Nasr Aldine (Author)
Other authors:	Fluhr, Christian (Advisor), Gara, Faiza Al Khateb (Advisor)
Volume/Issue:	Vol. 15, No. 1, 2
Peer-reviewed:	Yes
Country:	Morocco
Gregorian date:	2013
Pages:	101 - 112
ISSN:	0851-6774
MD number:	596901
Content type:	Research and articles
Language:	English
Databases:	AraBase
Subjects:	Linguistic studies | Arabic language | Written texts

Abstract:

The purpose of information retrieval (IR) is to find all documents relevant to a user's query in a collection of documents. The central task in Natural Language Processing (NLP) for IR is the transformation of potentially ambiguous natural language queries and documents into unambiguous internal representations on which matching and retrieval can take place. Many levels of NLP can be used for this purpose: morphological, lexical, syntactic and semantic analysis. The LIC2M cross-language information retrieval system is a weighted Boolean search engine over syntactic structures produced by a linguistic analysis of the query and the documents. The system is composed of a linguistic analyzer, a statistical analyzer, a reformulator, an indexer, a comparator and a search engine. It is designed to work on Arabic, Chinese, English, French, German and Spanish. Arabic is highly productive, both derivationally and inflectionally. Definite articles, conjunctions, particles and other prefixes can attach to the beginning of a word, and large numbers of suffixes can attach to the end. Moreover, Arabic newspaper texts are often completely or partially unvowelled, and an unvowelled word can correspond to a set of potentially vowelled words having different meanings. For information retrieval, this abundance of forms, lexical variability and orthographic alternatives all result in a greater likelihood of mismatch between the form of a word in a query and the forms found in documents relevant to the query. To improve the retrieval effectiveness of any Arabic information retrieval system, specific processing for vowellation and stemming is required. In this paper we present an Arabic linguistic analyzer used in a cross-lingual information retrieval application. We focus in particular on the morphological module and the linguistic resources used at the different analysis levels.
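
The form mismatch described in the abstract can be illustrated with a minimal Python sketch. It is not the LIC2M analyzer: the function names, the four-entry prefix list and the minimum stem length are assumptions made for illustration only. The sketch removes Arabic diacritics (Unicode U+064B to U+0652) and strips a few attached prefixes, so that a vowelled or prefixed query word can match an unvowelled document word. Strings are written as Unicode escapes, with transliterations in the comments.

```python
import re

# Arabic diacritics: tanwin, fatha, damma, kasra, shadda, sukun (U+064B..U+0652).
DIACRITICS = re.compile(r"[\u064B-\u0652]")

# Hypothetical, deliberately tiny prefix list, longest first.
PREFIXES = (
    "\u0648\u0627\u0644",  # wa + al ("and the")
    "\u0628\u0627\u0644",  # bi + al ("with the")
    "\u0627\u0644",        # al ("the")
    "\u0648",              # wa ("and")
)


def strip_diacritics(word: str) -> str:
    """Return the unvowelled form of a word."""
    return DIACRITICS.sub("", word)


def strip_prefix(word: str, min_stem: int = 3) -> str:
    """Remove the first matching prefix if at least min_stem letters remain."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            return word[len(prefix):]
    return word


def normalize(word: str) -> str:
    """Unvowel the word, then strip one attached prefix."""
    return strip_prefix(strip_diacritics(word))


kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # vowelled "kataba" (he wrote)
kutub = "\u0643\u064F\u062A\u064F\u0628"         # vowelled "kutub" (books)

# Both vowelled words share one unvowelled form, which is the ambiguity
# the abstract describes.
print(normalize(kataba) == normalize(kutub))
```

Normalizing this way improves matching but discards the distinction between the two readings. That is why the abstract calls for dedicated vowellation and stemming processing rather than surface normalization alone.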

Similar items

Text Analysis and Automatic Indexing for Arabic Based Automated Information Retrieval System
By: Al Naim, Faisal Mohammad. Published: (1989)
Conditional Light Stemming for Enhanced Arabic Information Retrieval
By: مطارنة، خولة. Published: (2017)
Arabic Voice-Based Information Retrieval
By: Al Said, Ghadeer. Published: (2007)
Experiments in Improvement of Arabic Information Retrieval
By: Harrag, Fouzi. Published: (2009)
Exploit Genetic Algorithm to Enhance Arabic Information Retrieval
By: Al Shargabi, Bassam. Published: (2009)
