Right-Truncated Index-Based Web Search Engine

Source: Proceedings of the International Symposium: Automatic Processing of the Arabic Language
Publisher: Institute for Studies and Research on Arabization
Main author: Al Gaphari, Ghaleb (Author)
Other authors: Graff, David (Co-Author)
Peer-reviewed: Yes
Country: Morocco
Gregorian date: 2007
Conference location: Rabat
Responsible body: Institute for Studies and Research on Arabization, Rabat - Mohammed V University
Month: January
Pages: 46 - 61
MD number: 594266
Content type: Conference papers
Language: English
Databases: AraBase

Number of downloads: 20

LEADER 03105nam a22002177a 4500
001 0008939
041 |a eng 
044 |b Morocco 
100 |9 11513  |a Al Gaphari, Ghaleb  |e Author 
245 |a Right-Truncated Index-Based Web Search Engine 
260 |b Institute for Studies and Research on Arabization  |c 2007  |g January 
300 |a 46 - 61 
336 |a Conference papers  |b Conference Proceedings 
520 |b With the present effort, we propose to investigate the results of applying the Right-Truncated Index-Based Web Search Engine in order to determine its usefulness for storing and retrieving Arabic documents. The Right-Truncated Index-Based Web Search Engine is a program that reads any set of Arabic documents, accepts a query, and then processes both the documents and the query. It then selects (predicts) the documents most relevant to the submitted query. The program encompasses both a morphological component and a mathematical one. The morphological component allows the researcher to run either a stemming algorithm or a right-truncated algorithm. The chief advantage of the stemming algorithm is that it uses the least possible amount of storage for indexing, because it maps the inflected and derived terms to a single indexed stem word. The right-truncated algorithm, on the other hand, reduces the amount of storage to a lesser degree, but increases the probability of retrieving relevant (user-favorable) documents compared to the stemming algorithm. One of the purposes of our investigation is to compare the efficiency of these two indexing mechanisms. The mathematical component accepts the output of the right-truncated algorithm and then employs both term frequency and inverse document frequency (TF-IDF) in order to establish the relative importance of each document with respect to the terms of the query. This component computes the TF-IDF term weights by multiplying the inverse-document-frequency array by the term-frequency array for each term contained in every document. It then computes the cosine similarity between the query vector and each individual document vector in the collection. The greater the cosine similarity between the query vector and a document vector, the more relevant that document is to the query. 
Expressed differently, the greater the cosine similarity between the terms of the query and the document that contains those terms, the higher the probability that the document will correspond to the user's interest, thereby improving the query's retrieval power. 
653 |a Arabic language  |a Search engines  |a Internet  |a Truncated search 
700 |9 31718  |a Graff, David  |e Co-Author 
773 |c 029  |l 000  |o 6868  |s Proceedings of the International Symposium: Automatic Processing of the Arabic Language  |v 000  |d Rabat  |i Institute for Studies and Research on Arabization, Rabat - Mohammed V University 
856 |u 6868-000-000-029.pdf 
930 |d y  |p y 
995 |a AraBase 
999 |c 594266  |d 594266 
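
The pipeline summarized in the abstract (right truncation, then TF-IDF weighting, then cosine ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not state the truncation rule, so the sketch assumes that right truncation keeps the first `keep` characters of each whitespace-separated token in logical order, and it assumes an unsmoothed `log(N / df)` inverse document frequency. All function names and the default `keep=4` are hypothetical.

```python
import math
from collections import Counter


def right_truncate(term, keep=4):
    """Assumed rule: keep only the first `keep` characters of a term."""
    return term[:keep]


def index_terms(text, keep=4):
    """Split on whitespace and truncate every token (no stemming)."""
    return [right_truncate(tok, keep) for tok in text.split()]


def tf_idf_vectors(docs, keep=4):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [index_terms(d, keep) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each truncated term.
    df = Counter()
    for terms in tokenized:
        df.update(set(terms))
    # Assumed IDF variant: plain log(N / df), no smoothing.
    idf = {t: math.log(n_docs / df[t]) for t in df}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for terms in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs, keep=4):
    """Return (score, doc_index) pairs, highest cosine similarity first."""
    vectors, idf = tf_idf_vectors(docs, keep)
    q_tf = Counter(index_terms(query, keep))
    # Query terms that never occur in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in q_tf.items()}
    scores = [(cosine(q_vec, d_vec), i) for i, d_vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = ["الكتاب مفيد", "المدرسة كبيرة", "الكتابة فن"]
    for score, doc_id in rank("الكتب", docs):
        print(doc_id, round(score, 3))
```

In this toy collection the query term and two document terms share the same four-character prefix, so both of those documents receive a positive score while the unrelated document scores zero. Swapping `right_truncate` for a stemmer would be the point of comparison the abstract describes; the choice of `keep` directly trades index size against how many distinct word forms collapse into one index entry.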
