Arabic Search Results Disambiguation: A Supervised Approach to Unsupervised Learning

Main author: Salhi, Haytham (Author)
Other authors: Jarrar, Radi (Advisor), Yahya, Adnan H. (Advisor)
Gregorian date: 2019
Location: Birzeit
Pages: 1 - 139
MD number: 1015623
Content type: University theses
Language: English
Degree: Master's thesis
University: Birzeit University
Faculty: Faculty of Engineering and Technology
Country: Palestine
Databases: Dissertations

Abstract: Web search engines aim to retrieve results relevant to a given query or, more precisely, to an information need. However, a query can be ambiguous, meaning it may refer to several meanings or senses. Search results clustering (SRC) is a powerful approach that dynamically attempts to group results by sense. The preprocessing stage strongly affects the effectiveness of SRC, and although SRC has been studied extensively, the research has not yet clearly shown which source features should be selected from or which representation they should take. Moreover, little research attention has been paid to Arabic, for which datasets are also lacking. The major contributions of this thesis are fourfold: 1) It examines the influence of feature source (e.g., title or snippet) and feature representation on the effectiveness of SRC, identifying the best combination for high-quality clustering of Arabic Web search results. 2) It introduces a set of benchmarks for Arabic, called AMBIGArabic, and a new framework, called Spread, for data labeling, search results acquisition, and running SRC experiments. 3) It shows how useful the concept of blind relevance feedback is in SRC. 4) Lastly, it proposes a new SRC approach, called SAUL, along with an implementation of this approach that uses Wikipedia as the source of senses. The results show that feature sources and feature representations significantly affect the effectiveness of SRC, and that combinations such as (title with snippet, single words) and (title with snippet, single words with 2-gram and 3-gram words) are among the best. In addition, when the best combinations are compared, the proposed approach outperforms the baseline approach.
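
To make the "title with snippet, single words with 2-gram and 3-gram words" combination concrete, the following is a minimal Python sketch of how such features might be built and compared. It is not the thesis's implementation: the function names `ngram_features` and `cosine`, the whitespace tokenization, and the sample results are all assumptions made for illustration, and real Arabic preprocessing would need normalization and stemming that are omitted here.

```python
from collections import Counter
from math import sqrt


def ngram_features(title, snippet, max_n=3):
    """Count word n-grams (n = 1..max_n) from the title and the snippet.

    Hypothetical helper: each field is tokenized on whitespace and
    n-grams never span the boundary between title and snippet.
    """
    features = Counter()
    for field in (title, snippet):
        tokens = field.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                features[" ".join(tokens[i:i + n])] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented search results for an ambiguous query (illustrative data only).
results = [
    ("Jaguar cars", "luxury cars from jaguar"),
    ("Jaguar animal", "the jaguar is a big cat"),
    ("Jaguar cars price", "new luxury cars"),
]
vectors = [ngram_features(title, snippet) for title, snippet in results]

# Pairwise similarities that a clustering step could consume.
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        print(i, j, round(cosine(vectors[i], vectors[j]), 3))
```

In this toy data the two car-related results share multi-word features such as "jaguar cars" and "luxury cars", so they score as more similar to each other than either does to the animal result. A clustering algorithm would then group results using similarities of this kind.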
