
Arabic Keyword Extraction Using Artificial Neural Networks

Title in another language: استخراج الكلمات المفتاحية من النص العربى بإستخدام الشبكات العصبية الاصطناعية
Main author: Omoush, Ebtehal H. (Author)
Other authors: Al Haj, Ali M. (Advisor), Samawi, Venus W. (Advisor)
Year (Gregorian): 2012
Location: Mafraq
Pages: 1 - 78
MD number: 819139
Content type: University theses
Language: English
Degree: Master's thesis
University: Al al-Bayt University
College: Prince Hussein Bin Abdullah College for Information Technology
Country: Jordan
Databases: Dissertations

Abstract: This work addresses keyword extraction. It presents a technique for extracting keywords from a single Arabic text document using statistical features, with a Kohonen artificial neural network (ANN) used to cluster the keywords. The proposed model consists of three main stages. The first stage, document preprocessing, applies five linguistic operations: removing non-Arabic letters; lexical analysis of the text (eliminating punctuation marks, digits, and special symbols); removing stop-words; light stemming; and excluding words shorter than three letters. The second stage generates a statistical feature vector for each word. The system is based on an analysis of term occurrence characteristics: the Term Frequency (TF), whether the word occurs in the First Sentence (FS) of the text, whether it occurs in the Last Sentence (LS) of the text, whether it appears in the document Title (T), and how widely the word is spread over the document, as measured by Sentence Frequency (SF). The work also studies the effect of using Normalized Term Frequency (NTF) and Ratio of Sentence Frequency (RSF) on clustering accuracy, and the effect of the absence or presence of each feature on the results, in order to identify the best feature set. The third stage constructs a self-organizing map (SOM, a Kohonen neural network) to cluster the keywords: the number of nodes in the input layer depends on the number of features in the feature vector, and the output layer has two nodes (keyword and non-keyword). The winner node (keyword) is the one that has the highest weight. The performance of the proposed model is evaluated using recall, precision, and F-measure. The adopted Kohonen neural network is applied to 48 documents: 24 selected from the Jordan Journal of Social Sciences (JJSS) and 24 selected from the Arabic Wikipedia dataset.
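The feature and clustering stages described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it rests on several assumptions. The stop-word list is a hypothetical placeholder and light stemming is omitted. NTF is assumed to be TF divided by the largest TF in the document, and RSF is assumed to be SF divided by the number of sentences, since the abstract does not define either. The two-node map uses plain winner-take-all updates with a nearest-weight-vector winner, and treating node 1 as the keyword node is a convention chosen here. All function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"على", "إلى", "هذا", "هذه", "التي", "الذي"}
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")  # runs of Arabic letters


def tokenize(text):
    """Keep Arabic-letter tokens; drop stop-words and words under 3 letters.
    Light stemming is omitted in this sketch."""
    return [w for w in ARABIC_WORD.findall(text)
            if len(w) >= 3 and w not in STOP_WORDS]


def feature_vectors(title, sentences):
    """Map each remaining word to its statistical features."""
    sents = [tokenize(s) for s in sentences]
    title_words = set(tokenize(title))
    tf = Counter(w for s in sents for w in s)        # term frequency
    sf = Counter(w for s in sents for w in set(s))   # sentences containing w
    max_tf = max(tf.values(), default=1)
    first = set(sents[0]) if sents else set()
    last = set(sents[-1]) if sents else set()
    return {
        w: {"TF": tf[w], "NTF": tf[w] / max_tf,      # assumed: TF / max TF
            "SF": sf[w], "RSF": sf[w] / len(sents),  # assumed: SF / #sentences
            "FS": int(w in first), "LS": int(w in last),
            "T": int(w in title_words)}
        for w in tf
    }


def winner(weights, v):
    """Index of the output node whose weight vector is closest to v."""
    return min(range(len(weights)),
               key=lambda k: sum((w - x) ** 2 for w, x in zip(weights[k], v)))


def train_two_node_map(vectors, epochs=20, lr=0.5):
    """Winner-take-all training of two output nodes (no neighborhood update).
    Assumed convention: node 0 starts at the per-feature minimum
    (non-keyword), node 1 at the per-feature maximum (keyword)."""
    dim = len(vectors[0])
    weights = [[min(v[i] for v in vectors) for i in range(dim)],
               [max(v[i] for v in vectors) for i in range(dim)]]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for v in vectors:
            k = winner(weights, v)
            weights[k] = [w + rate * (x - w) for w, x in zip(weights[k], v)]
    return weights
```

To use it, build one vector per word from a chosen feature subset (for example `[f["NTF"], f["RSF"], f["FS"], f["LS"], f["T"]]`), train the map on those vectors, and keep the words for which `winner(weights, vector) == 1`. Changing the subset is how different feature combinations could be compared.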
The result of each experiment is then compared with the actual keywords associated with each document (for the Wikipedia dataset, the meta-tags are taken as keywords; for the JJSS dataset, keywords accompany each document). The system's performance was also compared with that of the Sakhr keyword extractor; in general, the proposed system showed comparable performance. To identify the best feature set, 12 different combinations of statistical features were considered. In the experiments, the best average recall, 52.63%, was obtained with the feature set <T, TF, SF, FS, LS>. The best average precision, 42.84%, was obtained with the feature set <T, TF, FS, LS>. Finally, the best average F-measure was achieved when <TF> alone was used.
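The evaluation measures named in the abstract can be sketched as a comparison between the extracted keywords and a document's reference keywords. This sketch assumes exact string matching between the two sets and assumes the balanced F-measure (the harmonic mean of precision and recall); the abstract does not say which matching rule or F-measure variant was used. The function name `evaluate` is illustrative.

```python
def evaluate(extracted, reference):
    """Return (precision, recall, f_measure) for one document.

    Assumes exact matching and the balanced F-measure (F1).
    """
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # correctly extracted keywords
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Averaging each of the three values over all documents would give per-feature-set averages of the kind the abstract reports.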
