Arabic Text Classification Using Dynamic N-Gram

Al Omoush, Safaa Qasim; Samawi, Venus W.

Title in another language:	تصنيف النصوص العربية باستخدام الانغرام المتغير
Main author:	Al Omoush, Safaa Qasim (Author)
Other authors:	Samawi, Venus W. (Advisor)
Peer-reviewed:	Yes
Gregorian date:	2013
Location:	Mafraq
Pages:	1 - 52
MD number:	819023
Content type:	Theses
Language:	English
Degree:	Master's thesis
University:	Al al-Bayt University
College:	Prince Hussein Bin Abdullah College for Information Technology
Country:	Jordan
Databases:	Dissertations
Subjects:	Text classification \| Arabic text classification \| Dynamic N-gram \| Computer science \| Information technology
Content links:	Title page, Abstract, Table of contents, First 24 pages, Chapter 1, Chapter 2, Chapter 3, Chapter 4, Conclusion, References

Number of downloads

6

LEADER	03801nam a22003257a 4500
001	1469699
041		\|a eng
100		\|9 438843 \|a Al Omoush, Safaa Qasim \|e Author
245		\|a Arabic Text Classification Using Dynamic N-Gram
246		\|a تصنيف النصوص العربية باستخدام الانغرام المتغير
260		\|a Mafraq \|c 2013
300		\|a 1 - 52
336		\|a Theses
502		\|c Al al-Bayt University \|f Prince Hussein Bin Abdullah College for Information Technology \|g Jordan \|o 0078 \|b Master's thesis
520		\|a An N-gram is a subsequence of N items from a given sequence. For noisy text, N-grams are an ideal solution, so we are interested in using them to represent text documents. In the literature, the term sometimes refers to sequences that are not ordered or consecutive; in this thesis, an N-gram is a chain of N consecutive characters. A few studies have used a static value of N for Arabic text classification and information retrieval. With a static N-gram, the text is segmented into N-grams of the same length (N = 3, 4, 5, etc.). The problem with this representation is that any word or stem with fewer than N characters is discarded as useless. For example, if N = 4, every word with fewer than four letters is discarded. Our work develops an automated system for classifying Arabic text documents using N-grams as the text representation. We propose a dynamic N-gram, in which N is determined dynamically (based on word length) to reduce the common grams that may belong to entirely different words. To study whether the dynamic N-gram improves classification accuracy, we built both a traditional static N-gram system and the proposed dynamic N-gram system. The results of the two systems are compared in terms of accuracy, recall, precision, and F-measure. The F-measure is a standard statistical measure of classifier performance; it is an average based on precision and recall. The proposed system consists of several phases: document preprocessing, document feature extraction, construction of the classifier, and document classification. We constructed two classifiers: a Naïve Bayes (NB) classifier and a Dice-measure distance classifier.
Finally, in the classification phase, we evaluated the proposed system on the Diab dataset and calculated the standard evaluation measures mentioned above. The classification results were promising (F-measure = 98.87% with the Dice-measure classifier). We also found that the Dice-measure classifier performs better when the dynamic N-gram is used.
653		\|a Text classification \|a Arabic text classification \|a Dynamic N-gram \|a Computer science \|a Information technology
700		\|9 46739 \|a Samawi, Venus W. \|e Advisor
856		\|u 9802-005-012-0078-T.pdf \|y Title page
856		\|u 9802-005-012-0078-A.pdf \|y Abstract
856		\|u 9802-005-012-0078-C.pdf \|y Table of contents
856		\|u 9802-005-012-0078-F.pdf \|y First 24 pages
856		\|u 9802-005-012-0078-1.pdf \|y Chapter 1
856		\|u 9802-005-012-0078-2.pdf \|y Chapter 2
856		\|u 9802-005-012-0078-3.pdf \|y Chapter 3
856		\|u 9802-005-012-0078-4.pdf \|y Chapter 4
856		\|u 9802-005-012-0078-O.pdf \|y Conclusion
856		\|u 9802-005-012-0078-R.pdf \|y References
930		\|d y \|p y
995		\|a Dissertations
999		\|c 819023 \|d 819023
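
The static versus dynamic N-gram idea described in the abstract (field 520) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract says only that N is chosen "based on word length", so the rule used in `dynamic_ngrams` (N = min(word length, `max_n`)) and the parameter `max_n` are assumptions made for illustration. Likewise, `dice` is the standard Dice coefficient over sets of grams, assumed here as a stand-in for the Dice-measure classifier's similarity score. All function names are hypothetical.

```python
def static_ngrams(word, n):
    """Character N-grams of fixed length n.

    A word shorter than n yields an empty list, which is the
    short-word problem the abstract describes.
    """
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dynamic_ngrams(word, max_n=4):
    """Character N-grams with N chosen per word.

    ASSUMPTION: N = min(len(word), max_n). The abstract does not
    state the actual rule, only that N depends on word length.
    """
    n = min(len(word), max_n)
    return static_ngrams(word, n) if n > 0 else []


def profile(text, extract):
    """Set of grams for a whitespace-tokenized text."""
    grams = set()
    for word in text.split():
        grams.update(extract(word))
    return grams


def dice(a, b):
    """Standard Dice coefficient between two gram sets: 2|A∩B| / (|A|+|B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    short_word = "من"  # a two-letter Arabic word
    print(static_ngrams(short_word, 4))   # [] : the word is lost when N = 4
    print(dynamic_ngrams(short_word))     # the whole word is kept as one gram

    doc = profile("تصنيف النصوص العربية", dynamic_ngrams)
    category = profile("تصنيف النصوص", dynamic_ngrams)
    print(dice(doc, category))
```

Under this assumed rule, short words survive as single grams instead of being discarded, while longer words are still split into fixed-size pieces. A document could then be assigned to the category whose gram profile gives the highest Dice score.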

Similar items

Noun-Based Indexing for Arabic Text Retrieval Using Keyword and N-Gram Techniques
By: أبو مسامح، علاء يحيى Published: (2022)
Automatic Arabic text classification
By: Al Khalilah, Mohanad Salamh Published: (2011)
Automatic Arabic Text Categorization Using Efficient Classification Techniques
By: Al Awadi, Mouhammd Mahmoud Published: (2015)
Investigation of associative classification techniques for text categorization
By: Al Mukhtar, Abd Allah Mohammed A. Published: (2011)
Arabic Text Classification Using Learning Vector Quantization
By: Azarah, Mohammed N. Published: (2012)
