An Efficient Approach for Medical Text Categorization Based on Clustering and Similarity Measures

Khaleel, Amal Hameed

Title in another language:	طريقة فعالة لتصنيف النصوص الطبية بالاعتماد على العنقدة ومقاييس التشابه
Source:	Misan Journal of Academic Studies
Publisher:	University of Misan - College of Basic Education
Main author:	Khaleel, Amal Hameed (Author)
Volume/Issue:	Vol. 15, No. 29
Peer-reviewed:	Yes
Country:	Iraq
Gregorian date:	2016
Pages:	113 - 130
DOI:	10.54633/2333-015-029-015
ISSN:	1994-697X
MD number:	1185899
Content type:	Research and articles
Language:	English
Databases:	HumanIndex, EduSearch
Subjects:	Medical Diagnosis \| Patient Records \| Eye Diseases \| Medical Documents \| Automated Processing
Author keywords:	Data Mining \| Text Mining \| Text Categorization "TC" \| Midline \| Euclidean Minimum Spanning Tree "EMST" \| Cosine Similarity \| Common Word Similarity
Content link:	PDF (image)

Abstract:

The large volume of medical information held in medical documents makes automated text categorization methods essential in clinical diagnosis and treatment. Automatic text categorization can predict the class to which a text belongs. This paper can serve as a medical diagnosis tool for categorizing patient records: it proposes a text categorization algorithm, based on the similarity of cluster centers, for categorizing the records of patients with eye diseases. We propose the VEMST algorithm as an update to the EMST algorithm that uses variance to find cluster centers. A text categorization algorithm is developed that uses two similarity measures (cosine and common words) to classify the categorical data. The results show that classification accuracy increases when the number and size of the medical documents used for training are large. We also observed that comparing medical terms in the preprocessing phase gives better accuracy than using the frequency of all terms in the medical document, and takes less execution time. Finally, we found that our system performs better with the cosine similarity measure than with the common-words similarity measure.
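The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define the common-words measure, so the version here (shared distinct terms divided by the smaller vocabulary size) is an assumption, as are the whitespace tokenizer, the function names, and the two example cluster centers. The VEMST step that would produce the cluster centers is not shown.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase, split on whitespace, and count terms (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def common_word_similarity(a, b):
    """Shared distinct terms over the smaller vocabulary (assumed definition, not the paper's)."""
    if not a or not b:
        return 0.0
    return len(a.keys() & b.keys()) / min(len(a), len(b))


def categorize(record, centers, similarity):
    """Return the label of the cluster center most similar to the record."""
    counts = term_counts(record)
    return max(centers, key=lambda label: similarity(counts, centers[label]))


# Hypothetical cluster centers, each represented by one short text.
centers = {
    "cataract": term_counts("cloudy lens blurred vision glare lens opacity"),
    "glaucoma": term_counts("high eye pressure optic nerve damage peripheral vision loss"),
}

record = "patient reports blurred vision and glare with a cloudy lens"
print(categorize(record, centers, cosine_similarity))
print(categorize(record, centers, common_word_similarity))
```

Passing the similarity function as an argument keeps the categorization step identical for both measures, which mirrors the comparison the abstract reports between them.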

Similar items

A Framework for Ranking and Categorizing Medical Documents
By: Al Zamil, Mohammed Gh. I. Published: (2010)
The Development and Validation of a Novel Intravenous Medication Testing Station
By: Al Ghamdi, Anwar Ali Published: (2012)
Medical-Symptom-Based Intelligent Diseases Classification and Recognition
By: المهاني، وسام عادل محمد Published: (2019)
Classification of Arabic Texts Using Similarity Measures
By: عمر، هاني عبدالكريم محمد Published: (2007)
A Study in the Effect of Applying the Electronic Medical Record in Improving the Performance of the File Departments and the Medical Services
By: الثميري، نوفل محمد أحمد Published: (2008)
