An Enhancement over Birch Hierarchical Clustering Algorithms for Better Partitioning of Medical Data

Main author: النسور، رعد محمد جميل (Author)
Other authors: Al Shrouf, Faiz (Advisor)
Date (Gregorian): 2020
Location: Amman
Pages: 1 - 52
MD number: 1103092
Content type: Theses
Language: English
Degree: Master's thesis
University: Al-Isra Private University
Faculty: Faculty of Information Technology
Country: Jordan
Databases: Dissertations

Abstract: Over the years, technology has revolutionized our world and daily lives. Information is becoming more accessible and more widely shared with the public, and big data from across the web is being collected and stored in every form, from text to media files. Machine learning algorithms use this data to learn more about it, which in turn can make them more useful and applicable in the real world. Clustering algorithms are unsupervised machine learning algorithms that can be used in many fields, including pattern recognition and image analysis. There are many clustering algorithms, such as K-means and Agglomerative Hierarchical Clustering (AHC); however, they work well only on specific data sets. Clustering algorithms can be applied to medical data to find undiscovered patterns, which in turn improves the medical field's knowledge about patients and diseases. This thesis focuses on one of the most dangerous diseases, cancer. The SEER databases provide a large amount of data on cancer patients, collected from 1973 to the present from various locations and sources throughout the United States. Finding useful patterns in data of this size requires a good clustering algorithm, and BIRCH is one of the most effective clustering algorithms for big data. This thesis proposes the MD-BIRCH algorithm, an enhanced version of BIRCH that applies Manhattan distance across multiple phases of the algorithm: the early stage of compacting data points into an initial Clustering Feature (CF) tree, the middle stage of descending deeper into the tree, and the late stages of removing outliers and performing global clustering on the whole tree with another modified clustering algorithm based on Manhattan distance.
The experiments were conducted on the SEER medical dataset over multiple clustering iterations, with BIRCH and MD-BIRCH each executed 8 times on a big data sample of cancer patients. The results showed that MD-BIRCH outperformed BIRCH in terms of quality and had slightly better performance. This work was implemented in Python 3.7.
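
The idea of compacting points into Clustering Features using Manhattan distance can be illustrated with a minimal sketch. This is not the thesis implementation: the names `ClusteringFeature`, `manhattan`, and `insert_point` are hypothetical, the features are kept in a flat list rather than a CF tree, and the threshold is tested against the centroid distance instead of the radius or diameter of the merged subcluster.

```python
import numpy as np


class ClusteringFeature:
    """Simplified BIRCH-style summary: point count N and linear sum LS."""

    def __init__(self, point):
        self.n = 1
        self.ls = np.asarray(point, dtype=float).copy()

    @property
    def centroid(self):
        return self.ls / self.n

    def add(self, point):
        # Absorbing a point only updates the summary; the point is not stored.
        self.n += 1
        self.ls += np.asarray(point, dtype=float)


def manhattan(a, b):
    """L1 (Manhattan) distance between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).sum())


def insert_point(features, point, threshold):
    """Absorb the point into the nearest CF, or start a new CF.

    Assumption: "nearest" and the threshold test both use the Manhattan
    distance from the point to the CF centroid.
    """
    if features:
        nearest = min(features, key=lambda cf: manhattan(cf.centroid, point))
        if manhattan(nearest.centroid, point) <= threshold:
            nearest.add(point)
            return features
    features.append(ClusteringFeature(point))
    return features


# Toy data, not SEER records: two well-separated groups of points.
features = []
for p in [(0, 0), (1, 0), (10, 10), (10, 11)]:
    insert_point(features, p, threshold=2.0)
```

With this toy input the four points are summarized by two features, one per group. A full BIRCH variant would additionally organize the features in a tree, split overfull nodes, and run a global clustering step over the leaf entries.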
