Abstract: |
For the past thirty years, electronic mail (e-mail) has played an important role in online communication around the world, and e-mail messages are now the main means of exchange among millions of users. This importance has attracted attackers who target users with fraudulent e-mails, known as spam, which may contain dangerous files. Filtering these e-mails is a classification problem that is both a difficult task and an active research area. The classification performance of machine learning models trained on the collected datasets still needs improvement. The most popular dataset is SPAMBASE, which consists of 57 features; some of these features are unimportant and should be removed. Selecting the important subset of features improves classification performance and reduces the time required for training. Feature selection algorithms are categorized by the type of learning, supervised or unsupervised; this work addresses supervised feature selection. The purpose of this thesis is to present a machine learning approach that improves the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. The proposed algorithm is a hybrid filter and wrapper feature selection algorithm that can select the most relevant features. The first part of the proposed algorithm is the information gain method, which serves as the filter stage, while the wrapper stage is the Black Hole (BH) algorithm. BH is a recently developed optimization algorithm that mimics the behavior of black holes in the universe. The proposed algorithm handles the features in binary form, where 1 marks a selected feature and 0 marks an unselected feature. The fitness of each star (candidate solution) is evaluated with a naïve Bayesian classifier (NBC), and the star with the best fitness is taken as the black hole (i.e., the best solution).
The algorithm was evaluated under different scenarios; the binary hybrid filter-wrapper algorithm (BBH) improved the accuracy of the e-mail spam filtering system.
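The two stages described above can be illustrated with a minimal Python sketch. It is not the thesis implementation, and it rests on several assumptions that the abstract does not state: features are already discretized, the filter keeps the top `k` features by information gain, stars move toward the black hole through a `tanh` transfer function, and the event horizon is compared against a normalized Hamming distance. The names `information_gain`, `top_k_by_gain`, and `binary_black_hole` are hypothetical, and `fitness` is a placeholder for the NBC accuracy used in the thesis.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the labels."""
    n = len(labels)
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_gain(columns, labels, k):
    """Filter stage (assumed form): indices of the k highest-gain features."""
    gains = [information_gain(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: gains[j], reverse=True)
    return order[:k]


def binary_black_hole(n_features, fitness, n_stars=10, n_iter=30, seed=0):
    """Wrapper stage (assumed form): search for a 0/1 feature mask.

    `fitness` maps a mask to a non-negative score; higher is better.
    """
    rng = random.Random(seed)

    def new_star():
        return [rng.randint(0, 1) for _ in range(n_features)]

    stars = [new_star() for _ in range(n_stars)]
    best_mask, best_fit = None, -1.0
    for _ in range(n_iter):
        fits = [fitness(s) for s in stars]
        bh = max(range(n_stars), key=lambda i: fits[i])  # black hole index
        if fits[bh] > best_fit:
            best_mask, best_fit = stars[bh][:], fits[bh]
        total = sum(fits)
        radius = fits[bh] / total if total > 0 else 0.0  # event horizon
        for i in range(n_stars):
            if i == bh:
                continue
            # Pull each bit toward the black hole with probability |tanh(step)|.
            for j in range(n_features):
                step = rng.random() * (stars[bh][j] - stars[i][j])
                if rng.random() < abs(math.tanh(step)):
                    stars[i][j] = stars[bh][j]
            # Stars that cross the event horizon are replaced by random ones.
            diff = sum(a != b for a, b in zip(stars[i], stars[bh]))
            if diff / n_features < radius:
                stars[i] = new_star()
    return best_mask, best_fit
```

In a full pipeline, `fitness` would train and score a naïve Bayes classifier on the columns selected by the mask, restricted to the features that survive the filter stage; here it can be any function that returns a non-negative score for a mask.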
|