Presenting a Hybrid Feature Selection Method Using IG and SVM Wrapper for E-mail Spam Filtering


Seyed Mostafa Pourhashemi - Department of Computer, Dezful Branch, Islamic Azad University, Dezful, Iran.
Alireza Osareh - Department of Computer, Shahid Chamran University, Ahvaz, Iran.
Bita Shadgar - Department of Computer, Shahid Chamran University, Ahvaz, Iran.


The growing volume of spam email has created the need for more accurate and efficient email classification systems. This research presents a machine learning approach that improves the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. To reduce the error rate and increase efficiency, a hybrid feature selection architecture is used. The features are drawn from the body text of the messages. The proposed system combines two feature selection models, filter and wrapper, using an Information Gain (IG) filter and a Support Vector Machine (SVM) wrapper as feature selectors. Four classifiers are used for classification: Multinomial Naive Bayes (MNB), Discriminative Multinomial Naive Bayes (DMNB), SVM, and Random Forest. Finally, the outputs of these classifiers and feature selection methods are examined, the best design is selected, and it is compared with other similar works on several parameters. The best accuracy obtained by the proposed system is 99%.
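As a rough illustration of the filter stage only, the sketch below ranks terms by information gain for a binary "term occurs in the message body" feature and keeps the top k. It is a minimal sketch, not the authors' implementation: the function names (`entropy`, `information_gain`, `ig_filter`), the set-of-tokens representation, and the four toy messages are all assumptions made for the example. The SVM wrapper stage, which would then evaluate subsets of the surviving terms by classifier performance, is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the message body'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting on presence/absence of the term.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def ig_filter(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy, invented data (an assumption for illustration): each message body
# is represented as a set of tokens.
docs = [
    {"free", "offer", "click"},
    {"free", "prize", "click"},
    {"meeting", "agenda", "offer"},
    {"meeting", "notes", "agenda"},
]
labels = ["spam", "spam", "ham", "ham"]

top_terms = ig_filter(docs, labels, k=4)
```

In this toy corpus, "offer" appears once in each class and so carries no information about the label, while terms that appear in only one class are ranked first. In a hybrid design, a cheap filter like this narrows the vocabulary before the more expensive wrapper search runs.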


