A Content-Based Spam E-Mail Filtering Approach Using Multilayer Percepton Neural Networks

International Journal of Engineering Trends and Technology (IJETT)

© 2016 by IJETT Journal
Volume-41 Number-1
Year of Publication : 2016
Authors : A.Sesha Rao, P.S.Avadhani, Nandita Bhanja Chaudhuri
DOI :  10.14445/22315381/IJETT-V41P210

Citation 

A.Sesha Rao, P.S.Avadhani, Nandita Bhanja Chaudhuri, "A Content-Based Spam E-Mail Filtering Approach Using Multilayer Percepton Neural Networks", International Journal of Engineering Trends and Technology (IJETT), V41(1), 44-55, November 2016. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract
Growing volumes of spam e-mail inconvenience internet users and organizations and waste resources, time, memory, storage space, and effort. An automatic e-mail classification system that identifies spam is therefore crucial. Spam must be classified and separated from ham (non-spam) mail because it is a source of financial loss and annoyance for recipients. The performance of a spam e-mail classifier can be greatly enhanced by Artificial Neural Network classification, which is better able to learn from large amounts of high-dimensional data. In this paper, the Multilayer Perceptron with the Back Propagation training algorithm is explored, where the ‘generalized delta’ rule is used to adjust the weights of the hidden layers. The perceptron uses the Back Propagation learning model to calculate its gradient. For fast convergence, the learning rate is changed at every iteration by an amount proportional to the negative gradient of the instantaneous error with respect to the learning rate. To avoid the local minima problem, the weights are initialized to small random numbers uniformly distributed in a symmetric range around zero whose bounds depend on Ni, the number of inputs, and on a constant that takes a value in (1, 3). Four Multilayer Perceptron (MLP) network models are constructed. To test our model, benchmark data drawn from the UCI Machine Learning Repository is used to train the neural network. The results of our MLP model are reasonable in terms of TP rate, FP rate, accuracy, precision, recall, F-measure, MCC, ROC area, and PRC area.
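
Several of the evaluation measures named in the abstract can be computed directly from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. ROC area and PRC area are omitted because they require per-message classifier scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: spam messages classified as spam/ham.
    fp/tn: ham messages classified as spam/ham.
    """
    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    tp_rate = safe_div(tp, tp + fn)          # recall, sensitivity
    fp_rate = safe_div(fp, fp + tn)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    precision = safe_div(tp, tp + fp)
    f_measure = safe_div(2 * precision * tp_rate, precision + tp_rate)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)

    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "accuracy": accuracy,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

MCC is worth reporting alongside accuracy because it uses all four counts and stays informative when the spam and ham classes are unbalanced.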

Keywords
back propagation, delta rule, F-measure, hidden layers, learning rate, local minima, MCC, perceptron, precision, PRC Area, recall, ROC.