Content-Based Spam Filtering and Detection Algorithms - An Efficient Analysis & Comparison

International Journal of Engineering Trends and Technology (IJETT)
  
© 2013 by IJETT Journal
Volume-4 Issue-9                      
Year of Publication : 2013
Authors: R. Malarvizhi, K. Saraswathi

Citation 

R. Malarvizhi, K. Saraswathi. "Content-Based Spam Filtering and Detection Algorithms - An Efficient Analysis & Comparison". International Journal of Engineering Trends and Technology (IJETT). V4(9):4237-4242, Sep 2013. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract

Spam is one of the major problems faced by the internet community. Many approaches have been developed to combat spam, and filtering is one of the most important. Content-based filtering, also known as cognitive filtering, recommends items based on a comparison between the content of the items and a user profile. The content of each item is represented as a set of descriptors or terms, typically the words that occur in a document. User profiles are represented with the same terms and are built up by analyzing the content of items seen by the user. This paper gives an overview of the state of the art in spam filtering and of the ways in which different filtering methods are evaluated and compared. Its main contribution is a comprehensive study of spam detection algorithms in the category of content-based filtering. The implemented algorithms are then benchmarked to examine how accurately they classify messages into their original categories of spam.
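The profile-matching idea described above (items and profiles expressed in the same terms, then compared) can be illustrated with a short sketch. This is not the authors' implementation and not one of the algorithms benchmarked in the paper (AdaBoost, KNN, Chi-Square, Bayesian filters); it assumes a naive whitespace tokenizer, raw term-frequency vectors, cosine similarity as the comparison measure, and two hypothetical toy profiles labeled `spam` and `ham`. All helper names (`terms`, `build_profile`, `cosine`, `classify`, `accuracy`) are hypothetical.

```python
from collections import Counter
import math

# Characters stripped from token edges by the (assumed) naive tokenizer.
PUNCTUATION = ".,!?:;\"'()"


def terms(text: str) -> Counter:
    """Represent a document as a bag of lower-cased word terms."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return Counter(token for token in tokens if token)


def build_profile(documents: list[str]) -> Counter:
    """Build a profile from the same terms, summed over the documents seen."""
    profile: Counter = Counter()
    for doc in documents:
        profile.update(terms(doc))
    return profile


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text: str, profiles: dict[str, Counter]) -> str:
    """Assign the label whose profile is most similar to the message."""
    vector = terms(text)
    return max(profiles, key=lambda label: cosine(vector, profiles[label]))


def accuracy(samples: list[tuple[str, str]], profiles: dict[str, Counter]) -> float:
    """Fraction of (text, true_label) pairs classified correctly."""
    correct = sum(1 for text, label in samples if classify(text, profiles) == label)
    return correct / len(samples)


# Hypothetical toy profiles; real filters are trained on large labeled corpora.
profiles = {
    "spam": build_profile(["win a free prize now", "free money click now"]),
    "ham": build_profile(["meeting agenda for monday", "please review the attached report"]),
}

print(classify("claim your free prize", profiles))  # spam
print(accuracy([("free prize click", "spam"), ("monday meeting report", "ham")], profiles))  # 1.0
```

The `accuracy` helper mirrors, in miniature, the benchmarking step: it counts how many messages land in their original category. Ties, including messages that share no terms with any profile, resolve to the first label in `profiles`, so a practical filter would need a similarity threshold or an explicit default class.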

Keywords
Spam, AdaBoost, KNN, Chi-Square, Black list, White list, Bayesian filters, Cache Architecture.