A Survey of Various Machine Learning Techniques for Text Classification

International Journal of Engineering Trends and Technology (IJETT)
© 2014 by IJETT Journal
Volume-15 Number-6
Year of Publication : 2014
Authors : Gaurav S. Chavan, Sagar Manjare, Parikshit Hegde, Amruta Sankhe
DOI : 10.14445/22315381/IJETT-V15P255

Citation 

Gaurav S. Chavan, Sagar Manjare, Parikshit Hegde, Amruta Sankhe. "A Survey of Various Machine Learning Techniques for Text Classification", International Journal of Engineering Trends and Technology (IJETT), V15(6), 288-292, Sep 2014. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract

Sentiments are the opinions a writer expresses through the words of a sentence. Understanding the meaning of such text is therefore of great importance in many fields, for example to companies analyzing customer reviews or to the film industry analyzing movie reviews. The volume of text to analyze can be very large, which makes manual interpretation of every sentence impractical. Classification algorithms can instead be used to assign sentences to categories according to their meaning. By training classifiers on pre-defined data with three different algorithms, namely Naive Bayes, Support Vector Machines (SVM), and Decision Trees, we can simplify the task of text classification. Using relevant results and examples, we show that SVM is among the better algorithms, providing higher accuracy than the other two, Naive Bayes and Decision Trees.
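
As an illustration of what training a classifier on pre-defined data can look like, the following is a minimal sketch of one of the three algorithms, multinomial Naive Bayes with add-one smoothing, written in plain Python. It is not the authors' implementation: the function names, the whitespace tokenization, and the four toy reviews are assumptions made up for this example. An SVM or Decision Tree classifier would be trained on the same kind of labeled data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count classes and per-class words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():  # assumption: whitespace tokens
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch only.
TOY_REVIEWS = [
    ("great movie loved it", "positive"),
    ("wonderful acting great plot", "positive"),
    ("terrible movie hated it", "negative"),
    ("boring plot awful acting", "negative"),
]

model = train_naive_bayes(TOY_REVIEWS)
prediction = classify(model, "great plot wonderful movie")
```

With this toy data, words such as "great" and "wonderful" occur only in positive reviews, so they raise the positive score relative to the negative one. A real comparison of the three algorithms would need a much larger labeled corpus and a held-out test set.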

Keywords
Text Classification, Sentiment Analysis, Algorithms, Naive Bayes, SVM, Decision Tree