Utilizing Machine Learning for Sentiment Analysis of IMDB Movie Review Data

© 2023 by IJETT Journal
Volume-71 Issue-5
Year of Publication : 2023
Author : Ubaid Mohamed Dahir, Faisal Kevin Alkindy
DOI : 10.14445/22315381/IJETT-V71I5P203

How to Cite?

Ubaid Mohamed Dahir, Faisal Kevin Alkindy, "Utilizing Machine Learning for Sentiment Analysis of IMDB Movie Review Data," International Journal of Engineering Trends and Technology, vol. 71, no. 5, pp. 18-26, 2023. Crossref, https://doi.org/10.14445/22315381/IJETT-V71I5P203

Abstract
In this study, we focus on sentiment analysis, an essential technique in the rapidly evolving field of text analytics. Our approach preprocesses movie review text using tokenization and lemmatization, then extracts features using Bag of Words and TF-IDF. We employ three popular machine learning methods, Logistic Regression, SVM, and Random Forest, to develop sentiment classification models. Our results show that logistic regression with TF-IDF features and default parameters outperforms the other models in terms of minimizing false positives, with an accuracy of 89.20%, a precision of 88.80%, a recall of 89.80%, and an area under the receiver operating characteristic curve (AUC) of 89%. These findings have important implications for improving sentiment analysis and developing more accurate and effective text analytics tools.
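
As a rough illustration of the feature-extraction and evaluation terms used in the abstract, the following minimal Python sketch builds Bag of Words counts and TF-IDF weights for a toy corpus, and computes precision and recall from predicted labels. It is not the authors' implementation: the function names (`tokenize`, `bag_of_words`, `tf_idf`, `precision_recall`) and the two toy reviews are hypothetical, lemmatization is omitted, and the weighting is the plain textbook form tf × ln(N / df), which may differ from the smoothed variants that libraries apply.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; lemmatization is deliberately omitted.
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    # One term-count mapping per document.
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    # Textbook weighting: (count / document length) * ln(N / document frequency).
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        length = sum(c.values())
        weights.append({
            term: (k / length) * math.log(n_docs / doc_freq[term])
            for term, k in c.items()
        })
    return weights


def precision_recall(y_true, y_pred):
    # Labels are assumed to be 1 (positive review) or 0 (negative review).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for illustration only; not taken from the IMDB dataset.
reviews = ["A great movie", "A terrible movie"]
weights = tf_idf(reviews)
```

In this toy corpus, terms that occur in every review (such as "movie") receive zero weight, while the distinguishing terms ("great", "terrible") keep a positive weight; that is the property that separates TF-IDF from raw Bag of Words counts. Precision is the share of predicted positives that are truly positive, so a model that produces fewer false positives scores higher on it.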

Keywords
Bag of Words, Logistic regression, Movie review, Precision, Random forest, Sentiment analysis, SVM, TF-IDF.
