A Review on Cyberstalking Detection Using Machine Learning Techniques: Current Trends and Future Direction

© 2022 by IJETT Journal
Volume-70 Issue-3
Year of Publication : 2022
Authors : Arvind Kumar Gautam, Abhishek Bansal

How to Cite?

Arvind Kumar Gautam, Abhishek Bansal, "A Review on Cyberstalking Detection Using Machine Learning Techniques: Current Trends and Future Direction," International Journal of Engineering Trends and Technology, vol. 70, no. 3, pp. 95-107, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I3P211

Social media platforms and other web applications such as WhatsApp, Facebook, YouTube, Instagram, and Twitter have become popular for data sharing, live events, news, and publicity, but they are also used for cybercrime. The use of social media platforms also gives rise to serious problems in the form of cyberstalking, cyberbullying, and other kinds of online harassment. The terms cyberstalking and cyberbullying are often used interchangeably; both involve using the internet to follow or target someone online. Cyberstalking is a critical global problem that affects educational institutions, victims, and society as a whole, and it needs to be detected, identified, reported, and controlled properly to keep social media users safe. Machine learning is the most popular approach for building cyberstalking detection models. Researchers have proposed various detection techniques based on machine learning to control and combat cyberstalking on social media. This paper surveys popular feature extraction methods and machine learning classifiers for text classification and explores the datasets used by researchers. It also aims to identify the research gaps and the scope for improving cyberstalking detection. The paper reviews cyberstalking detection techniques that use machine learning, analyzes the performance of popular machine learning classifiers, and finally explores the issues, challenges, recent trends, and future directions for cyberstalking detection techniques.

Machine learning, Cyberstalking detection, Cyberbullying, Feature extraction, Word embedding.
