International Journal of Engineering Trends and Technology

Research Article | Open Access
Volume 74 | Issue 3 | Year 2026 | Article Id. IJETT-V74I3P105 | DOI : https://doi.org/10.14445/22315381/IJETT-V74I3P105

Severity Detection of Cyberbullying in Saudi-Dialect Tweets: A Machine-Learning Approach


Bader Azi Alanazi, Chin-Teng Lin

Received: 06 Mar 2026 | Revised: 24 Jan 2026 | Accepted: 29 Jan 2026 | Published: 28 Mar 2026

Citation :

Bader Azi Alanazi, Chin-Teng Lin, "Severity Detection of Cyberbullying in Saudi-Dialect Tweets: A Machine-Learning Approach," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 3, pp. 54-74, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I3P105

Abstract

Social media platforms such as Twitter (now known as X) have become channels for global communication, but they have also enabled a rise in cyberbullying, which carries serious psychological risks. Although much existing research has focused on detecting cyberbullying in English, few studies address the problem in Arabic, particularly for severity classification. This study evaluates machine learning classifiers trained on balanced, pre-processed Saudi-dialect data for four-level cyberbullying severity detection (non-cyberbullying, low, medium, and high) and assesses the impact of systematic class balancing on minority-class performance. Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers were applied, using Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction. A dataset of 5,819 Saudi-dialect tweets was annotated into the four severity categories and evaluated across 28 experimental scenarios combining different pre-processing tools (CAMeL, NLTK, Araby) and balancing techniques (random insertion, random oversampling, synonym replacement). The highest accuracy, 92.23%, was achieved using BoW+SVM with NLTK pre-processing and stop-word removal, an absolute improvement of 27.43 percentage points over the imbalanced baseline of 64.80%. Random oversampling proved to be the most effective balancing technique, accounting for 96-99% of the performance gains. Per-class F1-scores ranged from 0.88 (low severity) to 0.95 (high severity and non-cyberbullying), providing further evidence of the importance of balanced training data for achieving reliable performance across all severity levels. To the best of the authors’ knowledge, this is the first study to implement four-class cyberbullying severity detection for Saudi-dialect tweets.
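
To illustrate the random oversampling step named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `random_oversample`, the toy tweets, and the label strings are assumptions made for illustration, and the abstract does not state how the balancing was actually carried out. The sketch duplicates randomly chosen samples from each smaller class until every class matches the size of the largest one.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each
    smaller class until every class is as large as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their severity label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates with replacement from the existing samples.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, made-up data standing in for annotated tweets (assumption).
texts = [f"tweet_{i}" for i in range(10)]
labels = ["none"] * 6 + ["low"] * 2 + ["medium"] + ["high"]

bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(bal_labels))
```

In a setup like this, oversampling would normally be applied to the training split only, so that duplicated tweets do not also appear in the evaluation data.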

Keywords

Text Classification, Machine Learning, Cyberbullying Detection, Arabic Social Media, Saudi Dialect, Support Vector Machine (SVM), Naïve Bayes (NB).
