Novel Approach to Offensive Language Detection on Social Media: Tree CNN Integration with Adversarial Bi-LSTM

© 2024 by IJETT Journal
Volume-72 Issue-7
Year of Publication : 2024
Author : V. Uma Maheswari, R. Priya
DOI : 10.14445/22315381/IJETT-V72I7P120

How to Cite?

V. Uma Maheswari, R. Priya, "Novel Approach to Offensive Language Detection on Social Media: Tree CNN Integration with Adversarial Bi-LSTM," International Journal of Engineering Trends and Technology, vol. 72, no. 7, pp. 187-197, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I7P120

Abstract
Offensive language detection on social media platforms is crucial for maintaining a healthy online environment and ensuring user safety. Traditional methods often struggle to capture the nuances and dynamic nature of offensive language in such diverse and rapidly evolving contexts. Our research presents a novel approach to offensive language detection on social media that integrates Tree Convolutional Neural Networks (Tree CNN) with Adversarial Bidirectional Long Short-Term Memory (Bi-LSTM) networks. We address imbalanced data by employing the Synthetic Minority Over-sampling Technique (SMOTE), and semantic understanding by using Word2Vec for feature extraction. To enhance model interpretability and focus on relevant features, we incorporate attention mechanisms within both the Tree CNN and the Adversarial Bi-LSTM. We utilize two attention mechanisms: one identifies repetitive patterns using an Entropy Pruning Method, and the other monitors error loss during training. This dual attention mechanism enables our model to distinguish offensive from non-offensive text while also providing insights into model decisions. Experimental findings on benchmark social media datasets show that the proposed methodology outperforms state-of-the-art approaches in terms of accuracy and interpretability. Our study contributes to advancing offensive language detection techniques on social media platforms and provides a framework for developing more reliable and interpretable models for online content moderation.
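
To make the class-balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the authors' implementation: the function name `smote_oversample`, the parameter `k`, and the toy two-dimensional vectors (standing in for averaged Word2Vec features) are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbours.

    Hypothetical helper for illustration only; a sketch of the SMOTE idea.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by Euclidean distance (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between `base` and `neighbour`.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors (assumed data, not taken from the paper's datasets).
offensive = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.3]]
non_offensive = [[0.1, 0.9], [0.2, 0.8], [0.0, 0.7],
                 [0.1, 0.6], [0.3, 0.9], [0.2, 0.7]]

# Generate enough synthetic offensive samples to match the majority class size.
new_points = smote_oversample(offensive, len(non_offensive) - len(offensive))
balanced_offensive = offensive + new_points
```

Because each synthetic vector is a convex combination of two real minority vectors, it stays inside the region already occupied by the minority class. In practice, oversampling of this kind is usually applied to the training split only, so that evaluation data remains untouched.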

Keywords
Offensive language detection, Social media, Tree CNN, Adversarial Bi-LSTM, Imbalanced data handling, SMOTE, Word2Vec, Attention mechanisms.
