SQLI Detection Based on LDA Topic Model

© 2021 by IJETT Journal
Volume-69 Issue-11
Year of Publication : 2021
Authors : Nilesh Yadav, Dr. Narendra Shekokar
DOI :  10.14445/22315381/IJETT-V69I11P206

How to Cite?

Nilesh Yadav, Dr. Narendra Shekokar, "SQLI Detection Based on LDA Topic Model," International Journal of Engineering Trends and Technology, vol. 69, no. 11, pp. 47-52, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I11P206

Structured Query Language Injection (SQLI) is among the most dangerous web application vulnerabilities and can cause serious harm to an entire web system. Because the attack takes many heterogeneous forms, detecting it remains a challenging problem. Researchers have turned to Machine Learning (ML) based approaches to mitigate this attack, but ML-based techniques depend heavily on the quality of feature extraction. To obtain a reduced, more useful feature set and improve accuracy, feature extraction should account for the semantic consistency and probability distribution of the words; a well-chosen reduced dimensionality in turn improves text classification. This paper therefore uses Latent Dirichlet Allocation (LDA), a topic-modeling technique, as a dimensionality reduction step to acquire informative features. LDA captures more useful features by considering the semantic co-occurrence between the words observed in logs, which lets it act as an efficient feature reduction technique for logs of this vulnerability. The paper explores the efficient detection of SQLI. Experiments on the ECML/PKDD 2007 HTTP traffic logs use supervised ML techniques, and the results are evaluated using accuracy metrics, performance time, and ROC curve.
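
The use of LDA as a dimensionality reduction step can be illustrated with a minimal sketch. It is not the authors' implementation: the sample requests, the tokenizer, the hyperparameters (`alpha`, `beta`, `n_topics`), and the function names `tokenize` and `lda_topic_features` are all assumptions made for illustration, and the inference method (collapsed Gibbs sampling) is one common choice that the abstract does not specify. Each request is reduced from a bag of words to a short vector of topic proportions.

```python
import random
import re


def tokenize(request):
    # Hypothetical tokenizer: words, numbers, and single punctuation marks,
    # so that SQL symbols such as ' and = survive as tokens.
    return re.findall(r"[a-z_]+|\d+|[^\sa-z\d_]", request.lower())


def lda_topic_features(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Return one topic-proportion vector per document (collapsed Gibbs sampling)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions: the reduced feature vector of each document.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


# Made-up request parameters, not taken from the ECML/PKDD 2007 logs.
requests = [
    "id=1' OR '1'='1",
    "user=admin'--",
    "q=1 UNION SELECT password FROM users",
    "page=2&sort=asc",
    "q=shoes&color=red",
    "id=42",
]
docs = [tokenize(r) for r in requests]
features = lda_topic_features(docs, n_topics=2)
for request, vector in zip(requests, features):
    print(request, [round(p, 3) for p in vector])
```

Each row of `features` has `n_topics` entries that sum to 1, regardless of vocabulary size. A supervised classifier would then be trained on these vectors instead of the raw word counts; the number of topics is a tuning choice, not a value given in the abstract.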

Attack, SQLI, Latent Dirichlet Allocation, Dimension Reduction, ECML.
