Finding Recursive Generics in Java Source Code using Machine Learning

© 2023 by IJETT Journal
Volume-71 Issue-8
Year of Publication : 2023
Author : Neha Kumari, Rajeev Kumar
DOI : 10.14445/22315381/IJETT-V71I8P207

How to Cite?

Neha Kumari, Rajeev Kumar, "Finding Recursive Generics in Java Source Code using Machine Learning," International Journal of Engineering Trends and Technology, vol. 71, no. 8, pp. 76-84, 2023. Crossref, https://doi.org/10.14445/22315381/IJETT-V71I8P207

Abstract
Understanding a complex type structure and using it in a type-safe manner is a difficult task. The recursive generic type is one such complex variant, identified by the recursion in its definition. It has major significance in generic programming for solving binary method problems and mimicking self-type. However, improper use of recursive generics can cause vulnerabilities in source code. To avoid unsafe practices, a programmer must be aware of the presence of recursive generics in source code. In Java generics, type recursion can be found at a class or interface declaration. Therefore, it is appropriate to distinguish the kind of class type at the declaration itself. In this paper, we use a machine learning approach to find recursive and non-recursive generic types in Java source code. We collect data from ten contemporary Java projects and prepare a dataset with generic-specific attributes. Because recursive generic types are rare in Java projects, the initial dataset is highly imbalanced. Therefore, we resample the dataset and use the resampled data to train decision tree-based classifiers. Using standard performance metrics, we conduct a comparative analysis to find a (near-)optimal classifier among six decision tree-based classifiers. Our analysis reasserts that the ensemble-based "Random Forest Classifier" performs best on all nine metrics.
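
As an illustration of the terms used in the abstract, the following minimal Java sketch shows a recursive (F-bounded) generic declaration, a binary method that relies on it, and a non-recursive generic for contrast. All names here (`RecursiveGenericDemo`, `Ordered`, `Box`, `Version`, `max`, `isRecursiveGeneric`) are hypothetical and do not come from the paper. The reflective check at the end is a simplified hand-written rule, included only to show that the recursion is visible at the declaration; it is not the machine learning classifier the paper trains, and it only covers a bound that directly names the declaring type.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.Arrays;
import java.util.List;

public class RecursiveGenericDemo {

    // Recursive (F-bounded) generic: the bound of T refers back to the
    // interface being declared, so T stands in for the implementing type.
    interface Ordered<T extends Ordered<T>> {
        // Binary method: the argument has the same type as the receiver.
        boolean lessThan(T other);
    }

    // Non-recursive generic for contrast: T has no self-referencing bound.
    interface Box<T> {
        T get();
    }

    // The class passes itself as the type argument (mimicking a self-type).
    static final class Version implements Ordered<Version> {
        final int major;

        Version(int major) {
            this.major = major;
        }

        @Override
        public boolean lessThan(Version other) {
            return this.major < other.major;
        }
    }

    // A generic method that can compare elements safely without casts.
    static <T extends Ordered<T>> T max(List<T> items) {
        T best = items.get(0);
        for (T item : items) {
            if (best.lessThan(item)) {
                best = item;
            }
        }
        return best;
    }

    // Simplified rule (an assumption of this sketch, not the paper's method):
    // a declaration is treated as recursive if some type parameter has a
    // parameterized bound whose raw type is the declaring type itself.
    static boolean isRecursiveGeneric(Class<?> declared) {
        for (TypeVariable<?> param : declared.getTypeParameters()) {
            for (Type bound : param.getBounds()) {
                if (bound instanceof ParameterizedType
                        && ((ParameterizedType) bound).getRawType() == declared) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Version> versions =
                Arrays.asList(new Version(1), new Version(3), new Version(2));
        System.out.println(max(versions).major);               // prints 3
        System.out.println(isRecursiveGeneric(Ordered.class)); // prints true
        System.out.println(isRecursiveGeneric(Box.class));     // prints false
    }
}
```

The same pattern appears in the standard library as `Enum<E extends Enum<E>>`, for which the check above also returns `true`.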

Keywords
Classification, Decision Tree, F-bounded, Java Generics, Type-Safe.
