Feature Weight Based Fuzzy C-Means Clustering with Optimal Initialization for Software Fault Prediction

Yuvaraj K; Balaji N V

doi:https://doi.org/10.14445/22315381/IJETT-V73I7P122

Research Article | Open Access

Volume 73 | Issue 7 | Year 2025 | Article ID IJETT-V73I7P122 | DOI: https://doi.org/10.14445/22315381/IJETT-V73I7P122

Received 12 Mar 2025 | Revised 11 Jun 2025 | Accepted 30 Jun 2025 | Published 30 Jul 2025

Citation:

Yuvaraj K, Balaji N V, "Feature Weight Based Fuzzy C-Means Clustering with Optimal Initialization for Software Fault Prediction," International Journal of Engineering Trends and Technology (IJETT), vol. 73, no. 7, pp. 280-292, 2025. https://doi.org/10.14445/22315381/IJETT-V73I7P122

Abstract

In the digital era, software makes human life easier and more convenient in many ways. It is required not only in business but in every specialized field, and software development has become a predominant discipline that serves every other area of science and engineering. The primary challenge in developing software, however, is to identify and fix faults that occur under various circumstances as early as possible, so as to minimize the time, effort, and inconvenience they cause. This paper proposes a software fault prediction framework that identifies faulty modules in software projects. The model first applies accelerated k-means clustering and evaluates the gap statistic to determine the number of clusters. Fuzzy clustering is then applied to the training set, using a probability distribution to initialize the cluster centroids and feature weights to compute the similarity between samples and cluster centroids. As a result, samples inside a cluster are strengthened and samples outside it are weakened. This also improves cluster quality and accelerates convergence of the clustering process by reducing the number of iterations. The classification model then categorizes each module as defective or non-defective according to which class has the higher population in the relevant cluster. The proposed model has been tested experimentally, and the findings show that the framework identifies defective software modules in less time and with higher accuracy.
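The clustering and labelling steps summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the probability-based initialization is approximated with k-means++-style seeding, the feature weights are supplied by the caller (the abstract does not say how they are obtained), the number of clusters `c` is passed in rather than estimated with the gap statistic, and all function names (`seed_centroids`, `memberships`, `weighted_fcm`, `label_clusters`, `predict`) are hypothetical.

```python
import numpy as np

def seed_centroids(X, c, rng):
    """Pick c rows of X as centroids, k-means++ style (an assumed choice)."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(c - 1):
        # Squared distance from each sample to its nearest chosen centroid.
        d2 = np.min([((X - v) ** 2).sum(axis=1) for v in centroids], axis=0)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids, dtype=float)

def memberships(X, V, weights, m):
    """Fuzzy memberships from feature-weighted squared Euclidean distances."""
    diff = X[:, None, :] - V[None, :, :]            # shape (n, c, d)
    d2 = np.maximum((weights ** 2 * diff ** 2).sum(axis=2), 1e-12)
    inv = d2 ** (-1.0 / (m - 1.0))                  # equals d ** (-2 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)     # each row sums to 1

def weighted_fcm(X, c, weights, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Alternate membership and centroid updates until centroids settle."""
    rng = np.random.default_rng(seed)
    V = seed_centroids(X, c, rng)
    for _ in range(max_iter):
        Um = memberships(X, V, weights, m) ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return memberships(X, V, weights, m), V

def label_clusters(U, y_train):
    """Label each cluster 1 (defective) if most of its members are defective."""
    hard = U.argmax(axis=1)
    labels = np.zeros(U.shape[1], dtype=int)
    for j in range(U.shape[1]):
        members = y_train[hard == j]
        if members.size:
            labels[j] = int(members.mean() > 0.5)   # ties fall to non-defective
    return labels

def predict(X_new, V, weights, cluster_labels, m=2.0):
    """Assign each module the label of its highest-membership cluster."""
    return cluster_labels[memberships(X_new, V, weights, m).argmax(axis=1)]

# Toy data: two well-separated groups of modules with two metrics each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
weights = np.array([1.0, 0.5])                      # hypothetical feature weights
U, V = weighted_fcm(X, c=2, weights=weights)
cluster_labels = label_clusters(U, y)
pred = predict(X, V, weights, cluster_labels)
```

Scaling each squared feature difference by a squared weight is one common way to let informative metrics dominate the distance; with this form the centroid update keeps the usual fuzzy c-means expression, so only the membership computation changes.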

Keywords

Accelerated k-means, Feature weight, Fuzzy c-means clustering, Probability distribution, Software fault prediction.
