Dominant Features Selection with Clustering Genetic Model to Improve the Access Time of Data in Big Data Management Using Distributed Machine Learning

Peerzada Hamid Ahmad; Munishwar Rai

doi:https://doi.org/10.14445/22315381/IJETT-V73I3P128

Research Article | Open Access

Volume 73 | Issue 3 | Year 2025 | Article Id. IJETT-V73I3P128 | DOI : https://doi.org/10.14445/22315381/IJETT-V73I3P128

Received: 09 Sep 2024 | Revised: 11 Jan 2025 | Accepted: 27 Jan 2025 | Published: 28 Mar 2025

Citation :

Peerzada Hamid Ahmad, Munishwar Rai, "Dominant Features Selection with Clustering Genetic Model to Improve the Access Time of Data in Big Data Management Using Distributed Machine Learning," International Journal of Engineering Trends and Technology (IJETT), vol. 73, no. 3, pp. 403-422, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I3P128

Abstract

The explosive growth of big data poses serious challenges for information managers, particularly in guaranteeing fast availability and response times. Conventional data management systems struggle with very large datasets, and the resulting latency slows real-time analysis and decision-making. This research introduces a cluster-based genetic model designed to speed up data access in big data management systems. The method pairs a genetic model with feature selection: using distributed machine learning techniques, it detects and ranks the most significant (dominant) features, and an evolutionary algorithm then optimizes how data are clustered, stored, and retrieved so that access time and retrieval complexity are minimized. The research addresses the need for high-speed data processing, the scalability of data systems, and the complexity of data structures. The proposed model adapts dynamically to changing data, reducing latency and improving the overall efficiency of large-scale data systems. In tests on large datasets, the cluster-based genetic model reduced access time by 35% compared with traditional data management methods: the median data retrieval time fell from 120 milliseconds to 78 milliseconds. This reduction indicates that the model can improve the efficiency of big data systems, especially where fast data retrieval is required.
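The general idea of genetic feature selection guided by clustering quality can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the function names (make_data, two_means, fitness, evolve), the use of a tiny 2-means clusterer, and the fitness score (the share of variance explained by the clusters) are all assumptions chosen for illustration, and the sketch runs on a single machine rather than in a distributed setting.

```python
import random

def make_data(n_per_group=30, n_noise=4, seed=0):
    """Hypothetical synthetic rows: 2 informative features, the rest noise."""
    rng = random.Random(seed)
    rows = []
    for center in (0.0, 5.0):
        for _ in range(n_per_group):
            informative = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
            rows.append(informative + noise)
    return rows

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, iters=10):
    """Tiny deterministic 2-means; seeds are the rows with the smallest
    and largest coordinate sums (an assumption made for repeatability)."""
    order = sorted(points, key=sum)
    centers = [order[0], order[-1]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def fitness(mask, rows):
    """Share of total variance explained by the clusters, computed on the
    features selected by the 0/1 mask (illustrative fitness only)."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    points = [[row[i] for i in cols] for row in rows]
    labels, centers = two_means(points)
    mean = [sum(col) / len(points) for col in zip(*points)]
    total = sum(sq_dist(p, mean) for p in points)
    if total == 0:
        return 0.0
    within = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return 1.0 - within / total

def evolve(rows, pop_size=20, generations=30, mutation_rate=0.1, seed=1):
    """Genetic search over feature masks with elitism, one-point
    crossover, and bit-flip mutation."""
    rng = random.Random(seed)
    n = len(rows[0])
    # Start from the all-features mask plus random masks.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, rows),
                        reverse=True)
        parents = ranked[:pop_size // 2]
        children = [ranked[0]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(m, rows))

rows = make_data()
best = evolve(rows)
print("selected feature mask:", best)
print("cluster fitness:", round(fitness(best, rows), 3))
```

Because the best mask is carried over unchanged in every generation and the starting population contains the all-features mask, the returned mask never scores worse than clustering on all features. In a real system the fitness would more plausibly be tied to measured access or retrieval time, which this sketch does not attempt to model.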

Keywords

Big Data Management, Cluster-Based Genetic Model, Data Access Time, Data Retrieval Efficiency, Distributed Machine Learning, Real-Time Processing, Scalability.
