Enhanced Rough K-Means and Bacterial Foraging Optimization Technique for Document Clustering

© 2024 by IJETT Journal
Volume-72 Issue-6
Year of Publication : 2024
Author : S. Periyasamy, R. Kaniezhil
DOI : 10.14445/22315381/IJETT-V72I6P138

How to Cite?

S. Periyasamy, R. Kaniezhil, "Enhanced Rough K-Means and Bacterial Foraging Optimization Technique for Document Clustering," International Journal of Engineering Trends and Technology, vol. 72, no. 6, pp. 432-441, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I6P138

Abstract
Document clustering is significant in Natural Language Processing (NLP) and Information Retrieval (IR) because it is widely applied in recommendation systems. Learning techniques perform document clustering, but they face challenges such as high dimensionality, scalability, data drift, and handling large corpora. These issues are addressed with the Enhanced Rough K-Means and Bacterial Foraging Optimization Technique (ERK-BFO). ERK-BFO integrates clustering with a rough set approach to handle uncertainty and data imprecision. The clustering process analyses document structures to group similar information and increase clustering efficiency. During this process, bacterial foraging optimization is used to identify optimized cluster centres, which improves convergence and clustering quality. Assigning members according to these cluster centres reduces the difficulty of exploring high-dimensional data. The system's effectiveness is evaluated through experimental results, and the ERK-BFO method achieves high convergence speed and robustness.
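To illustrate the general idea described in the abstract, the following is a minimal sketch (not the authors' implementation) of a rough k-means iteration combined with a bacterial-foraging-style refinement of the cluster centres. All parameter names and values (epsilon, w_lower, step_size, n_chemotaxis) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of rough k-means with a chemotaxis-style centre refinement.
# Assumed parameters: epsilon (boundary threshold), w_lower (lower-approximation
# weight), step_size and n_chemotaxis (foraging moves). Illustrative only.
import numpy as np

def rough_assign(X, centers, epsilon=1.2):
    """Assign each point to the lower approximation of its nearest centre,
    or to the upper approximations of all centres whose distance is within
    a factor `epsilon` of the nearest one (boundary points)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    lower = [[] for _ in centers]
    upper = [[] for _ in centers]
    for i, (row, j) in enumerate(zip(d, nearest)):
        close = np.where(row <= epsilon * row[j])[0]
        if len(close) == 1:          # unambiguous point -> lower approximation
            lower[j].append(i)
        for k in close:              # ambiguous points enter several upper approximations
            upper[k].append(i)
    return lower, upper

def update_centers(X, lower, upper, old_centers, w_lower=0.7):
    """Rough k-means centre update: weighted mean of lower-approximation
    members and boundary (upper minus lower) members."""
    centers = []
    for lo, up, old in zip(lower, upper, old_centers):
        boundary = list(set(up) - set(lo))
        if lo and boundary:
            c = w_lower * X[lo].mean(axis=0) + (1 - w_lower) * X[boundary].mean(axis=0)
        elif lo:
            c = X[lo].mean(axis=0)
        elif up:
            c = X[up].mean(axis=0)
        else:
            c = old                  # empty cluster: keep the previous centre
        centers.append(c)
    return np.vstack(centers)

def within_cluster_cost(X, centers):
    """Sum of distances from each point to its nearest centre."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).sum()

def bfo_refine(X, centers, step_size=0.05, n_chemotaxis=20, seed=None):
    """Chemotaxis-style refinement: tumble the centres in a random direction
    and keep the move only if the within-cluster cost improves."""
    rng = np.random.default_rng(seed)
    best = centers.copy()
    best_cost = within_cluster_cost(X, best)
    for _ in range(n_chemotaxis):
        direction = rng.normal(size=best.shape)
        direction /= np.linalg.norm(direction, axis=1, keepdims=True)
        candidate = best + step_size * direction
        cost = within_cluster_cost(X, candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best

# Toy usage on random "document vectors" (e.g. TF-IDF features reduced to 2-D).
X = np.random.default_rng(0).normal(size=(60, 2))
centers = X[np.random.default_rng(1).choice(len(X), 3, replace=False)]
for _ in range(10):
    lower, upper = rough_assign(X, centers)
    centers = update_centers(X, lower, upper, centers)
    centers = bfo_refine(X, centers)
```

In this sketch the rough set element appears as the lower/upper approximations that capture uncertain (boundary) documents, and the bacterial foraging element appears as the random tumble moves that are accepted only when they reduce the within-cluster cost; the actual ERK-BFO procedure in the paper may differ.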

Keywords
Document clustering, Natural language processing, Information retrieval, Bacterial foraging optimization, Convergence speed, Cluster quality.
