Clustering with Enhanced Word Embeddings for Contextual Analysis in Academic Texts

© 2024 by IJETT Journal
Volume-72 Issue-6
Year of Publication : 2024
Author : Mary Joy P. Canon, Lany L. Maceda, Christian Y. Sy
DOI : 10.14445/22315381/IJETT-V72I6P118

How to Cite?

Mary Joy P. Canon, Lany L. Maceda, Christian Y. Sy, "Clustering with Enhanced Word Embeddings for Contextual Analysis in Academic Texts," International Journal of Engineering Trends and Technology, vol. 72, no. 6, pp. 170-177, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I6P118

Abstract
The Universal Access to Quality Tertiary Education (UAQTE) program was enacted into law to give deserving Filipino students access to higher education. Despite several years of implementation, however, its perceived impact and the feedback of its recipients remain poorly understood. This paper explores a text analysis approach for the contextual understanding of text responses about the implementation of the UAQTE, combining enhanced word embeddings built from Word2Vec and GloVe vectors, the K-Means clustering algorithm, and a bi-gram word network. The combination of Word2Vec and GloVe embeddings captured the semantic meaning of words within the dataset. The K-Means algorithm identified five distinct groups, with a silhouette score of 0.3477. TF-IDF scores were computed for the bi-grams, and the top-scoring sequences in each cluster were used to visualize a text network graph. Domain experts then labeled the clusters of responses as “Support and Educational Opportunity”, “Accessibility and Financial Relief”, “Gratitude and Satisfaction”, “Positive Evaluation with Suggestions for Improvement” and “Program Effectiveness”. The approach highlights the strengths of the UAQTE program in supporting its beneficiaries and also reveals areas needing attention and improvement, both of which are crucial for policy development and enhancement. Future work may diversify the data by incorporating feedback from other stakeholders, such as program implementers and educators.
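The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state how the Word2Vec and GloVe vectors were combined, so concatenation is assumed here; the toy vectors, the sample responses, the farthest-first initialization, and all function names are hypothetical and are not taken from the paper. The sketch uses only the Python standard library to stay self-contained; a real analysis would more likely rely on pretrained embeddings and an established clustering library.

```python
import math

# Hypothetical 2-d toy vectors standing in for pretrained Word2Vec and GloVe embeddings.
W2V = {"free": [1.0, 0.0], "tuition": [0.9, 0.1],
       "thankful": [0.0, 1.0], "grateful": [0.1, 0.9]}
GLOVE = {"free": [0.8, 0.2], "tuition": [1.0, 0.0],
         "thankful": [0.2, 0.8], "grateful": [0.0, 1.0]}


def combined_vector(word):
    """Concatenate both embeddings of a word (assumed combination strategy)."""
    return W2V[word] + GLOVE[word]


def mean_point(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def document_vector(tokens):
    """Average the combined vectors of all in-vocabulary tokens."""
    return mean_point([combined_vector(t) for t in tokens
                       if t in W2V and t in GLOVE])


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations with a deterministic farthest-first start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)


# Hypothetical tokenized responses, not drawn from the study's dataset.
responses = [["free", "tuition"], ["tuition", "free", "free"],
             ["thankful", "grateful"], ["grateful"]]
doc_vectors = [document_vector(r) for r in responses]
labels = kmeans(doc_vectors, k=2)
score = silhouette_score(doc_vectors, labels)
print(labels, round(score, 4))
```

The silhouette coefficient ranges from -1 to 1, with higher values indicating tighter, better separated clusters. The toy data here is deliberately well separated, so its score is not comparable to the 0.3477 reported for the study's responses.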

Keywords
Clustering, Enhanced word embedding, Program Evaluation, Quality tertiary education, Text analysis.
