Analysis of Various Measures of Text Similarity for Comparing Topics of Computer Science Syllabuses

© 2024 by IJETT Journal
Volume-72 Issue-7
Year of Publication : 2024
Author : Ritu Sodhi, Jitendra Choudhary, Ritu Jain, Ruby Bhatt, Ritesh Joshi, Anil Patidar
DOI : 10.14445/22315381/IJETT-V72I7P128

How to Cite?

Ritu Sodhi, Jitendra Choudhary, Ritu Jain, Ruby Bhatt, Ritesh Joshi, Anil Patidar, "Analysis of Various Measures of Text Similarity for Comparing Topics of Computer Science Syllabuses," International Journal of Engineering Trends and Technology, vol. 72, no. 7, pp. 260-265, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I7P128

Abstract
Text similarity measures quantify how similar two texts are. Text comparison is needed for document comparison, text classification, text summarization, information retrieval, question answering, document clustering, and related tasks. Computer science terms also need to be compared, for example in plagiarism checks and when comparing website content, syllabuses of the same subject, notes, and books. This research focuses on text similarity measures for comparing text related to computer science terms. It executed several lexical and semantic similarity measures to compare topics of the syllabus of programming using Python. Of the approaches executed, spaCy with a large English model combined with cos_similarity gave better results than the others. In the future, this research can be improved by including more similarity measures and by enlarging the dataset used to compare computer science terms.
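
As a rough illustration of the kind of comparison the abstract describes, the sketch below scores two short topic strings with one lexical set-overlap measure (Jaccard) and with cosine similarity. It is not the authors' implementation: the abstract reports spaCy word vectors from a large English model, whereas this sketch substitutes plain term-count vectors so that it runs with the Python standard library alone. The function names and the two sample topics are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a: str, b: str) -> float:
    """Lexical measure: shared distinct words divided by all distinct words."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two term-count vectors.

    Assumption: term counts stand in for the word vectors that a
    pretrained language model would supply.
    """
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical syllabus topics, invented for illustration only.
topic_a = "Functions and recursion in Python"
topic_b = "Recursion and functions"
print("Jaccard:", round(jaccard_similarity(topic_a, topic_b), 3))
print("Cosine: ", round(cosine_similarity(topic_a, topic_b), 3))
```

Term-count vectors only reward exact word overlap, so two topics phrased with synonyms would score near zero here. A semantic measure of the kind the abstract reports would compare embedding vectors instead, for example through spaCy's `Doc.similarity` with a model that ships word vectors.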

Keywords
Computer science, Python, Spacy, Syllabus, Text similarity.