Computational Tool for Automatic Term Extraction - ATEM

A. Morales Ríos; C.M. Medina Otálvaro; J.C. Blandón Andrade; C.M. Zapata Jaramillo

DOI: https://doi.org/10.14445/22315381/IJETT-V74I6P105

Research Article | Open Access

Volume 74 | Issue 6 | Year 2026 | Article Id. IJETT-V74I6P105

Received: 12 May 2025 | Revised: 23 Mar 2026 | Accepted: 28 Mar 2026 | Published: 27 Jun 2026

Citation:

A. Morales Ríos, C.M. Medina Otálvaro, J.C. Blandón Andrade, C.M. Zapata Jaramillo, "Computational Tool for Automatic Term Extraction - ATEM," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 6, pp. 66-74, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I6P105

Abstract

Automatic term extraction identifies the most representative terms in a corpus through computational processes. It supports the creation of lexicographic resources and shared terminological databases, which are pivotal for knowledge acquisition in science because they help remove ambiguity from the definitions used in a specific domain. The specialized literature highlights the need for a common foundation of best practices for the Internet of Things (IoT) in order to consolidate knowledge and adopt new working methods. However, building terminological resources manually is inefficient, time-consuming, and costly, and it does not keep pace with the rapid evolution of the field. This article introduces ATEM, a term extraction tool for web and mobile environments that implements a hybrid method for identifying relevant terms in English-language scientific literature on IoT. ATEM is built on a Service-Oriented Architecture (SOA) using programming languages such as JavaScript and Python, together with tools such as the Flask framework and the NLP libraries NLTK and spaCy. The tool applies the C-Value algorithm along with statistical and linguistic techniques in five steps: (i) corpus reception; (ii) text preprocessing; (iii) stop-word removal; (iv) Part-of-Speech (POS) tagging; and (v) filtering through linguistic and statistical rules. The output is a list of candidate terms, each with a weight indicating its relevance within the corpus. The method was tested on five corpora from different domains, where ATEM retrieved terms with 75% precision and 89% recall, highlighting its versatility across corpora. According to these tests, ATEM supports terminology extraction from IoT literature and contributes to (i) the development of lexicographic resources, (ii) language translation, and (iii) the creation of shared databases.
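
The abstract summarizes steps (iii) to (v) and the C-Value weighting only at a high level. The Python sketch below illustrates how such a pipeline typically combines a linguistic filter with C-Value scoring, following the original formulation by Frantzi, Ananiadou, and Mima: C-Value(a) = log2|a| · f(a) when a is not nested in a longer candidate, and C-Value(a) = log2|a| · (f(a) − (1/|T_a|) Σ_{b∈T_a} f(b)) otherwise, where |a| is the number of words in a, f(a) is its corpus frequency, and T_a is the set of longer candidates that contain a. It is not ATEM's implementation. Assumptions: POS tagging has already been performed (for example, by spaCy, whose Token.pos_ attribute uses Universal POS tags), so the input is a hypothetical hand-tagged toy corpus; the stop-word list, the (ADJ|NOUN|PROPN)* (NOUN|PROPN) pattern, max_len, and all function names are illustrative choices.

```python
import math
from collections import Counter

# Hypothetical stop-word list standing in for step (iii); a real pipeline
# would use a fuller list, such as the one shipped with NLTK.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}
# Universal POS tags (the tag set spaCy exposes through Token.pos_).
MODIFIER_TAGS = {"ADJ", "NOUN", "PROPN"}
HEAD_TAGS = {"NOUN", "PROPN"}


def extract_candidates(sentences, max_len=4):
    """Count multi-word n-grams matching (ADJ|NOUN|PROPN)* (NOUN|PROPN).

    `sentences` is a list of POS-tagged sentences, each a list of
    (word, universal_pos_tag) pairs. Nested occurrences are counted too,
    so the count of a candidate is its total corpus frequency f(a).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = [(word.lower(), tag) for word, tag in sentence]
        for start in range(len(tokens)):
            for n in range(2, max_len + 1):
                window = tokens[start:start + n]
                if len(window) < n:
                    break  # ran past the end of the sentence
                if any(w in STOP_WORDS or t not in MODIFIER_TAGS for w, t in window):
                    break  # every longer window from `start` contains the same token
                if window[-1][1] in HEAD_TAGS:
                    counts[tuple(w for w, _ in window)] += 1
    return counts


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(counts):
    """Score each candidate with the C-Value measure and rank in descending order."""
    scores = {}
    for term, freq in counts.items():
        # T_a: the longer candidate terms in which `term` is nested.
        nesting = [other for other in counts
                   if len(other) > len(term) and contains(other, term)]
        adjusted = freq
        if nesting:
            adjusted -= sum(counts[other] for other in nesting) / len(nesting)
        scores[" ".join(term)] = math.log2(len(term)) * adjusted
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, hand-tagged toy corpus (what steps i, ii and iv would produce).
SAMPLE = [
    [("IoT", "PROPN"), ("devices", "NOUN"), ("use", "VERB"),
     ("wireless", "ADJ"), ("sensor", "NOUN"), ("networks", "NOUN")],
    [("Wireless", "ADJ"), ("sensor", "NOUN"), ("networks", "NOUN"),
     ("collect", "VERB"), ("data", "NOUN")],
    [("The", "DET"), ("sensor", "NOUN"), ("networks", "NOUN"),
     ("of", "ADP"), ("IoT", "PROPN"), ("devices", "NOUN")],
]

for term, score in c_value(extract_candidates(SAMPLE)):
    print(f"{score:.2f}  {term}")
```

On this toy corpus the ranking is "wireless sensor networks" (3.17), "iot devices" (2.00), "sensor networks" (1.00), and "wireless sensor" (0.00). "wireless sensor" drops to zero because it never occurs outside the longer candidate, whereas "sensor networks" keeps one independent occurrence. Because log2 1 = 0, the original formula gives single-word candidates no weight, which is why the sketch only counts n-grams of two or more words; some implementations modify the length factor so that single words are scored as well.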

Keywords

Automatic Term Extraction, C-Value Algorithm, Hybrid Linguistic-Statistical Methods, Internet of Things, Natural Language Processing.
