Research Article | Open Access
Volume 74 | Issue 6 | Year 2026 | Article Id. IJETT-V74I6P105 | DOI: https://doi.org/10.14445/22315381/IJETT-V74I6P105
Computational Tool for Automatic Term Extraction - ATEM
A. Morales Ríos, C.M. Medina Otálvaro, J.C. Blandón Andrade, C.M. Zapata Jaramillo
| Received | Revised | Accepted | Published |
|---|---|---|---|
| 12 May 2025 | 23 Mar 2026 | 28 Mar 2026 | 27 Jun 2026 |
Citation:
A. Morales Ríos, C.M. Medina Otálvaro, J.C. Blandón Andrade, C.M. Zapata Jaramillo, "Computational Tool for Automatic Term Extraction - ATEM," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 6, pp. 66-74, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I6P105
Abstract
Automatic term extraction identifies the most representative terms in a corpus through computational processes. It facilitates the creation of lexicographic materials or common databases, which are pivotal for knowledge acquisition in science because they help eliminate ambiguity in the definitions of a specific domain. Specialized literature highlights the need for a common foundation of best practices for the Internet of Things (IoT) to consolidate knowledge and adapt new working methods. However, creating terminological resources manually is time-consuming and costly, and it does not keep pace with the rapid evolution of the subject matter. This article introduces ATEM, a term extraction tool for web and mobile environments that incorporates a hybrid method for identifying relevant terms in English-language scientific literature on IoT. ATEM is developed using a Service-Oriented Architecture (SOA) and employs programming languages such as JavaScript and Python, tools like the Flask framework, and NLP-specific libraries such as NLTK and spaCy. The tool combines the C-value algorithm with statistical and linguistic techniques in five steps: (i) corpus reception; (ii) text preprocessing; (iii) stop-word removal; (iv) Part-of-Speech (POS) tagging; and (v) filtering through linguistic and statistical rules. The output is a list of candidate terms, each with a weight indicating its relevance within the corpus. The method was tested on five corpora from different domains, where ATEM retrieved terms with 75% precision and 89% recall, which highlights its versatility across corpora. According to the tests, ATEM supports terminological extraction from IoT literature and contributes to: (i) the development of lexicographic resources; (ii) language translation; and (iii) the creation of shared databases.
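To illustrate the statistical weighting mentioned in the abstract, the following is a minimal sketch of the C-value score in its commonly published closed form, not ATEM's actual implementation. It assumes candidate terms have already been extracted by the linguistic steps and are given as token tuples with corpus frequencies. The function names `contains` and `c_value` and the sample candidates are hypothetical.

```python
import math

def contains(longer, shorter):
    """Return True if `shorter` is a contiguous token sub-sequence of `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))

def c_value(freq):
    """Score each candidate (tuple of tokens -> frequency) with the C-value formula.

    C(a) = log2|a| * f(a)                          if a is not nested
    C(a) = log2|a| * (f(a) - mean f(b), b in T_a)  otherwise,
    where T_a is the set of longer candidates that contain a.
    Note: log2|a| is 0 for single-word candidates, so this form only
    ranks multi-word terms.
    """
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of longer candidates in which `a` appears nested.
        nesting = [f_b for b, f_b in freq.items()
                   if len(b) > len(a) and contains(b, a)]
        adjusted = f_a - sum(nesting) / len(nesting) if nesting else f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores

# Hypothetical candidates and frequencies, for illustration only.
candidates = {
    ("wireless", "sensor", "network"): 4,
    ("sensor", "network"): 10,
    ("smart", "home"): 3,
}
scores = c_value(candidates)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for term, score in ranked:
    print(" ".join(term), round(score, 2))
```

In this sketch, "sensor network" is penalized because 4 of its 10 occurrences fall inside the longer candidate "wireless sensor network", which is the behaviour that lets C-value favour complete multi-word terms over their fragments.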
Keywords
Automatic Term Extraction, C-Value Algorithm, Hybrid Linguistic-Statistical Methods, Internet of Things, Natural Language Processing.