Speech Translation: A Bibliometric Analysis of Research Trends and Contributions based on Scopus Data
© 2025 by IJETT Journal
Volume-73 Issue-4
Year of Publication : 2025
Authors : Maria Labied, Abdessamad Belangour, Mouad Banane
DOI : 10.14445/22315381/IJETT-V73I4P116
How to Cite?
Maria Labied, Abdessamad Belangour, Mouad Banane, "Speech Translation: A Bibliometric Analysis of Research Trends and Contributions based on Scopus Data," International Journal of Engineering Trends and Technology, vol. 73, no. 4, pp. 159-179, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I4P116
Abstract
This study provides a comprehensive bibliometric analysis of speech translation research from 2000 to 2024, leveraging Scopus database data to identify key trends, influential contributions, and collaborative networks in this rapidly evolving field. We map the transition from traditional statistical methods to advanced neural and deep learning approaches in speech translation technologies by analysing publication patterns, citation metrics, and research themes. Our findings highlight the most prolific authors, institutions, and countries, along with the leading journals and conferences that serve as primary outlets for high-impact research. Notably, the analysis reveals a substantial increase in research activity and a growing focus on end-to-end translation systems and multilingual corpora, demonstrating the field's shift towards scalable and effective real-world applications. The importance of international collaborations and interdisciplinary research is emphasized, showcasing their role in driving innovation and addressing complex challenges. This bibliometric analysis provides valuable insights for researchers, practitioners, and policymakers, offering a foundational understanding of the current landscape and future directions of speech translation research. By elucidating the dynamics of this field, our work aims to inspire further advancements and enhance the impact of future research efforts.
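As a minimal illustration of the kind of trend analysis the abstract describes (publication counts and citation metrics per year), the sketch below processes a hypothetical Scopus CSV export. The column names "Year" and "Cited by" and the sample rows are assumptions for illustration, not data from the study itself.

```python
# Sketch of a basic bibliometric trend computation over a Scopus-style
# CSV export (hypothetical sample data; real exports have many more columns).
import csv
import io
from collections import Counter

# Hypothetical rows standing in for a real Scopus export file.
sample_export = """Year,Cited by
2019,12
2020,3
2020,40
2024,0
2024,7
"""

pubs_per_year = Counter()       # number of publications per year
citations_per_year = Counter()  # total citations accrued per year

for row in csv.DictReader(io.StringIO(sample_export)):
    year = int(row["Year"])
    pubs_per_year[year] += 1
    citations_per_year[year] += int(row["Cited by"])

# Report publication count and mean citations per publication, by year.
for year in sorted(pubs_per_year):
    mean_cites = citations_per_year[year] / pubs_per_year[year]
    print(year, pubs_per_year[year], round(mean_cites, 1))
```

A full analysis would additionally normalize author and institution names and build co-authorship networks, but the per-year aggregation above is the core of the publication-trend figures such studies report.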
Keywords
Speech Translation, Bibliometric Analysis, End-to-End Speech Translation, Direct Speech Translation, Machine Translation, Automatic Speech Translation.