Investigating Khasi Speech Recognition Systems using a Recurrent Neural Network-Based Language Model

Fairriky Rynjah; Bronson Syiem; L. Joyprakash Singh

doi:https://doi.org/10.14445/22315381/IJETT-V70I7P227

Research Article | Open Access

Volume 70 | Issue 7 | Year 2022 | Article Id. IJETT-V70I7P227

Received: 15 May 2022 | Revised: 01 Jul 2022 | Accepted: 10 Jul 2022 | Published: 25 Jul 2022

Citation:

Fairriky Rynjah, Bronson Syiem, L. Joyprakash Singh, "Investigating Khasi Speech Recognition Systems using a Recurrent Neural Network-Based Language Model," International Journal of Engineering Trends and Technology (IJETT), vol. 70, no. 7, pp. 269-274, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I7P227

Abstract

The language model (LM) plays a vital role in automatic speech recognition (ASR) systems, and building one remains challenging, particularly for low- or under-resourced languages. Because Khasi is an under-resourced language, very little work has been done on Khasi speech recognition. To date, no Khasi speech recognition system has been developed using a recurrent neural network-based language model (RNN-LM). This paper presents an investigation of Khasi speech recognition systems using an RNN-LM. In the study, different acoustic models (AMs) are built. The study shows that the RNN-LM performs better than the traditional N-gram model. Further, using the RNN-LM, a reduction in word error rate (WER) in the range of 2.8-3.8% is observed with the larger amount of speech data and 2.4-3.5% with the smaller amount. In addition, two acoustic features are studied, and the experimental results show that Mel-frequency cepstral coefficients (MFCC) yield better performance than perceptual linear prediction (PLP). The investigation is performed on the two most widely spoken dialects of the Khasi language.
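
For readers unfamiliar with the metric, the WER figures above are conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper (which is not stated here); the function name `word_error_rate` and the placeholder tokens in the usage note are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative helper only: it assumes whitespace-separated tokens and does
    no text normalization (casing, punctuation), which real scoring tools do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)
```

With placeholder tokens, `word_error_rate("w1 w2 w3 w4", "w1 x w3")` counts one substitution and one deletion against four reference words, giving 0.5. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.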

Keywords

Acoustic model, Deep neural network, Language model, Under-resourced language, Word error rate.
