Hybrid Model Design for Baseline-Context-Independent-Mono-Phone Automatic Speech Recognition
Citation
Amr M. Gody, Rania Ahmed AbulSeoud, Marian M. Ibraheem, "Hybrid Model Design for Baseline-Context-Independent-Mono-Phone Automatic Speech Recognition", International Journal of Engineering Trends and Technology (IJETT), V27(6), 304-313, September 2015. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.
Abstract
In this research, a new hybrid model for Automatic Speech Recognition (ASR) is introduced. The model is constructed as a hybrid of Mel-scale and 15-bit Best Tree Encoding (BTE). Best Tree Encoding was first introduced in [1] as a new feature for ASR. The model is compared with MFCC, with HTK used as the recognition engine. It is also compared with earlier generations of BTE to evaluate performance on context-independent mono-phone recognition of English. A subset of the TIMIT database is used in all experiments in this research. The proposed model achieves a success rate equal to 96% of that of the reference MFCC on the same problem, while its vector size is 33% of the MFCC vector size.
References
[1] Amr M. Gody, "Wavelet Packets Best Tree 4 Points Encoded (BTE) Features", The Eighth Conference on Language Engineering, Ain-Shams University, Cairo, Egypt, pp. 189-198, 17-18 December 2008.
[2] Barnard, E, Gouws, E, Wolvaardt, K and Kleynhans, N. 2004. "Appropriate baseline values for HMM-based speech recognition". 15th Annual Symposium of the Pattern Recognition Association of South Africa, Grabouw, South Africa, 25 to 26 November 2004.
[3] Amr M. Gody, Rania Ahmed AbulSeoud, Mohamed Hassan, "Automatic Speech Annotation Using HMM based on Best Tree Encoding (BTE) Feature", The Eleventh Conference on Language Engineering, Ain-Shams University, pp. 153-159, December 2011, Cairo, Egypt.
[4] Amr M. Gody, Rania Ahmed AbulSeoud, Maha M. Adham, Eslam E. Elmaghraby, "Automatic Speech Using Wavelet Packets Increased Resolution Best Tree Encoding", The Twelfth Conference on Language Engineering, Ain-Shams University, pp. 126-134, December 2012, Cairo, Egypt.
[5] Amr M. Gody, Rania Ahmed AbulSeoud, Eslam E. Elmaghraby, "Automatic Speech Recognition Of Arabic Phones Using Optimal-Depth-Split-Energy Best Tree Encoding", The Twelfth Conference on Language Engineering, Ain-Shams University, pp. 144-156, December 2012, Cairo, Egypt.
[6] Michel Misiti, Yves Misiti, Georges Oppenheim, Jean-Michel Poggi, "Wavelet Toolbox for Use with MATLAB: User’s Guide", The MathWorks, Inc., Version 1, 1996.
[7] MatLab, http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/ch06_a11.html.
[8] http://en.wikipedia.org/wiki/A_Mathematical_Theory_of_Communication
[9] R.R. Coifman, M.V. Wickerhauser, "Entropy-based Algorithms for best basis selection," IEEE Trans. on Inf. Theory, vol. 38, no. 2, pp. 713-718, 1992.
[10] Steve Young, Mark Gales, Xunying Andrew Liu, Phil Woodland, et al., 2006, The HTK Book, Version 3.41, Cambridge University Engineering Department, http://www.htk.eng.cam.ac.uk.
[11] Nasir Ahmad, "A motion based approach for audio-visual automatic speech recognition", A Doctoral Thesis. Submitted in partial fulfillment of the requirements for the award of Doctor of Philosophy of Loughborough University.
[12] HTK Book documentation, "http://htk.eng.cam.ac.uk/docs/docs.shtml".
[13] Amr M. Gody, Rania Ahmed AbulSeoud, Mai Ezz El-Din, "Using Mel-Mapped Best Tree Encoding for Baseline-Context-Independent-Mono-Phone Automatic Speech Recognition", 2015.
[14] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.4). Cambridge, U.K.: Cambridge Univ. Eng. Dept., 2006.
[15] Amr M. Gody, Rania Ahmed AbulSeoud, Mohamed Hassan, "Automatic Speech Annotation Using HMM based on Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) Feature", PESCT 2013, Fayoum University, 2013.
[16] Barnard, E, Gouws, E, Wolvaardt, K and Kleynhans, N. 2004. "Appropriate baseline values for HMM-based speech recognition". 15th Annual Symposium of the Pattern Recognition Association of South Africa, Grabouw, South Africa, 25 to 26 November 2004.
[17] MatLab, http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/ch06_a11.html.
[18] http://www.researchgate.net/publication/251754208_An_HMM_based_speakerindependent_continuous_speech_recognition_system_with_experiments_on_the_TIMIT_database
[19] Carla Lopes and Fernando Perdigao (2011). "Phoneme Recognition on the TIMIT Database", Speech Technologies, Prof. Ivo Ipsic (Ed.), ISBN: 978-953-307-996-7, InTech, Available from: http://www.intechopen.com/books/speech-technologies/phoneme-recognition-on-the-timit-database
Keywords
Automatic Speech Recognition, English Phone Recognition, Wavelet Packets, Mel Scale, MFCC, HTK, Best Tree Encoding.