Sparse Coding for Arabic Phoneme Classification

International Journal of Engineering Trends and Technology (IJETT)

© 2017 by IJETT Journal
Volume-54 Number-1
Year of Publication : 2017
Authors : Dima Shaheen, Oumayma Al-Dakkak, Mohieldin Wianakh
DOI :  10.14445/22315381/IJETT-V54P204

Citation 

Dima Shaheen, Oumayma Al-Dakkak, Mohieldin Wianakh, "Sparse Coding for Arabic Phoneme Classification", International Journal of Engineering Trends and Technology (IJETT), V54(1), 17-27, December 2017. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract
Sparse coding has been an active research topic in machine learning and signal processing for the last ten years, achieving impressive results on problems such as face recognition and image denoising. In this paper, we present a new contribution: applying sparse coding to the problem of Arabic phoneme classification. The proposed system, the Sparse Coding based Phoneme Classification System (SCPCS), employs the sparse code as a new speech feature and classifies it with the Sparse Representation Classifier (SRC). The sparse code is the vector of coefficients of a sparse linear combination (one in which most coefficients are zero) of basic signals that represents the target signal as closely as possible. We study how the choice of sparse coding solver, which produces the sparse code, affects the discrimination capability of that code. Experiments to evaluate the proposed system were conducted on two sets of manually segmented Arabic phonemes, extracted from the KAPD (King Abdulaziz City for Science and Technology Arabic Phonetic Database) and CSLU2002 (Centre for Spoken Language Understanding) Arabic speech databases. The proposed system achieved an accuracy of 85.3% on KAPD and 53.4% on CSLU2002, which is better than the state-of-the-art results on these two datasets.
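
The idea of a sparse code used as a classification feature can be illustrated with a minimal sketch. It is not the SCPCS implementation: the toy dictionary `D`, the `labels` array, the helper names `omp` and `src_classify`, and the choice of a greedy Orthogonal Matching Pursuit solver are all assumptions made for illustration, whereas the paper compares several sparse coding solvers on real speech features. The sketch computes a sparse code over a dictionary whose columns are labelled by class, then assigns the class whose atoms reconstruct the signal with the smallest residual, which is the usual Sparse Representation Classifier decision rule.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy Orthogonal Matching Pursuit (illustrative solver choice).
    Returns a coefficient vector with at most n_nonzero nonzero entries."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        coef[support] = sol
    return coef

def src_classify(D, labels, x, n_nonzero=2):
    """SRC rule: keep only one class's coefficients at a time and
    pick the class with the smallest reconstruction residual."""
    code = omp(D, x, n_nonzero)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(x - D[:, labels == c] @ code[labels == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))], code

# Hypothetical toy dictionary: three unit-norm atoms per class in R^4.
s = 1.0 / np.sqrt(2.0)
D = np.array([
    [1.0, 0.0, s,   0.0, 0.0, 0.0],
    [0.0, 1.0, s,   0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, s],
    [0.0, 0.0, 0.0, 0.0, 1.0, s],
])
labels = np.array([0, 0, 0, 1, 1, 1])

pred_a, code_a = src_classify(D, labels, np.array([0.9, 0.3, 0.05, 0.0]))
pred_b, code_b = src_classify(D, labels, np.array([0.0, 0.1, 0.2, 0.8]))
print(pred_a, pred_b)
```

In this toy setting the first signal lies almost entirely in the span of the class-0 atoms and the second in the span of the class-1 atoms, so the nonzero coefficients of each code concentrate on the matching class. A real system would replace the hand-built `D` with a dictionary built or learned from training phoneme features.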

Keywords
Compressive Sensing, Sparse Coding, Phoneme Classification, Dictionary Learning, Sparse Representation Classifier (SRC), l1 Minimization Algorithms, Cross-Validation.