A Speech-based Sentiment Analysis using Combined Deep Learning and Language Model on Real-Time Product Review

International Journal of Engineering Trends and Technology (IJETT)
  
© 2021 by IJETT Journal
Volume-69 Issue-1
Year of Publication: 2021
Authors: Maganti Syamala, N. J. Nalini
DOI: 10.14445/22315381/IJETT-V69I1P226

Citation 

MLA Style: Maganti Syamala, N. J. Nalini. "A Speech-based Sentiment Analysis using Combined Deep Learning and Language Model on Real-Time Product Review." International Journal of Engineering Trends and Technology 69.1 (2021): 172-178.

APA Style: Maganti Syamala, N. J. Nalini. (2021). A Speech-based Sentiment Analysis using Combined Deep Learning and Language Model on Real-Time Product Review. International Journal of Engineering Trends and Technology, 69(1), 172-178.

Abstract
Sentiment analysis is an area of study in Natural Language Processing (NLP) that has gained popularity in text analytics for supporting purchase decisions. Speech-based sentiment analysis is also needed in real-world applications to provide a better quality of service, yet work in the speech domain has received far less attention. This paper therefore proposes a speech sentiment analysis model that uses the spectrogram as an acoustic feature. The spectrogram features are trained with a deep learning model and an N-gram language model: a combined Convolutional Neural Network (CNN) and Bi-directional Recurrent Neural Network (Bi-RNN) architecture is implemented for acoustic modeling, and a bi-gram language model calculates the likelihood of a particular word sequence in the spoken utterance. NLP techniques such as the VADER Sentiment Intensity Analyzer are then used to perform sentiment analysis on the decoded transcripts. The experimental results are analyzed in terms of Word Error Rate (WER) and Character Error Rate (CER), showing that the proposed model achieves a WER of 5.7% and a CER of 3%, outperforming traditional Automatic Speech Recognition (ASR) models. The sentiment analysis results are measured using correctly classified instances, precision, recall, and F1-score across various machine learning algorithms; the Logistic Regression algorithm achieves an improved accuracy of 90% with the proposed speech sentiment analysis model.
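To make the evaluation and sentiment stages concrete, the minimal Python sketch below scores a decoded transcript with the VADER Sentiment Intensity Analyzer and computes WER and CER against a reference transcript via word- and character-level edit distance. It is an illustration only: the reference and hypothesis strings are hypothetical product-review utterances, the ±0.05 compound-score cut-offs follow VADER's documented convention, and the paper's CNN/Bi-RNN acoustic model and bi-gram language model (which would produce the hypothesis) are not reproduced here.

# Sketch of the post-ASR stages: WER/CER evaluation and VADER scoring.
# Requires the vaderSentiment package (pip install vaderSentiment).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def edit_distance(ref, hyp):
    # Levenshtein distance between two sequences (single-row dynamic programming).
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(reference, hypothesis):
    # Word Error Rate: word-level edits / number of reference words.
    return edit_distance(reference.split(), hypothesis.split()) / len(reference.split())

def cer(reference, hypothesis):
    # Character Error Rate: character-level edits / reference length.
    return edit_distance(reference, hypothesis) / len(reference)

# Hypothetical reference transcript and ASR hypothesis for a product review.
reference = "the battery life of this phone is excellent"
hypothesis = "the battery life of his phone is excellent"
print(f"WER: {wer(reference, hypothesis):.3f}, CER: {cer(reference, hypothesis):.3f}")

# Rule-based sentiment on the decoded transcript.
compound = SentimentIntensityAnalyzer().polarity_scores(hypothesis)["compound"]
label = "positive" if compound >= 0.05 else "negative" if compound <= -0.05 else "neutral"
print(f"compound = {compound:.3f} -> {label}")

In the paper's pipeline, the hypothesis string would instead come from the CNN/Bi-RNN decoder constrained by the bi-gram language model, and the resulting sentiment labels feed the machine learning classifiers reported in the results.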

Reference
[1] M. S. Hossain and G. Muhammad, Emotion Recognition Using Deep Learning Approach from Audio-Visual Emotional Big Data, Information Fusion, (2018) 1-24.
[2] Y. Zeng, H. Mao, D. Peng, and Z. Yi, Spectrogram based multi-task audio classification, Multimed. Tools Appl., 78 (2019) 3705-3722.
[3] M. Syamala and N. J. Nalini, A filter-based improved decision tree sentiment classification model for real-time amazon product review data, International Journal of Intelligent Engineering and Systems, 13(1) (2020) 191-202.
[4] M. Syamala and N. J. Nalini, A deep analysis on aspect-based sentiment text classification approaches, International Journal of Advanced Trends in Computer Science and Engineering, 8(5) (2019) 1795-1801.
[5] S. Maghilnan and M. R. Kumar, Sentiment analysis on speaker-specific speech data, in Proceedings of the International Conference on Intelligent Computing and Control (I2C2), (2017) 1-5.
[6] Z. Luo, H. Xu, and F. Chen, Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network, AffCon@AAAI, (2019).
[7] B. Li, D. Dimitriadis, and A. Stolcke, Acoustic and Lexical Sentiment Analysis for Customer Service Calls, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (2019) 5876-5880.
[8] L. Kaushik, A. Sangwan, and J. H. L. Hansen, Sentiment extraction from natural audio streams, in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, (2013) 8485-8489.
[9] S. Ezzat, N. E. Gayar, and M. Ghanem, Sentiment Analysis of Call Centre Audio Conversations using Text Classification, International Journal of Computer Information Systems and Industrial Management Applications, 4 (2012) 619-627.
[10] S. Govindaraj and K. Gopalakrishnan, Intensified Sentiment Analysis of Customer Product Reviews Using Acoustic and Textual Features, ETRI Journal, 38(3) (2016) 494-501.
[11] T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, Audio Augmentation for Speech Recognition, in Proceedings of INTERSPEECH, (2015).
[12] V. Liptchinsky, G. Synnaeve, and R. Collobert, Letter-Based Speech Recognition with Gated ConvNets, arXiv:1712.09444, (2017).
[13] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: An ASR corpus based on public domain audiobooks, in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2015) 5206-5210.
[14] D. Amodei et al., Deep Speech 2: End-to-end speech recognition in English and Mandarin, in Proceedings of the 33rd International Conference on Machine Learning, 48 (2016) 173-182.
[15] A. Zeyer, K. Irie, R. Schlüter, and H. Ney, Improved training of end-to-end attention models for speech recognition, in Proceedings of INTERSPEECH, (2018).
[16] N. Zeghidour, Q. Xu, V. Liptchinsky, N. Usunier, G. Synnaeve, and R. Collobert, Fully Convolutional Speech Recognition, (2018).
[17] C. Lüscher, E. Beck, K. Irie, M. Kitza, W. Michel, A. Zeyer, R. Schlüter, and H. Ney, RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation, (2019).
[18] K. Han, A. Chandrashekaran, J. Kim, and I. Lane, The CAPIO 2017 Conversational Speech Recognition System, (2017).
[19] D. Le, X. Zhang, W. Zheng, C. Fügen, G. Zweig, and M. Seltzer, From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition, in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), (2019) 457-464.
[20] K. J. Han, A. Chandrashekaran, J. Kim, and I. Lane, The CAPIO 2017 Conversational Speech Recognition System, arXiv, (2017).
[21] X. Yang, J. Li, and X. Zhou, A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition, arXiv, (2018).
[22] J. Li, V. Lavrukhin, B. Ginsburg, R. Leary, O. Kuchaiev, J. M. Cohen, H. Nguyen, and R. T. Gadde, Jasper: An End-to-End Convolutional Neural Acoustic Model, arXiv, (2019).
[23] A. Zeyer, A. Merboldt, R. Schlüter, and H. Ney, A comprehensive analysis on attention models, in NIPS Workshop IRASL, (2018).
[24] K. Irie, R. Prabhavalkar, A. Kannan, A. Bruguier, D. Rybach, and P. Nguyen, Model Unit Exploration for Sequence-to-Sequence Speech Recognition, arXiv, (2019).
[25] S. Benkerzaz, Y. Elmir, and A. Dennai, A Study on Automatic Speech Recognition, Journal of Information Technology Review, 10 (2019) 77-85.
[26] L. P. Maguluri and R. Ragupathy, Comparative Analysis of Companies Stock Price Prediction Using Time Series Algorithm, International Journal of Engineering Trends and Technology, 68(11) (2020) 9-15.
[27] K. Srinivasa Rao and M. Sridhar, Sustainable Development of Green Communication through Threshold Visual Cryptography Schemes Using a Population Based Incremental Learning Algorithm, Journal of Green Engineering, 11(1) (2020) 608-624.

Keywords
Acoustic, Character Error Rate, Convolutional Neural Network, Machine Learning, Natural Language Processing, Recurrent Neural Network, Speech, Spectrogram, Sentiment Analysis, Word Error Rate.