Embedded Device Keyword Spotting Model with Quantized Convolutional Neural Network

© 2025 by IJETT Journal
Volume-73 Issue-3
Year of Publication : 2025
Author : Salma A. Alhashimi, Ali Aliedani
DOI : 10.14445/22315381/IJETT-V73I3P117

How to Cite?
Salma A. Alhashimi, Ali Aliedani, "Embedded Device Keyword Spotting Model with Quantized Convolutional Neural Network," International Journal of Engineering Trends and Technology, vol. 73, no. 3, pp. 230-236, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I3P117

Abstract
This research investigates the challenges of implementing machine learning models for keyword-spotting applications on embedded devices. A convolutional neural network for Arabic keyword spotting is deployed on an embedded device, represented by a microcontroller, and both a custom-trained model and a pre-trained model are assessed. QCoNet, the proposed model, addresses the resource constraints of embedded devices by employing quantization to reduce the required computation and memory. MobileNet v4, a pre-trained model, is evaluated under the same conditions. MobileNet v4 achieves a higher accuracy of 99.2% than QCoNet's 96.7%, but incurs additional processing and storage costs. The Arduino Nano 33 BLE is used for data collection and model deployment. The Arabic keywords were recorded under different noise levels, and data augmentation techniques are employed to improve the system's robustness to varied acoustic environments.
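The quantization step summarized above can be illustrated with a short sketch. The following is a minimal example of post-training full-integer quantization, assuming a TensorFlow/TensorFlow Lite toolchain (the abstract does not name the exact tooling); quantize_kws_model, keras_model, and train_features are hypothetical names for the keyword-spotting network and its calibration features, not identifiers from the paper.

# Minimal sketch: post-training int8 quantization of a small keyword-spotting
# CNN, assuming TensorFlow/TensorFlow Lite. Names here are illustrative only.
import numpy as np
import tensorflow as tf

def quantize_kws_model(keras_model, train_features):
    def representative_data():
        # A few hundred calibration samples are enough to fix the int8 scales.
        for sample in train_features[:200]:
            yield [np.expand_dims(sample, 0).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    # Force full-integer kernels so the model runs on an int8-only MCU runtime.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()  # bytes of the .tflite flatbuffer

The resulting flatbuffer can then be written to disk and converted to a C array (for example with xxd -i model.tflite) for deployment on a microcontroller such as the Arduino Nano 33 BLE.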

Keywords
Keyword spotting, Convolutional Neural Network, Quantization, MobileNet v4, QCoNet.
