Research Article | Open Access
Volume 74 | Issue 4 | Year 2026 | Article Id. IJETT-V74I4P114 | DOI : https://doi.org/10.14445/22315381/IJETT-V74I4P114
A Hybrid Deep Learning Framework for Indian Sign Language Recognition from Images and Videos
S. Jayalakshmi, S.P. Balamurugan
| Received | Revised | Accepted | Published |
|---|---|---|---|
| 23 Jun 2025 | 11 Feb 2026 | 28 Feb 2026 | 29 Apr 2026 |
Citation :
S. Jayalakshmi, S.P. Balamurugan, "A Hybrid Deep Learning Framework for Indian Sign Language Recognition from Images and Videos," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 4, pp. 177-195, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I4P114
Abstract
This paper proposes a deep learning approach to Indian Sign Language (ISL) recognition that covers both static images and dynamic videos. Because sign language gestures combine spatial and temporal complexity, two specialized hybrid models are proposed. For static gestures, a CNN-Transformer model combines multiple types of convolutional networks with Transformer models to capture image features together with their global contextual relationships. For dynamic sign sequences, a CNN-LSTM model based on the ConvLSTM2D layer jointly learns the spatial and temporal relations across sequences of video frames. The data repository holds a balanced set of 36 ISL signs for each modality, and both pipelines apply uniform preprocessing steps (grayscale normalization, Gaussian smoothing, resizing, and sequence padding) to keep the inputs homogeneous. In the video pipeline, frames are temporally sampled to keep computation efficient while preserving gesture clarity. Dynamic class weighting is applied during training to balance the classes, while early stopping and learning rate schedulers support convergence without overfitting. The models are trained and tested with five-fold cross-validation for statistical robustness and generalization. Performance metrics are calculated, and model performance is further plotted using confusion matrices, ROC curves, and accuracy-loss curves. The CNN-Transformer model performs strongly on image-based classification, and the CNN-LSTM model captures motion-based temporal features well. This dual-modality system shows that spatial and spatiotemporal deep neural network architectures can be combined for real-time, accurate ISL recognition.
The system can be incorporated into assistive communication tools (such as sign-to-speech translators), educational resources, and mobile applications, thus facilitating inclusion for the hearing- and speech-impaired communities.
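The data-handling steps named in the abstract (temporal frame sampling, sequence padding, class weighting, and five-fold splitting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the even-spacing sampling rule, the zero-style padding, and the contiguous fold layout are all assumptions made for illustration. The class-weight formula used here is the common "balanced" heuristic, n_samples / (n_classes * class_count); the abstract does not say which weighting rule the paper uses.

```python
from collections import Counter


def sample_frame_indices(num_frames, target_len):
    """Pick up to target_len evenly spaced frame indices from a clip (assumed rule)."""
    if num_frames <= 0 or target_len <= 0:
        return []
    if num_frames <= target_len:
        # Clip is already short enough; keep every frame.
        return list(range(num_frames))
    if target_len == 1:
        return [0]
    # Evenly spaced positions across [0, num_frames - 1], endpoints included.
    step = (num_frames - 1) / (target_len - 1)
    return [round(i * step) for i in range(target_len)]


def pad_sequence(frames, target_len, pad_frame):
    """Truncate or right-pad a list of frames to exactly target_len items."""
    clipped = list(frames[:target_len])
    return clipped + [pad_frame] * (target_len - len(clipped))


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) (assumed heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size
```

In practice the sample indices would be shuffled (or stratified by class) before folding, and the resulting weights would be passed to the training routine of whichever framework is used; both choices are left open here because the abstract does not specify them.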
Keywords
Indian Sign Language recognition, CNN-Transformer model, CNN-LSTM model, Static image, Dynamic video, Spatial and temporal features, Five-fold cross-validation.