A Hybrid Deep Learning Framework for Indian Sign Language Recognition from Images and Videos

S. Jayalakshmi; S.P. Balamurugan

doi:https://doi.org/10.14445/22315381/IJETT-V74I4P114

Research Article | Open Access

Volume 74 | Issue 4 | Year 2026 | Article Id. IJETT-V74I4P114

Received	Revised	Accepted	Published
23 Jun 2025	11 Feb 2026	28 Feb 2026	29 Apr 2026

Citation :

S. Jayalakshmi, S.P. Balamurugan, "A Hybrid Deep Learning Framework for Indian Sign Language Recognition from Images and Videos," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 4, pp. 180-195, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I4P114

Abstract

This paper proposes a comprehensive deep learning approach for Indian Sign Language (ISL) recognition that covers both static image and dynamic video modalities. Because sign language gestures combine spatial and temporal complexity, the paper proposes two specialized hybrid models. For static gestures, a CNN-Transformer model combines multiple types of convolutional networks with Transformer models to capture image features together with their global contextual relationships. For dynamic gestures, a CNN-LSTM model built on the ConvLSTM2D layer jointly learns the spatial and temporal relations across sequences of video frames. The data repository holds a balanced set of 36 ISL signs for each modality, and uniform preprocessing steps such as grayscale normalization, Gaussian smoothing, resizing, and sequence padding keep the inputs homogeneous. To maintain computational efficiency while preserving gesture clarity, frames are temporally sampled in the video pipeline. Dynamic class weighting is applied during training to balance the classes, while early stopping and learning rate schedulers help the models converge without overfitting. The models are trained and tested with five-fold cross-validation to support statistical robustness and generalization. Evaluation metrics are calculated, and model performance is visualized using confusion matrices, ROC curves, and accuracy-loss curves. The CNN-Transformer model performs strongly on image-based classification, and the CNN-LSTM model captures motion-based temporal features well. Together, the two models show that spatial and spatiotemporal deep neural network architectures can be combined for real-time, accurate ISL recognition. The system can be incorporated into assistive communication tools (such as sign-to-speech translators), educational resources, and mobile applications, thus facilitating inclusion for the hearing- and speech-impaired communities.
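
The video-side steps named in the abstract (temporal frame sampling, sequence padding, and class weighting) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the evenly spaced sampling rule, the zero padding value, and the inverse-frequency weighting formula are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter


def sample_frame_indices(num_frames, num_samples):
    """Pick up to num_samples evenly spaced frame indices (assumed rule)."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_frames <= num_samples:
        # Short clip: keep every frame and let padding fill the rest.
        return list(range(num_frames))
    if num_samples == 1:
        return [0]
    # Integer arithmetic spreads indices from the first to the last frame.
    return [(i * (num_frames - 1)) // (num_samples - 1)
            for i in range(num_samples)]


def pad_sequence(frames, target_len, pad_value=0):
    """Truncate or right-pad a list of frames to exactly target_len items.

    pad_value stands in for a blank frame (hypothetical choice).
    """
    clipped = frames[:target_len]
    return clipped + [pad_value] * (target_len - len(clipped))


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    This is a common heuristic, assumed here; the paper's "dynamic class
    weighting" may differ.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}
```

For example, `sample_frame_indices(10, 4)` returns `[0, 3, 6, 9]`, and a clip shorter than the target length keeps all of its frames and is then extended by `pad_sequence`. Rarer classes receive larger weights, which is the usual way such weights are passed to a training loop to offset class imbalance.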

Keywords

Indian Sign Language recognition, CNN-Transformer model, CNN-LSTM model, static image, dynamic video, spatial and temporal features, five-fold cross-validation.
