Multimodal Feature-Based Deep Learning Framework for Person Re-Identification: Enhancing Models with InceptionNet Representation

© 2025 by IJETT Journal
Volume-73 Issue-7
Year of Publication : 2025
Author : Badireddygari Anurag Reddy, Danvir Mandal, Bhaveshkumar C. Dharmani
DOI : 10.14445/22315381/IJETT-V73I7P105

How to Cite?
Badireddygari Anurag Reddy, Danvir Mandal, Bhaveshkumar C. Dharmani, "Multimodal Feature-Based Deep Learning Framework for Person Re-Identification: Enhancing Models with InceptionNet Representation," International Journal of Engineering Trends and Technology, vol. 73, no. 7, pp. 34-51, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I7P105

Abstract
Person re-identification (Re-ID) has become a vital computer vision task in security, surveillance, and identity verification systems. Conventional methods often struggle with changing viewpoints, shadows, and varying lighting conditions. Recent advances in deep learning, which enable the use of multimodal data and robust feature extraction, offer promising alternatives. This work investigates a deep learning-based approach to person re-identification using the Deep Multimodal Inception Network Representation Learning (DMIRL) framework. The process begins with a review of existing Re-ID algorithms on widely used datasets, including DukeMTMC-reID and Market-1501. Several data preparation steps, namely augmentation, image normalisation, and multimodal feature extraction, are applied to enhance the datasets. The proposed DMIRL model employs an advanced InceptionNet architecture that learns complementary features from multimodal inputs, including optical, infrared, and skeletal data. Experimental analyses show that DMIRL effectively handles pose variations and partial occlusions. The proposed method achieved 89.5% accuracy on MSMT17, 92.4% on Market-1501, 91.1% on DukeMTMC-reID, and 85.3% on CUHK03-NP, with a Mean Average Precision (mAP) of 87.6% on Market-1501. Processing time ranged from 0.30 s to 0.40 s. Cross-modality evaluation (RGB to infrared) showed only a slight decline, maintaining 85.0% accuracy on MSMT17.
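The abstract describes the DMIRL model only at a high level, and the paper's implementation details are not reproduced here. As a rough illustration of the general idea, the following is a minimal PyTorch sketch of an Inception-style network with one branch per modality (optical, infrared, and skeletal input), whose pooled features are concatenated into a single Re-ID embedding. All layer sizes, the fusion strategy, and the names InceptionBlock and MultimodalReIDNet are illustrative assumptions, not the authors' architecture.

# Minimal sketch (not the authors' code): one Inception-style branch per modality
# (RGB, infrared, skeleton heatmap), fused into a single embedding for Re-ID.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions plus pooling, concatenated on channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 4
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 1),
                                nn.Conv2d(branch_ch, branch_ch, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 1),
                                nn.Conv2d(branch_ch, branch_ch, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, branch_ch, 1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1))

class MultimodalReIDNet(nn.Module):
    """Per-modality Inception-style branches; pooled features are concatenated and embedded."""
    def __init__(self, embed_dim=256, num_ids=751):  # 751 identities in the Market-1501 training split
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                                 InceptionBlock(32, 64), InceptionBlock(64, 128),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rgb = branch(3)    # optical stream
        self.ir = branch(1)     # infrared stream
        self.skel = branch(1)   # skeleton / pose-heatmap stream
        self.embed = nn.Linear(128 * 3, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_ids)  # identity logits used only during training

    def forward(self, rgb, ir, skel):
        fused = torch.cat([self.rgb(rgb), self.ir(ir), self.skel(skel)], dim=1)
        emb = self.embed(fused)  # embedding used for gallery-query matching
        return emb, self.classifier(emb)

# Example forward pass with dummy inputs at 256x128, a typical Re-ID resolution
model = MultimodalReIDNet()
emb, logits = model(torch.randn(2, 3, 256, 128), torch.randn(2, 1, 256, 128), torch.randn(2, 1, 256, 128))
print(emb.shape, logits.shape)  # torch.Size([2, 256]) torch.Size([2, 751])

Concatenating per-modality embeddings is the simplest fusion choice; the actual DMIRL framework may combine modalities differently, for example at intermediate layers or with attention, which the abstract does not specify.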

Keywords
Deep learning, InceptionNet, Multimodal features, Person re-identification, Representation learning.
