Multimodal Analytical Approach for Determination of Deduplication in Names for Identifying People during Emergency Situations

Nagesh Raykar; Prema Sahane; Sonali Rangdale; Dipmala Salunke; Pallavi Tekade; Pramod Patil

doi:https://doi.org/10.14445/22315381/IJETT-V73I8P101

Volume 73 | Issue 8 | Year 2025 | Article Id. IJETT-V73I8P101 | DOI : https://doi.org/10.14445/22315381/IJETT-V73I8P101

Received: 27 Jan 2025 | Revised: 07 Jul 2025 | Accepted: 21 Jul 2025 | Published: 30 Aug 2025

Citation:

Nagesh Raykar, Prema Sahane, Sonali Rangdale, Dipmala Salunke, Pallavi Tekade, Pramod Patil, "Multimodal Analytical Approach for Determination of Deduplication in Names for Identifying People during Emergency Situations," International Journal of Engineering Trends and Technology (IJETT), vol. 73, no. 8, pp. 1-13, 2025. Crossref, https://doi.org/10.14445/22315381/IJETT-V73I8P101

Abstract

Phonetic algorithms index words by their pronunciation and have been developed primarily for the English language. Demographic Data (DD) describes people through attributes such as name, age, gender, residence, and occupation. In hospitals, government record matching, and multilingual information retrieval systems, it is vital to identify a person quickly and accurately during emergency situations, yet duplicate records often create confusion. A record is not retrieved when the same name is written with its letters in an incorrect order. Phonetic name identification also provides important statistics in web analysis. Although many existing studies handle DD, no specific study addresses Indian regional languages. The proposed research compares conventional regional names in the First Name (FN) and Last Name (LN) format using phonetic rules, and proposes a novel, efficient phonetic-based algorithm for the regional language. The approach attempts to prevent repeated and similar names from being recorded, even when their letters are arranged differently. In emergencies such as finding a next of kin or locating a person during a national security issue, little information may be available, yet locating the person or informing the family remains important. Often the person's information is available, but the database does not retrieve it because the spelling of the name differs. This research applies a multimodal approach that combines NLP and machine learning to identify people during emergency situations. Results from the suggested approach are promising, and it can be applied in a real-time environment.
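As a rough illustration of phonetic indexing, the sketch below uses classic American Soundex, an English-oriented algorithm. It is not the regional-language algorithm proposed in this article. It assumes names have been transliterated into Latin script. The helper `name_key` and the sample spellings are hypothetical, and treating a swapped FN/LN order as a match is an assumption made here for illustration only.

```python
def soundex(name: str) -> str:
    """Return the classic American Soundex code: one letter plus three digits."""
    # Consonant groups that share a digit; vowels, Y, H and W have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""

    first = letters_only[0]
    result = [first]
    prev = codes.get(first, "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same digit
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result.append(digit)
        prev = digit  # a vowel resets prev, so a repeated digit is kept after it
    return ("".join(result) + "000")[:4]  # pad or truncate to four characters


def name_key(full_name: str) -> tuple:
    """Hypothetical blocking key: sorted Soundex codes of the name parts.

    Sorting makes the key independent of FN/LN order (an assumption of this
    sketch, not a rule taken from the article).
    """
    return tuple(sorted(soundex(part) for part in full_name.split()))


def likely_duplicates(a: str, b: str) -> bool:
    """Flag two name strings as candidate duplicates when their keys agree."""
    return name_key(a) == name_key(b)
```

With this sketch, `likely_duplicates("Nagesh Raykar", "Raikar Nagesh")` evaluates to `True` because both spellings of the surname map to the same code. Soundex always keeps the first letter and groups consonants by English pronunciation, so it misses variants that differ in their initial letter, which is one reason a language-specific phonetic rule set may be preferable.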

Keywords

Demographic, Indexing, Indian regional languages, Machine Learning, Natural Language Processing (NLP), Phonetic algorithm.
