Image Capturing and Deleting Duplicate Images through Feature Extraction using Hashing Techniques

© 2024 by IJETT Journal
Volume-72 Issue-1
Year of Publication : 2024
Author : Prathima Ch, R. Swathi, K. Suneetha, I. Suneetha, B. V. Suresh Reddy, Siva Kumar Depuru
DOI : 10.14445/22315381/IJETT-V72I1P107

How to Cite?

Prathima Ch, R. Swathi, K. Suneetha, I. Suneetha, B. V. Suresh Reddy, Siva Kumar Depuru, "Image Capturing and Deleting Duplicate Images through Feature Extraction using Hashing Techniques," International Journal of Engineering Trends and Technology, vol. 72, no. 1, pp. 64-70, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I1P107

Abstract
A common problem today is the accumulation of duplicate images during capture. For example, a folder on a PC can hold the same image several times under different names, which wastes storage. To avoid this duplication, the content of each file must be scanned to determine whether two images are duplicates. This content-based approach is known as CBIR (Content-Based Image Retrieval). Many techniques exist for finding duplicate images: CNNs (Convolutional Neural Networks), pHash (perceptual hash), the Block Truncation Technique, and others are used to identify similar images and avoid wasting memory. The proposed work captures images, stores them in a folder, identifies the duplicates among them, and deletes those duplicates. A user interface was developed that takes a folder path and deletes the duplicate images in it, which makes the tool easy to use. Although various techniques for image deduplication exist, not all of them work effectively. Image deduplication is needed in a wide range of domains, such as income tax departments, banks, and many other private organizations and corporations that store large numbers of duplicate images. Some social media companies also store the images shared through their apps. All of these images need to be deduplicated to save money, maintenance costs, and storage space.
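The idea of comparing images by content rather than by file name can be illustrated with a minimal sketch. It uses an average hash, a simpler relative of the DCT-based pHash named in the abstract, and is not the authors' implementation. The function names, the 8 x 8 grid, and the Hamming-distance threshold of 5 are assumptions chosen for illustration. Images are assumed to be already decoded into rows of grayscale values; decoding and the actual file deletion are left out.

```python
from typing import Dict, List, Sequence, Tuple

Gray = Sequence[Sequence[float]]  # rows of grayscale pixel values


def average_hash(pixels: Gray, hash_size: int = 8) -> int:
    """Return a (hash_size * hash_size)-bit average hash of a grayscale image."""
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    # Downscale: average the pixels that fall into each cell of the grid.
    cells: List[float] = []
    for gy in range(hash_size):
        y0, y1 = gy * height // hash_size, (gy + 1) * height // hash_size
        for gx in range(hash_size):
            x0, x1 = gx * width // hash_size, (gx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_duplicates(images: Dict[str, Gray],
                    max_distance: int = 5) -> List[Tuple[str, str]]:
    """Return (kept_name, duplicate_name) pairs.

    Names are visited in sorted order; an image counts as a duplicate when its
    hash is within max_distance bits of an image that was already kept.
    """
    kept: List[Tuple[str, int]] = []
    duplicates: List[Tuple[str, str]] = []
    for name in sorted(images):
        h = average_hash(images[name])
        match = next((k for k, kh in kept
                      if hamming_distance(h, kh) <= max_distance), None)
        if match is None:
            kept.append((name, h))
        else:
            duplicates.append((match, name))
    return duplicates
```

In a folder-based tool, the second name of each returned pair would be the candidate for deletion. The threshold trades missed duplicates against false matches and would need tuning on real data.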

Keywords
Images, Convolutional Neural Network, Deep Learning, Deduplication.
