An Ensemble Approach for Privacy-Preserving Record Linkage

© 2021 by IJETT Journal
Volume 69, Issue 9
Year of Publication: 2021
Authors: Vijay Maruti Shelake, Narendra M. Shekokar
DOI: 10.14445/22315381/IJETT-V69I9P218

How to Cite?

Vijay Maruti Shelake, Narendra M. Shekokar, "An Ensemble Approach for Privacy-Preserving Record Linkage," International Journal of Engineering Trends and Technology, vol. 69, no. 9, pp. 146-152, 2021.

In today's world, it is essential to collect and identify information about the same individuals from multiple databases in a secure manner for record matching, linkage, and integration. Privacy-preserving record linkage (PPRL) refers to identifying and comparing the same person's records across multiple databases in a secure manner. This paper discusses various PPRL techniques. Among them, Bloom filter encoding is suitable for secure and approximate record matching. However, most hardened Bloom filter encoding techniques provide privacy at the cost of linkage accuracy. Hence, an ensemble approach is proposed that achieves better linkage accuracy than existing basic, balanced, and cellular automata Bloom filter-based PPRL.
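To make the Bloom filter encodings named above more concrete, here is a minimal sketch, assuming the commonly described basic scheme: each value is split into padded character bigrams, each bigram is hashed several times into a bit array using keyed double hashing, and two encodings are compared with the Dice coefficient. It also sketches a balanced Bloom filter, assuming the usual construction of concatenating the filter with its bitwise complement and then permuting the bits so every encoding has the same number of ones. The function names, the 1000-bit length, the 20 hash functions, the shared key, and the permutation seed are all illustrative assumptions. Cellular automata hardening and the paper's own ensemble method are not shown.

```python
import hashlib
import hmac
import random


def bigrams(value: str) -> set[str]:
    """Split a lower-cased string, padded with '_' on both ends, into bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def _keyed_hash(key: bytes, gram: str, digest) -> int:
    """Keyed hash (HMAC) of one bigram, returned as a large integer."""
    return int(hmac.new(key, gram.encode("utf-8"), digest).hexdigest(), 16)


def bloom_encode(value: str, size: int = 1000, num_hashes: int = 20,
                 key: bytes = b"hypothetical-shared-key") -> list[int]:
    """Basic Bloom filter: set num_hashes bits per bigram via double hashing."""
    bits = [0] * size
    for gram in bigrams(value):
        h1 = _keyed_hash(key, gram, hashlib.sha1)
        h2 = _keyed_hash(key, gram, hashlib.md5)
        for i in range(num_hashes):
            bits[(h1 + i * h2) % size] = 1  # position g_i = (h1 + i*h2) mod size
    return bits


def dice(a: list[int], b: list[int]) -> float:
    """Dice coefficient: 2 * |common set bits| / (|bits in a| + |bits in b|)."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


def balanced_encode(value: str, size: int = 1000, num_hashes: int = 20,
                    key: bytes = b"hypothetical-shared-key",
                    seed: int = 42) -> list[int]:
    """Balanced Bloom filter: append the complement, then apply a fixed permutation.

    The result always contains exactly `size` ones out of 2 * size bits, so the
    number of set bits no longer reveals how long the encoded value was.
    """
    bits = bloom_encode(value, size, num_hashes, key)
    combined = bits + [1 - b for b in bits]
    order = list(range(len(combined)))
    random.Random(seed).shuffle(order)  # all parties must share the same seed
    return [combined[i] for i in order]
```

For example, `dice(bloom_encode("smith"), bloom_encode("smyth"))` is considerably higher than `dice(bloom_encode("smith"), bloom_encode("jones"))`, because the first pair shares four of six bigrams while the second shares none. This similarity-preserving behaviour is what makes Bloom filters suitable for approximate matching. It is also what hardening techniques such as balancing try to protect without destroying it.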

Keywords: Data integration, matching, privacy, linkage, Bloom filter, similarity measures.
