A Comparative Study of Using Various Machine Learning and Deep Learning-Based Fraud Detection Models For Universal Health Coverage Schemes

Rohan Yashraj Gupta; Satya Sai Mudigonda; Pallav Kumar Baruah

DOI: https://doi.org/10.14445/22315381/IJETT-V69I3P216

Research Article | Open Access

Volume 69 | Issue 3 | Year 2021 | Article Id. IJETT-V69I3P216

Citation:

Rohan Yashraj Gupta, Satya Sai Mudigonda, Pallav Kumar Baruah, "A Comparative Study of Using Various Machine Learning and Deep Learning-Based Fraud Detection Models For Universal Health Coverage Schemes," International Journal of Engineering Trends and Technology (IJETT), vol. 69, no. 3, pp. 96-102, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I3P216

Abstract

Fraud detection is an important area of research in healthcare systems because of its financial consequences, which arise mainly from investigation costs, revenue losses, and reputational risk. To mitigate these, most companies adopt Machine Learning and/or Deep Learning-based fraud detection models. Efficient fraud detection models improve the performance of healthcare systems. Key challenges in building an efficient fraud detection model include:

Data imbalance: fraudulent cases are far fewer than non-fraudulent cases; and
Selection of classification model: choosing appropriate machine learning or deep learning models to distinguish fraudulent from non-fraudulent cases.
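One common response to data imbalance is random undersampling, which discards majority-class (non-fraudulent) records until both classes are the same size. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name `random_undersample`, the binary labelling with 0 as non-fraud, and the fixed random seed are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def random_undersample(X, y, majority_label=0, seed=42):
    """Randomly drop majority-class rows until both classes have equal counts.

    Hypothetical helper: assumes binary labels where `majority_label`
    marks non-fraudulent claims and any other label marks fraud.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    if len(minority_idx) > len(majority_idx):
        raise ValueError("majority_label is not the majority class")
    # Keep every minority (fraud) row and an equal-sized random subset of majority rows.
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = sorted(minority_idx + kept_majority)  # preserve original row order
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Toy data: 95 legitimate claims (label 0) and 5 fraudulent claims (label 1).
    X = [[float(i)] for i in range(100)]
    y = [0] * 95 + [1] * 5
    X_bal, y_bal = random_undersample(X, y)
    print(Counter(y_bal))  # Counter({0: 5, 1: 5})
```

Undersampling keeps every fraudulent record but throws away most legitimate ones, so it trades information for balance; oversampling methods take the opposite approach by adding minority-class records instead.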

In this work, we used three data-imbalance techniques and six classification models to address these challenges, along with six variants of neural network models. The data come from part of the world's largest universal health coverage scheme, Ayushman Bharat (PM-JAY India). In total, 26 models were tested. Their performance was measured using metrics such as accuracy, sensitivity, specificity, and F1-score. A neural network model trained on undersampled data performed better than the other models in this study. Code is available at: https://github.com/RohanYashraj/Healthcare-Fraud-Detection
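Accuracy, sensitivity, specificity, and F1-score can all be derived from a binary confusion matrix. The sketch below is a minimal, hypothetical Python helper (`classification_metrics` is not taken from the authors' repository), assuming fraud is the positive class labelled 1. Sensitivity is the true-positive rate on fraud cases, specificity is the true-negative rate on legitimate cases, and F1-score is the harmonic mean of precision and sensitivity.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1-score for binary labels.

    Hypothetical helper: assumes `positive` marks the fraud class.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # fraud correctly flagged
        elif pred == positive:
            fp += 1  # legitimate claim wrongly flagged
        elif truth == positive:
            fn += 1  # fraud missed
        else:
            tn += 1  # legitimate claim correctly passed

    def ratio(num, den):
        # Guard against empty classes so the helper never divides by zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # true-positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true-negative rate
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # 4 fraud cases (1) and 6 legitimate cases (0); one fraud missed, one false alarm.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))
    # A model that never flags fraud on a 95:5 split still looks accurate.
    print(classification_metrics([0] * 95 + [1] * 5, [0] * 100))
```

The second call shows why accuracy alone is misleading on imbalanced claims data: a model that never flags fraud still scores 95% accuracy on a 95:5 split, while its sensitivity and F1-score are both zero.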

Keywords

Ayushman Bharat, PM-JAY India, Largest universal health coverage scheme, Machine learning, Deep learning, Data imbalance, Actuarial techniques, Data embedding, Classification models.
