Performing Uni-variate Analysis on Cancer Gene Mutation Data Using SGD Optimized Logistic Regression

International Journal of Engineering Trends and Technology (IJETT)
© 2021 by IJETT Journal
Volume-69 Issue-2
Year of Publication : 2021
Authors : Ashok Reddy Kandula, Dr. R. Sathya, Dr. S. Narayana
DOI : 10.14445/22315381/IJETT-V69I2P209


MLA Style: Ashok Reddy Kandula, Dr. R. Sathya, Dr. S. Narayana. "Performing Uni-variate Analysis on Cancer Gene Mutation Data Using SGD Optimized Logistic Regression." International Journal of Engineering Trends and Technology 69.2 (2021): 59-67.

APA Style: Ashok Reddy Kandula, Dr. R. Sathya, Dr. S. Narayana. (2021). Performing Uni-variate Analysis on Cancer Gene Mutation Data Using SGD Optimized Logistic Regression. International Journal of Engineering Trends and Technology, 69(2), 59-67.

Selecting an appropriate machine learning model for a given domain-specific dataset remains difficult, and researchers still struggle with model selection when solving business problems. Along with model selection, researchers also face problems with the dataset itself: given the full set of features, separating the features that are important for predicting the target class from those that are not is a challenging task. This paper addresses these issues by using univariate data analysis with machine learning classification techniques as a basic step in learning about the data. The objective of the paper is to apply a multi-class classification technique to the different classes of mutation effects for the genes discussed. A machine learning-based univariate analysis is performed on each input feature individually to obtain information about the data. We propose a logistic regression model optimized with stochastic gradient descent (SGD) to predict the target classes. The model predictions are evaluated with the multi-class log loss metric.
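The abstract does not spell out the exact pipeline, so what follows is only a minimal sketch of the general idea under stated assumptions: each feature is one-hot encoded and used on its own to train a multinomial logistic regression with plain per-sample SGD, and the resulting multi-class log loss on held-out rows (the mean of -log p(true class)) is compared with ln K, the loss of a uniform guess over K classes. A feature whose loss stays near that baseline carries little information about the class. The data generator, the feature names `gene` and `noise`, the helper functions, and the hyperparameters (`lr`, `alpha`, `epochs`) are all hypothetical and invented for illustration; they are not the paper's dataset, code, or settings.

```python
import math
import random

# Hypothetical toy data (NOT the paper's dataset)
def make_toy_data(n=300, seed=1):
    """Rows with one informative categorical feature ('gene') and one
    uninformative one ('noise'); labels and category names are invented."""
    rng = random.Random(seed)
    genes = ["G0", "G1", "G2"]
    rows = []
    for _ in range(n):
        y = rng.randrange(3)
        # 80% of the time the gene points at the class; otherwise it is a
        # uniformly random gene (which may still happen to match)
        gene = genes[y] if rng.random() < 0.8 else rng.choice(genes)
        noise = rng.choice(["A", "B", "C", "D"])  # independent of y
        rows.append({"gene": gene, "noise": noise, "y": y})
    return rows

def one_hot(values, vocab):
    """One-hot encode categorical values; categories outside vocab map to all zeros."""
    index = {v: i for i, v in enumerate(vocab)}
    encoded = []
    for v in values:
        row = [0.0] * len(vocab)
        if v in index:
            row[index[v]] = 1.0
        encoded.append(row)
    return encoded

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in z]
    s = sum(exps)
    return [e / s for e in exps]

def predict_proba(W, b, X):
    """Class probabilities for each row of X under weights W and biases b."""
    return [softmax([sum(w * v for w, v in zip(W[k], x)) + b[k]
                     for k in range(len(W))]) for x in X]

def fit_sgd_logreg(X, y, n_classes, lr=0.1, alpha=1e-4, epochs=50, seed=0):
    """Multinomial logistic regression fitted by plain per-sample SGD on the
    cross-entropy loss plus an L2 penalty (strength alpha) on the weights."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            p = predict_proba(W, b, [X[i]])[0]
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y[i] else 0.0)  # d(loss)/d(logit_k)
                for j in range(d):
                    W[k][j] -= lr * (g * X[i][j] + alpha * W[k][j])
                b[k] -= lr * g
    return W, b

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean of -log p(true class), with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        total -= math.log(min(max(p[y], eps), 1.0 - eps))
    return total / len(y_true)

def univariate_log_loss(train, test, feature, n_classes):
    """Fit the classifier on ONE feature only and score it on held-out rows."""
    vocab = sorted({r[feature] for r in train})
    X_tr = one_hot([r[feature] for r in train], vocab)
    X_te = one_hot([r[feature] for r in test], vocab)
    W, b = fit_sgd_logreg(X_tr, [r["y"] for r in train], n_classes)
    return multiclass_log_loss([r["y"] for r in test], predict_proba(W, b, X_te))

if __name__ == "__main__":
    rows = make_toy_data()
    train, test = rows[:200], rows[200:]
    baseline = math.log(3)  # log loss of always predicting 1/3 per class
    for feature in ("gene", "noise"):
        loss = univariate_log_loss(train, test, feature, 3)
        print(f"{feature}: log loss {loss:.3f} (uniform baseline {baseline:.3f})")
```

On this invented data the `gene` feature is expected to land well below the uniform baseline while `noise` stays near or above it, which is the kind of signal a univariate analysis uses to rank features. In practice, scikit-learn's `SGDClassifier(loss="log_loss")` together with `sklearn.metrics.log_loss` covers the same steps without hand-written SGD; older scikit-learn releases spell that loss `"log"`.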

Univariate analysis, Prediction, Mutation changes, Logistic regression, Stochastic Gradient Descent.