An Adaptive Wolf Based Dansing System for Securing Hadoop at the Data Cleaning Stage

© 2022 by IJETT Journal
Volume-70 Issue-4
Year of Publication : 2022
Authors : Saritha Gattoju, Vadlamani Naga Lakshmi
DOI :  10.14445/22315381/IJETT-V70I4P204

How to Cite?

Saritha Gattoju, Vadlamani Naga Lakshmi, "An Adaptive Wolf Based Dansing System for Securing Hadoop at the Data Cleaning Stage," International Journal of Engineering Trends and Technology, vol. 70, no. 4, pp. 31-43, 2022.

Nowadays, a large amount of data is available to organizations for making business decisions. However, data collected from various sources is often noisy, which degrades prediction results and accuracy. Data cleaning was introduced to improve data quality, but its main issues are time consumption and exposure to malicious attacks. In this paper, a novel Wolf based Wide Dansing System (WbWDS) is developed to secure data during the cleaning stage. The WbWDS is designed with four layers: logical, physical, execution, and data cleaning. Furthermore, a wolf fitness function is incorporated into the framework to enhance its security function; this fitness enables continuous monitoring of malicious events. The proposed WbWDS technique is implemented in Python, and an attack is launched in the cleaning layer to check the method's reliability. Finally, the performance metrics achieved by WbWDS are compared with those of existing methods; WbWDS attains the best results, with a high confidentiality rate and low execution time.

Attack detection, Confidentiality measure, Data cleaning, Secure Hadoop application.
