Preliminary Analysis of HDFS Read Operation, Threats Impacts and Mitigation

© 2024 by IJETT Journal
Volume-72 Issue-9
Year of Publication : 2024
Author : Imane LEBDAOUI, Ghizlane ORHANOU
DOI : 10.14445/22315381/IJETT-V72I9P103

How to Cite?
Imane LEBDAOUI, Ghizlane ORHANOU, "Preliminary Analysis of HDFS Read Operation, Threats Impacts and Mitigation," International Journal of Engineering Trends and Technology, vol. 72, no. 9, pp. 33-48, 2024. Crossref, https://doi.org/10.14445/22315381/IJETT-V72I9P103

Abstract
The Hadoop Distributed File System (HDFS) is widely used to store, read, and write large volumes of data and files. However, HDFS, like the rest of Hadoop, remains vulnerable to numerous security threats that make it a target for malicious activity, which can lead to the loss or unauthorized manipulation of data and files. This research therefore re-examines how security can be preserved throughout the operation of reading files from HDFS, focusing on the assets and flows involved. Through a systematic security analysis of these assets and flows, the article addresses existing security issues, discusses the security requirements that must be met for secure file reading from HDFS, and analyzes the impact of various threats on the associated assets. By analyzing three main attack use cases, the threats are identified and classified into six threat families. For each family, mitigation measures are proposed to reduce the impact of the threats and enable more secure read operations from HDFS.

Keywords
HDFS Read operation, Hadoop security, Threat analysis, Mitigation, DataNodes.
