Apache Pig - A Data Flow Framework Based on Hadoop Map Reduce

Swarna C; Zahid Ansari

doi:https://doi.org/10.14445/22315381/IJETT-V50P244

Research Article | Open Access

Volume 50 | Number 3 | Year 2017 | Article Id. IJETT-V50P244 | DOI : https://doi.org/10.14445/22315381/IJETT-V50P244

Citation :

Swarna C, Zahid Ansari, "Apache Pig - A Data Flow Framework Based on Hadoop Map Reduce," International Journal of Engineering Trends and Technology (IJETT), vol. 50, no. 3, pp. 271-275, 2017. Crossref, https://doi.org/10.14445/22315381/IJETT-V50P244

Abstract

Big Data is a technology phenomenon that has arisen from the increasing rate of data growth, complex new data types, and parallel advancements in the technology stack. Big data can be structured, unstructured, or semi-structured, which renders conventional data management methods ineffective. Hadoop is a framework for the analysis and transformation of very large data sets using the Map Reduce paradigm. An important characteristic of Hadoop is that it partitions data and computation across thousands of hosts and runs applications in parallel, close to their data. Hadoop accomplishes this through HDFS and Map Reduce. Pig is an Apache open source project that runs on Hadoop, making use of both HDFS and Map Reduce. Pig has two main components. The first, Pig Latin, is a parallel dataflow language designed to fit between SQL and Map Reduce. Pig Latin enables the user to define how data is read, processed, and stored in parallel. A Pig Latin script describes a directed acyclic graph in which operators are represented as nodes and data flows are represented as edges. The second component is the runtime environment in which Pig Latin programs are executed.
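
The directed acyclic graph view of a script can be illustrated with a small sketch. The example below is not Pig and does not use Hadoop; it is a single-process Python analogy, assuming hypothetical sample records and hypothetical operator names, in which each node is an operator and each edge carries the output of one operator to the next. The comments name the Pig Latin operators that each step loosely corresponds to.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical input records: (user, url, seconds_on_page)
RECORDS = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/docs", 40),
    ("carol", "/docs", 25),
    ("bob", "/docs", 1),
]


def load(_inputs):
    # Loosely corresponds to LOAD: produce the initial relation.
    return list(RECORDS)


def keep_long_visits(inputs):
    # Loosely corresponds to FILTER ... BY seconds >= 10.
    (rows,) = inputs
    return [row for row in rows if row[2] >= 10]


def group_by_url(inputs):
    # Loosely corresponds to GROUP ... BY url.
    (rows,) = inputs
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


def count_per_group(inputs):
    # Loosely corresponds to FOREACH ... GENERATE group, COUNT(...).
    (groups,) = inputs
    return {url: len(rows) for url, rows in groups.items()}


# The plan is a DAG: node name -> (operator, upstream node names).
# Each upstream name is an edge along which data flows into the operator.
PLAN = {
    "visits": (load, []),
    "long_visits": (keep_long_visits, ["visits"]),
    "by_url": (group_by_url, ["long_visits"]),
    "counts": (count_per_group, ["by_url"]),
}


def run(plan):
    """Evaluate every node after all of its upstream nodes."""
    dependencies = {name: deps for name, (_, deps) in plan.items()}
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        operator, deps = plan[name]
        results[name] = operator([results[dep] for dep in deps])
    return results


results = run(PLAN)
print(results["counts"])
```

In this sketch the plan is evaluated sequentially in topological order; in Pig, the runtime environment is responsible for executing the equivalent graph in parallel on Hadoop.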

Keywords

Big Data, Hadoop, Map Reduce, Pig, Pig Latin.
