Survey on Clustering on the Cloud by Using Map Reduce in Large Data Applications

International Journal of Engineering Trends and Technology (IJETT)
© 2015 by IJETT Journal
Volume-21 Number-8
Year of Publication : 2015
Authors : M Chaitanya Kumari, P Nagendra Babu


M Chaitanya Kumari, P Nagendra Babu, "Survey on Clustering on the Cloud by Using Map Reduce in Large Data Applications", International Journal of Engineering Trends and Technology (IJETT), V21(8), 392-395, March 2015. ISSN: 2231-5381. Published by Seventh Sense Research Group.


The term clustering refers to grouping objects according to their similarity. Put another way, clustering is the process of grouping a set of objects so that objects within a group, or cluster, are highly similar to one another and highly dissimilar to objects in other clusters. In cloud computing, multiple users can access a single server to retrieve and update their data without purchasing licenses for different applications. Clustering on the cloud is needed to retrieve the appropriate data, because we now deal with petabytes of data. For this reason we use the MapReduce framework, which handles huge amounts of data in two phases, “Map” and “Reduce”. Several algorithms, such as K-Means, K-Medoids, CLARA, and CLARANS, are used in clustering. Using CLARA with the Hadoop MapReduce framework on the cloud can be very effective and can achieve better efficiency.
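To make the two phases concrete, a single K-Means iteration can be sketched in map/reduce form. This is a minimal single-process illustration in plain Python, not the authors' implementation and not Hadoop code. The function names (`map_phase`, `reduce_phase`, `kmeans_step`) and the sample points are assumptions chosen for the example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points emitted under each key to get new centroids."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    # A centroid that received no points keeps its previous position.
    new_centroids = list(centroids)
    for key, members in groups.items():
        new_centroids[key] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans_step(points, centroids):
    """One K-Means iteration expressed as a map step followed by a reduce step."""
    return reduce_phase(map_phase(points, centroids), centroids)


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

In a real MapReduce deployment the map calls would run in parallel over partitions of the data and the framework would group the emitted pairs by key before the reduce step. Here that grouping is done by hand inside `reduce_phase`.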



CLARA, Map Reduce, Scientific Computing, Cloud Computing, Scheduling algorithm, K-Means.