Review of Web Clustering Algorithms and Evaluation

International Journal of Engineering Trends and Technology (IJETT)
© 2017 by IJETT Journal
Volume-44 Number-5
Year of Publication : 2017
Authors : Sarika, Mukesh Rawat
DOI :  10.14445/22315381/IJETT-V44P241


Sarika, Mukesh Rawat, "Review of Web Clustering Algorithms and Evaluation", International Journal of Engineering Trends and Technology (IJETT), V44(5), 211-214, February 2017. ISSN: 2231-5381. Published by Seventh Sense Research Group.

Clustering is the process of dividing a set of data objects into a set of meaningful sub-classes, called clusters. Clustering discovers groups of data objects that are similar to one another in some sense: the members of a cluster are more similar to each other than they are to members of other clusters. The objective of clustering is to find high-quality clusters such that inter-cluster similarity is low and intra-cluster similarity is high. Clustering can be performed by various methods, such as hierarchical, partitioning, density-based, and grid-based methods. Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. Hierarchical clustering methods generally fall into two types. Agglomerative: a "bottom-up" approach in which every observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. The purpose of clustering is to group the data in a massive data set and transform it into a comprehensible form for further use. Clustering is a significant task in data analysis and data mining applications.
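The agglomerative ("bottom-up") approach described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `single_linkage` and `agglomerative`, the choice of single linkage with Euclidean distance as the merge criterion, the stopping parameter `k`, and the sample points are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of points (assumed criterion)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    # Bottom-up start: every observation is its own cluster.
    clusters = [[p] for p in points]
    merges = []  # record of merges, i.e. the hierarchy built from the bottom up
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sample data: two visibly separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
clusters, merges = agglomerative(points, k=2)
```

With these sample points, the three points near the origin end up in one cluster and the two points near (10, 10) in the other. A divisive ("top-down") method would instead start from a single cluster containing all five points and split it recursively.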



Clustering, Hierarchical clustering, Sub-classes, Agglomerative hierarchical clustering, Divisive hierarchical clustering