A Novel Intra Centroid Based Clustering for Categorical Data of Documents

International Journal of Engineering Trends and Technology (IJETT)
  
© 2017 by IJETT Journal
Volume-52 Number-3
Year of Publication : 2017
Authors : Ch.M.R.Veena, Dr.A.Chandra Sekhar

Citation 

Ch.M.R.Veena, Dr.A.Chandra Sekhar, "A Novel Intra Centroid Based Clustering for Categorical Data of Documents", International Journal of Engineering Trends and Technology (IJETT), V52(3), 175-178, October 2017. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract
Keyword extraction from documents is an interesting research issue in the field of knowledge and data engineering. Every fragment or statement contains a collection of keywords, and a keyword may occur more than once in a single snippet or document. Clustering is a mechanism that groups objects based on the similarity between them. In this paper we propose an efficient keyword-extraction-based clustering model that groups similar documents using the cosine similarity between them, together with a novel centroid computation model that improves the performance of the clusters. Our model improves on k-means by eliminating random centroid selection and by taking average pairwise distance and other parameters into account to generate consistent clusters. The proposed model gives more efficient results than traditional models.
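
The abstract does not give the details of the centroid computation, so the following is only a minimal sketch of the general idea: documents are turned into keyword-frequency vectors, compared with cosine similarity, and clustered k-means style with initial centroids chosen deterministically instead of at random. The farthest-first seeding rule and all function names (term_frequencies, pick_seeds, cluster) are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Map each lowercased whitespace token to its number of occurrences."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_seeds(vectors, k):
    """Deterministic farthest-first seeding.

    Assumption: this stands in for the paper's centroid rule, which the
    abstract does not describe. It only shows one way to avoid random seeds.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    seeds = [0]  # always start from the first document
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        # Pick the document least similar to its closest existing seed.
        best = min(
            candidates,
            key=lambda i: max(cosine_similarity(vectors[i], vectors[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def mean_vector(members):
    """Component-wise mean of a non-empty list of term-frequency vectors."""
    total = Counter()
    for vector in members:
        total.update(vector)
    return Counter({term: count / len(members) for term, count in total.items()})


def cluster(docs, k, max_iter=20):
    """Assign each document a cluster label in range(k)."""
    vectors = [term_frequencies(d) for d in docs]
    centroids = [vectors[i] for i in pick_seeds(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine_similarity(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


docs = [
    "apple banana apple",
    "banana apple fruit",
    "car engine wheel",
    "engine car road",
]
print(cluster(docs, 2))  # [0, 0, 1, 1]
```

Because the seeds are chosen by a fixed rule, repeated runs on the same input return the same labels, which is the kind of consistency the abstract attributes to removing random centroid selection.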
