Single Document Text Summarization Using Clustering Approach Implementing for News Article

International Journal of Engineering Trends and Technology (IJETT)
© 2014 by IJETT Journal
Volume-15 Number-7
Year of Publication : 2014
Authors: Pankaj Bhole, Dr. A.J. Agrawal


Pankaj Bhole, Dr. A.J. Agrawal. "Single Document Text Summarization Using Clustering Approach Implementing for News Article", International Journal of Engineering Trends and Technology (IJETT), V15(7), 364-368, Sep 2014. ISSN: 2231-5381. Published by Seventh Sense Research Group.


Text summarization is a long-standing challenge in text mining that still needs researchers' attention in the areas of computational intelligence, machine learning, and natural language processing. Reading the full text of every document is time consuming. We extract a set of features from each sentence that help identify its importance in the document. A clustering approach is useful for deciding which types of data are present in a document. In this paper we introduce the concept of k-means clustering for natural language processing of text, applied to word matching, and we adopt document clustering algorithms from data mining to extract meaningful information from a large set of offline documents.
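
The clustering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sentence features, the initialization, or the selection rule, so everything below is an assumption made for illustration: sentences are represented as length-normalized term-frequency vectors, k-means starts from evenly spaced sentences, and the sentence nearest each centroid is taken as the summary. The function names (tokenize, vectorize, kmeans, summarize) are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def vectorize(sentences):
    """Turn each sentence into a length-normalized term-frequency vector."""
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means. Assumption: centroids start at evenly spaced sentences."""
    step = len(vectors) / k
    centroids = [list(vectors[int(i * step)]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each sentence to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def summarize(sentences, k=2):
    """Pick the sentence closest to each cluster centroid, in document order."""
    if not sentences:
        return []
    k = min(k, len(sentences))
    vectors = vectorize(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = set()
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            chosen.add(min(members,
                           key=lambda i: sq_dist(vectors[i], centroids[j])))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    article = [
        "The match ended in a draw.",
        "Both teams played the match well.",
        "Stock prices fell sharply today.",
        "Investors sold stock as prices fell.",
    ]
    for line in summarize(article, k=2):
        print(line)
```

Calling summarize on a list of sentences from one news article returns at most k sentences, one per cluster, in their original order. A fuller system would likely add stemming and stop-word removal before vectorizing, in line with the keywords listed for the paper.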


Natural Language Processing, Stemming, Clustering.