Features Selection and Weight learning for Punjabi Text Summarization

International Journal of Engineering Trends and Technology (IJETT)
© 2011 by IJETT Journal
Volume-2 Issue-2
Year of Publication : 2011
Authors: Vishal Gupta, Gurpreet Singh Lehal


Vishal Gupta, Gurpreet Singh Lehal. "Features Selection and Weight learning for Punjabi Text Summarization". International Journal of Engineering Trends and Technology (IJETT), V2(2):45-48 Sep to Oct 2011. ISSN:2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.


This paper concentrates on feature selection and weight learning for Punjabi text summarization. Text summarization condenses a source text into a shorter version while preserving its information content. It is the process of selecting important sentences from the original document and concatenating them into a shorter form. The importance of a sentence is decided based on its statistical and linguistic features. For Punjabi text summarization, the statistical features that often increase the candidacy of a sentence for inclusion in the summary are: the sentence length feature, the Punjabi keyword selection feature (TF-ISF approach), and the number feature. The linguistic features that often increase the candidacy of a sentence for inclusion in the summary are: the Punjabi sentence headline feature, the next line feature, the Punjabi noun feature, the Punjabi proper noun feature, the common English-Punjabi noun feature, the cue phrase feature, and the presence of title keywords in a sentence. Mathematical regression is used to estimate the text feature weights based on fuzzy scores of sentences from 50 Punjabi news documents.
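As a rough illustration of two ideas named in the abstract, the sketch below scores sentences with a TF-ISF-style measure and fits feature weights by ordinary least-squares regression. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes the common definition ISF(w) = log(N / n_w), averages TF-ISF over the words of a sentence, and treats the target scores as given numbers. The function names `tf_isf_scores` and `fit_weights`, the English tokens, and the toy feature values are all hypothetical.

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Score each tokenized sentence by the mean TF-ISF of its words.

    Assumed definition: ISF(w) = log(N / n_w), where N is the number of
    sentences and n_w is the number of sentences containing w.
    """
    n = len(sentences)
    # Sentence frequency: in how many sentences does each word occur?
    sent_freq = Counter(word for sent in sentences for word in set(sent))
    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        tf = Counter(sent)
        total = sum(tf[w] * math.log(n / sent_freq[w]) for w in tf)
        scores.append(total / len(sent))
    return scores


def fit_weights(features, targets):
    """Least-squares feature weights via the normal equations.

    features: one row of feature scores per sentence.
    targets:  one reference score per sentence (assumed to be given).
    """
    k = len(features[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(row[i] * row[j] for row in features) for j in range(k)]
        + [sum(row[i] * y for row, y in zip(features, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("feature columns are linearly dependent")
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Toy usage with made-up tokens and feature values.
toy_sentences = [["a", "b"], ["a", "c"], ["a", "a"]]
toy_scores = tf_isf_scores(toy_sentences)

toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_targets = [0.5, 0.2, 0.7]
toy_weights = fit_weights(toy_features, toy_targets)
```

Once weights are learned, a sentence's overall score can be taken as the weighted sum of its feature scores, and the top-scoring sentences selected for the summary. The explicit normal-equation solver keeps the sketch dependency-free; a numerical library would be a more robust choice for real data.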



Summarization features, Statistical features, Linguistic features, Weight learning.