A New Method of Text Categorization and Summarization with Fuzzy Confusion Matrix

International Journal of Engineering Trends and Technology (IJETT)
© 2017 by IJETT Journal
Volume-49 Number-2
Year of Publication : 2017
Authors : Dr. Goutam Sarker, Antara Pal, Saswati Das
DOI :  10.14445/22315381/IJETT-V49P217

Citation 

Dr. Goutam Sarker, Antara Pal, Saswati Das, "A New Method of Text Categorization and Summarization with Fuzzy Confusion Matrix", International Journal of Engineering Trends and Technology (IJETT), V49(2), 107-114, July 2017. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract
The present work is a technique for fuzzy text categorization followed by extractive summarization of the categorized texts. First, texts on different subjects are fuzzy-categorized based on their relative matching with the index terms of the corresponding subjects. After the categorical groups are formed, extractive summarization is performed on each text in each category. The fuzzy categorization is evaluated with a fuzzy confusion matrix. Evaluated with the Holdout method, the categorization performs appreciably well in terms of accuracy, precision, recall and F-score. The accuracy of the summarization, evaluated against human-generated summaries, is fair. The categorization and summarization times are also acceptable.
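
The abstract does not give the membership function or the exact form of the fuzzy confusion matrix, so the following is only a minimal sketch under stated assumptions. It assumes that a text's membership in a subject is the share of its index-term matches that fall in that subject, and that each cell of the fuzzy confusion matrix accumulates membership degrees instead of hard counts. The index terms, function names and sample texts are hypothetical and do not come from the paper.

```python
import re
from collections import Counter

# Hypothetical index terms per subject (illustrative only, not from the paper).
INDEX_TERMS = {
    "physics": {"energy", "force", "quantum", "particle"},
    "biology": {"cell", "gene", "protein", "species"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fuzzy_memberships(text, index_terms=INDEX_TERMS):
    """Assumed membership: a subject's share of all index-term matches in the text."""
    counts = Counter(tokenize(text))
    hits = {s: sum(counts[t] for t in terms) for s, terms in index_terms.items()}
    total = sum(hits.values())
    if total == 0:
        # No index term matched, so the text belongs to no subject.
        return {s: 0.0 for s in index_terms}
    return {s: h / total for s, h in hits.items()}


def fuzzy_confusion_matrix(true_labels, memberships, subjects):
    """Rows are true subjects, columns are predicted subjects.

    Each cell sums membership degrees rather than counting hard decisions.
    """
    matrix = {t: {p: 0.0 for p in subjects} for t in subjects}
    for true_subject, degrees in zip(true_labels, memberships):
        for predicted in subjects:
            matrix[true_subject][predicted] += degrees[predicted]
    return matrix


def fuzzy_accuracy(matrix):
    """Diagonal mass divided by total mass of the fuzzy confusion matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    diagonal = sum(matrix[s][s] for s in matrix)
    return diagonal / total if total else 0.0


# Hypothetical held-out texts with their true subjects.
texts = [
    "The cell stores energy; each gene encodes a protein.",
    "Force acts on a particle.",
]
true_labels = ["biology", "physics"]

subjects = list(INDEX_TERMS)
memberships = [fuzzy_memberships(t) for t in texts]
matrix = fuzzy_confusion_matrix(true_labels, memberships, subjects)
accuracy = fuzzy_accuracy(matrix)
```

In this sketch the first text matches one physics term and three biology terms, so it is assigned partly to both subjects. Precision, recall and F-score could be derived from the same matrix by reading column and row sums in the usual way.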

Keywords
Fuzzy Text Categorization, Fuzzy Confusion Matrix, Extractive Summarization, Term Frequency, Inverse Document Frequency, Sentence Weight, Clustering, OCA, Holdout Method, Accuracy, Precision, Recall, F-Score.