Extractive Summarization of Bible Data using Topic Modeling

Vasantha Kumari Garbhapu; Prajna Bodapati

doi:https://doi.org/10.14445/22490183/IJETT-V70I6P210

Research Article | Open Access

Volume 70 | Issue 6 | Year 2022 | Article Id. IJETT-V70I6P210 | DOI : https://doi.org/10.14445/22490183/IJETT-V70I6P210

Received: 23 Mar 2022 | Revised: 14 May 2022 | Accepted: 03 Jun 2022 | Published: 27 Jun 2022

Citation

Vasantha Kumari Garbhapu, Prajna Bodapati, "Extractive Summarization of Bible Data using Topic Modeling," International Journal of Engineering Trends and Technology (IJETT), vol. 70, no. 6, pp. 79-89, 2022. Crossref, https://doi.org/10.14445/22490183/IJETT-V70I6P210

Abstract

This work presents a statistical, topic-modeling-based strategy for automatic extractive summarization of the English Bible dataset, aiming to balance summary quality against machine readability while preserving sentence structure and topic similarity. First, it proposes an algorithm to generate an automatic summary. The core of the measure is the Latent Dirichlet Allocation (LDA) method, which captures the most important topics. The summarization methods are then ranked by the degree to which the most important topics of their summaries match the most important topics of the reference document. The work then evaluates summary quality with the ROUGE metric and with co-selection measures: Precision, Recall, and F1 score. The evaluation results show that the proposed algorithm outperforms the LSA and TextRank algorithms on ROUGE score, topic similarity, and comparison against the manual summary. Furthermore, the algorithm performs well in terms of computational processing and is an easily understood method to apply to the English Bible dataset, which has not been studied previously.
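
As a rough illustration of the evaluation measures named in the abstract, the following Python sketch computes co-selection Precision, Recall, and F1 over selected sentence indices, along with a simplified ROUGE-1 score based on clipped unigram overlap. The function names, the lowercase whitespace tokenization, and the example inputs are assumptions made for illustration and are not taken from the paper; the official ROUGE package also offers options such as stemming that this sketch omits.

```python
from collections import Counter


def _prf(overlap, system_total, reference_total):
    """Turn an overlap count into (precision, recall, F1), guarding against zero division."""
    precision = overlap / system_total if system_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def co_selection_scores(system_ids, reference_ids):
    """Co-selection measures: compare the sentence ids picked by a system
    with the sentence ids picked for the reference (manual) summary."""
    system, reference = set(system_ids), set(reference_ids)
    return _prf(len(system & reference), len(system), len(reference))


def rouge_1(system_text, reference_text):
    """Simplified ROUGE-1: unigram overlap with clipped counts.
    Tokenization is lowercase whitespace splitting (an assumption of this sketch)."""
    system_counts = Counter(system_text.lower().split())
    reference_counts = Counter(reference_text.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((system_counts & reference_counts).values())
    return _prf(overlap, sum(system_counts.values()), sum(reference_counts.values()))


if __name__ == "__main__":
    # Hypothetical inputs: sentence indices and short texts chosen only for the demo.
    print(co_selection_scores([1, 2, 3, 4], [2, 4, 5]))
    print(rouge_1("The Lord is my shepherd",
                  "The Lord is my shepherd I shall not want"))
```

Both functions return a `(precision, recall, f1)` tuple, so the same comparison code can be reused for each summarizer under evaluation.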

Keywords

Automatic Extract, Latent Dirichlet Allocation (LDA), ROUGE, Summary Evaluation, Text Summarization.
