Topic Modeling Techniques for Document Clustering and Analysis of Judicial Judgements

© 2022 by IJETT Journal
Volume-70 Issue-11
Year of Publication : 2022
Authors : Amar Jeet Rawat, Sunil Ghildiyal, Anil Kumar Dixit
DOI : 10.14445/22315381/IJETT-V70I11P217

How to Cite?

Amar Jeet Rawat, Sunil Ghildiyal, Anil Kumar Dixit, "Topic Modeling Techniques for Document Clustering and Analysis of Judicial Judgements," International Journal of Engineering Trends and Technology, vol. 70, no. 11, pp. 163-169, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I11P217

Abstract
The digital world is growing rapidly in every dimension. Legal case information and judgements are now widely available online, but their unstructured textual nature makes them difficult to manage. Classifying, analysing, and understanding such unstructured text is complex. Various topic-modelling techniques are used to classify such corpora. In this paper, two popular topic models, Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA), are implemented, and their performance is compared on a dataset of 1000 legal judgement documents. Coherence scores are used to evaluate both models. Tests show that LDA and LSA have different areas of strength: LDA is good at learning descriptive topics, while LSA is good at producing a compact representation of the meaning of documents and words in a corpus.
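To make the evaluation step concrete, the following is a minimal sketch of one common coherence measure, UMass coherence, which scores a topic by how often its top words co-occur in the same documents. The abstract does not state which coherence measure the authors used, so the choice of UMass, the function name `umass_coherence`, and the toy corpus are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of a single topic (illustrative, hypothetical helper).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs where w_j is ranked
    above w_i in the topic's top-word list. D(...) counts the documents that
    contain all of the given words. Scores closer to zero mean higher coherence.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):  # j < i by construction
        w_j, w_i = top_words[j], top_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Toy tokenised corpus (invented for illustration, not the paper's data).
docs = [
    ["court", "appeal", "judge", "bail"],
    ["court", "judge", "sentence"],
    ["contract", "breach", "damages"],
    ["contract", "damages", "court"],
]

coherent_topic = ["court", "judge", "appeal"]  # words that tend to co-occur
mixed_topic = ["court", "breach", "bail"]      # words that rarely co-occur

coherent_score = umass_coherence(coherent_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print(coherent_score > mixed_score)
```

In a comparison such as the one described above, a score like this would be averaged over the topics produced by each model, so that LDA and LSA can be compared on the same corpus with the same number of topics.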

Keywords
Latent Semantic Analysis, Topic Modeling, Natural Language Processing, Document Clustering, Latent Dirichlet Allocation.
