Evaluating the Grammatical Correctness of Malayalam Text using improved Text GCN

© 2022 by IJETT Journal
Volume-70 Issue-12
Year of Publication : 2022
Author : Merin Cherian, Kannan Balakrishnan
DOI : 10.14445/22315381/IJETT-V70I12P217

How to Cite?

Merin Cherian, Kannan Balakrishnan, "Evaluating the Grammatical Correctness of Malayalam Text using improved Text GCN," International Journal of Engineering Trends and Technology, vol. 70, no. 12, pp. 160-169, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I12P217

Automatic grammatical error detection and correction (GEC) has been studied extensively for English and other high-resource languages, but research on these tasks in Indian languages remains very limited. This work applies an improved text graph convolutional network (Text GCN) to grammatical error detection in Malayalam, and it is the first attempt to create a Malayalam grammar checker. The improved Text GCN is evaluated against the original Text GCN, LSTM, BiLSTM and CNNLSTM, and results are presented for both cross-validation data and unseen sample test data. A training dataset of 200k sentences was created, with 20% of the data taken as the validation set. In the comparison with the other architectures, the improved Text GCN achieved an accuracy of 90.41% on unseen test data. These preliminary results show that a graph representation of text data can be used to check the grammatical correctness of Malayalam text.

Error detection, Malayalam grammar, Malayalam corpus, Malayalam natural language processing, Text graph convolutional networks.
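
To make the idea of a graph representation of text concrete, the following is a minimal sketch of a single graph-convolution step over a toy sentence-word graph. It is not the authors' improved Text GCN, whose details are not given in the abstract. The graph, the edge weights, the feature matrix, the weight matrix, and all function names below are hypothetical and chosen only for illustration; in the original Text GCN formulation the edge weights would come from TF-IDF and PMI statistics rather than hand-picked numbers.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix with self-loops."""
    degrees = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]

def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return relu(matmul(matmul(adj_norm, features), weights))

# Toy graph (hypothetical): nodes 0-1 are sentences, nodes 2-4 are words.
# Diagonal entries are self-loops; off-diagonal values are made-up edge weights.
ADJ = [
    [1.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 1.0, 0.2, 0.0],
    [0.5, 0.5, 0.2, 1.0, 0.2],
    [0.0, 0.5, 0.0, 0.2, 1.0],
]

# One-hot node features (identity matrix).
FEATURES = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

# Arbitrary 5x2 weight matrix; a trained model would learn these values.
WEIGHTS = [
    [0.3, -0.1],
    [-0.2, 0.4],
    [0.1, 0.1],
    [0.5, -0.3],
    [-0.4, 0.2],
]

ADJ_NORM = normalize_adjacency(ADJ)
HIDDEN = gcn_layer(ADJ_NORM, FEATURES, WEIGHTS)

# Rows 0 and 1 of HIDDEN are the sentence-node representations that a
# classifier head would score as grammatical or ungrammatical.
for node, row in enumerate(HIDDEN):
    print(node, [round(v, 3) for v in row])
```

In this sketch each sentence node mixes information from the word nodes it is connected to, so a classifier placed on the sentence rows sees features shaped by the words a sentence contains and by how those words relate to each other in the graph.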
