POS-HOML: POS Tagging Technique For Gujarati Language Using Hybrid Optimal And Machine Learning Approaches

© 2021 by IJETT Journal
Volume-69 Issue-11
Year of Publication : 2021
Authors : Pooja M Bhatt, Dr. Amit Ganatra
DOI :  10.14445/22315381/IJETT-V69I11P232

How to Cite?

Pooja M Bhatt, Dr. Amit Ganatra, "POS-HOML: POS Tagging Technique For Gujarati Language Using Hybrid Optimal And Machine Learning Approaches," International Journal of Engineering Trends and Technology, vol. 69, no. 11, pp. 256-262, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I11P232

Abstract
Natural language processing facilitates interaction between humans and machines. The primary purpose of part-of-speech (POS) tagging is to assign each word its tag, such as noun, verb, or adjective. For Indian languages, assigning the correct POS tag to each word in a sentence is difficult because of unknown words. Earlier work on Indian languages relied on statistical and rule-based approaches. Statistical approaches use mathematical equations, while rule-based approaches need precise language knowledge and hand-written rules. This paper proposes a POS tagging method for the Gujarati language using hybrid optimal and machine learning techniques (POS-HOML) to improve POS tagging. The first contribution of POS-HOML is optimal feature selection, which optimizes the multiple features to avoid dimensionality problems. The second contribution is applying various machine learning techniques, namely the hidden Markov model (HMM), a rule-based approach, a hybrid (a combination of rules and the hidden Markov model), the recurrent neural network (RNN), the conditional random field (CRF), and Long Short-Term Memory (LSTM), to classify the POS of the given text. Finally, the paper compares the various methods on standard benchmark datasets to analyze their effectiveness in terms of accuracy, precision, recall, and F-measure.
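
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `tagging_metrics`, the toy tag sequences, and the choice of macro-averaging over tags are all assumptions made for illustration, and the paper may average differently.

```python
from collections import Counter


def tagging_metrics(gold, pred):
    """Token-level accuracy plus macro-averaged precision, recall, and F-measure.

    Macro-averaging over tags is an assumption of this sketch, not a detail
    confirmed by the paper.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct tag: true positive for that tag
        else:
            fp[p] += 1      # predicted tag was wrong: false positive for it
            fn[g] += 1      # gold tag was missed: false negative for it

    tags = sorted(set(gold) | set(pred))
    precisions, recalls, f_measures = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        precision = tp[t] / p_den if p_den else 0.0
        recall = tp[t] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    n = len(tags)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Hypothetical gold and predicted tag sequences (not data from the paper).
gold = ["NOUN", "VERB", "ADJ", "NOUN"]
pred = ["NOUN", "VERB", "NOUN", "NOUN"]
metrics = tagging_metrics(gold, pred)
print(metrics)
```

In this toy case three of four tokens are tagged correctly, so accuracy is 0.75, while the macro-averaged scores are lower because the ADJ tag is never predicted. This is why per-tag measures are usually reported alongside accuracy when comparing taggers.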

Keywords
POS tagging, Gujarati language, optimal, machine learning, hidden Markov model, rule-based network, Long Short-Term Memory, deep neural network
