Automatic Normalization of Punjabi Words

International Journal of Engineering Trends and Technology (IJETT)
© 2013 by IJETT Journal
Volume-6 Number-7
Year of Publication : 2013
Authors : Vishal Gupta


Vishal Gupta. "Automatic Normalization of Punjabi Words". International Journal of Engineering Trends and Technology (IJETT). V6(7):353-357 Dec 2013. ISSN:2231-5381. Published by Seventh Sense Research Group.


For any language, automatic normalization of words is a basic linguistic resource needed to build high-accuracy Natural Language Processing (NLP) applications such as machine translation, document classification, document clustering, question answering, topic tracking, text summarization, and keyword extraction. Without automatic word normalization, NLP applications for a language cannot achieve high accuracy. This paper concentrates on the automatic normalization of Punjabi words. Punjabi is the official language of the state of Punjab, but it is an under-resourced language: very few computational-linguistic resources are available for it. This is the first time that automatic standardization of Punjabi words has been implemented, and the system can be very useful for building other efficient applications for Punjabi, for example machine translation, document association, document clustering, topic tracking, and text summarization.



Keywords : Punjabi word normalization, normalized Punjabi words, standardized words.