Aspect Based Polarity Extraction in Tamil Tweets using Tree-Based Recursive Partitioning Techniques

The overall sentiment of an emotional statement about a particular topic falls into one of two polarities, positive or negative, which can be identified from the words, and their synonyms, that are closely connected with the theme of the topic. This work aims to identify the words that carry the emotion and to analyse the performance of tree-based Machine Learning (ML) classifiers in classifying Tamil tweets into these two polarities. All models are trained and tested separately with both Non-Weighted and Weighted Vectors, and the results are analysed to establish their accuracy. A set of 1015 prelabelled Tamil tweets is pre-processed to remove noise and form a word dictionary. Each word in the dictionary is tagged with a weight that indicates its impact. The structured corpus, which contains statements of various lengths, is used in experiments with Decision Tree, XGBoost and Random Forest classifiers under varying parameters. The comparative study shows that Random Forest performs best, reaching an accuracy of 78.81% with the Weighted Vector, ahead of the Decision Tree and XGBoost classifiers.

Decision tree, XGBoost, Random forest, Natural Language Processing, Classification.
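
The difference between the Non-Weighted and Weighted Vectors mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are assigned, so the weighting rule below (scaling each word count by a per-word impact weight, defaulting to 1.0) is an assumption, and the tokens, labels, weight values and function names are hypothetical placeholders rather than data or code from the study.

```python
from collections import Counter

# Toy transliterated tokens with polarity labels (1 = positive, 0 = negative).
# These are illustrative placeholders, not tweets from the study's corpus.
tweets = [
    (["padam", "arumai"], 1),
    (["padam", "mosam"], 0),
    (["arumai", "nadippu"], 1),
]

# Hypothetical impact weights; the real weighting scheme is not given in the abstract.
weights = {"arumai": 2.0, "mosam": 2.0}


def build_dictionary(docs):
    """Return a sorted list of the distinct tokens across all documents."""
    return sorted({tok for tokens, _ in docs for tok in tokens})


def vectorize(tokens, vocab, weights=None):
    """Map a token list to a count vector over vocab.

    If weights is given, each count is multiplied by the word's weight
    (1.0 for words without an explicit weight).
    """
    counts = Counter(tokens)
    if weights is None:
        return [float(counts[w]) for w in vocab]
    return [counts[w] * weights.get(w, 1.0) for w in vocab]


vocab = build_dictionary(tweets)
plain = [vectorize(tokens, vocab) for tokens, _ in tweets]
weighted = [vectorize(tokens, vocab, weights) for tokens, _ in tweets]
labels = [label for _, label in tweets]
```

Either matrix (`plain` or `weighted`) together with `labels` could then be passed to a tree-based classifier, which is how the two vector types would be compared under this assumed setup.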

