Social Media Behavioural Analysis With Document Tree-Based Rule Mining and Document Clustering

S. Geetha; Dr. R. Kaniezhil

doi:https://doi.org/10.14445/22315381/IJETT-V69I1P213

Research Article | Open Access

Volume 69 | Issue 1 | Year 2021 | Article Id. IJETT-V69I1P213 | DOI : https://doi.org/10.14445/22315381/IJETT-V69I1P213

Citation:

S. Geetha, Dr. R. Kaniezhil, "Social Media Behavioural Analysis With Document Tree-Based Rule Mining and Document Clustering," International Journal of Engineering Trends and Technology (IJETT), vol. 69, no. 1, pp. 85-91, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I1P213

Abstract

Twitter has become an important part of everyday life. The platform lists trending topics in real time, but much of this information is hard to comprehend, so it must be classified before useful information can be extracted. Twitter generates a large database of real-time information. Tweets are a storehouse of text and can reflect human emotions and feelings, and the hidden information in this data can be used for multiple purposes. The results, however, depend on choosing a proper feature set. Biological, pharmacological, and experiential factors influence human behavior. Behavior Analysis (BA) is the analysis of individual behavior, and it can be used to filter useful information from tweets in healthcare and business applications. This paper analyzes human behavior from Twitter data using the proposed DRDC algorithm, which combines several techniques for pre-processing, feature selection, and classification of tweets. The algorithm's accuracy is then evaluated in terms of precision and recall.

Keywords

Behavioral Analysis, Social Media Data Sets, Decision Trees, Document Clustering, Stemming, Pre-processing, DRDC
