Implementation of a Geocoding In Journalist Social Media Monitoring System

© 2021 by IJETT Journal
Volume-69 Issue-12
Year of Publication : 2021
Authors : Abba Suganda Girsang, Sani Muhamad Isa, Raditya Fajar
DOI :  10.14445/22315381/IJETT-V69I12P212

How to Cite?

Abba Suganda Girsang, Sani Muhamad Isa, Raditya Fajar, "Implementation of a Geocoding In Journalist Social Media Monitoring System," International Journal of Engineering Trends and Technology, vol. 69, no. 12, pp. 103-113, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I12P212

Abstract
Twitter is one of the largest social media platforms, especially in Indonesia, and conversations there about problems or events in users' surroundings can easily go viral and spread widely. This is reflected in how news evolves: part of what is published by television stations, print media, and online media originates from issues or viral events that thrive in the community. This research continues previous work on building an information system platform for journalists, which helps them find events or issues that have the potential to become viral and stay up to date with ongoing issues. The platform is extended with a geocoding method and with conversation clustering using the Lingo algorithm provided by the Carrot2 tools. In this study, the authors used this algorithm to help determine which conversations were considered important and which were not. The collected conversations can then be mapped based on the location description or address mentioned in the conversation text. This helps journalists find news material that is close to their own location and to their news sources. The success of the geocoding process in this study depends on several parameters. The way location names are written greatly affects how effectively they are extracted by the NER model that was created: although the model was trained on the characteristics of Indonesian regions, slang used to refer to locations can be misinterpreted, and the punctuation included in the text also plays a role, so the separation of location names greatly affects the effectiveness of geocoding. The collection of spatial data used also affects the match rate when looking up the described location.
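The dependence of geocoding on how a location name is written can be illustrated with a minimal sketch. It is not the authors' implementation: the toy gazetteer, its approximate coordinates, the slang alias table, and the helper names `normalize` and `geocode` are all hypothetical, chosen only to show how punctuation, slang, and the coverage of the spatial data each decide whether an extracted name finds a match.

```python
import re

# Toy gazetteer (hypothetical; coordinates are approximate, for illustration only).
GAZETTEER = {
    "monas": (-6.1754, 106.8272),
    "bundaran hi": (-6.1950, 106.8230),
    "jakarta selatan": (-6.2615, 106.8106),
}

# Hypothetical mapping from slang forms to canonical gazetteer names.
ALIASES = {"jaksel": "jakarta selatan"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace, expand aliases."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return ALIASES.get(cleaned, cleaned)


def geocode(name):
    """Return (lat, lon) for a location name, or None if the gazetteer has no match."""
    return GAZETTEER.get(normalize(name))


if __name__ == "__main__":
    for raw in ["Monas!!", "#Jaksel", "Bundaran  HI,", "Blok M"]:
        print(repr(raw), "->", geocode(raw))
```

In this sketch, "Monas!!" and "#Jaksel" resolve only because punctuation is removed and the slang form is listed in the alias table, while "Blok M" returns `None` because the toy gazetteer does not contain it, mirroring the effect of spatial data coverage described above.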

Keywords
Data Mining, Geocoding, Lingo Clustering, Naive Bayes, Named Entity Recognition, Twitter
