Social Media Insights About COVID-19 in Portugal: A Text Mining Approach
Keywords: social media, COVID-19, natural language processing, sentiment analysis, topic modeling, public opinion
The rapid spread of COVID-19 around the world had a significant impact on daily life. As in other countries, Portugal adopted measures such as curfews and the use of masks to combat the exponential increase in cases. Alongside its direct consequences for health and the healthcare sector, the pandemic therefore also changed human behavior from a sociological viewpoint.
The objective of this dissertation is to gain an understanding of the reality surrounding COVID-19. For this purpose, real-time data was extracted from three sources: two social media platforms, Twitter and Reddit, and Público, a Portuguese online newspaper. The adopted approach, based on topic modeling and sentiment analysis, was validated in the Portuguese context on data covering a period of one year, but it can equally be applied to similar situations and to other countries to support decision-making.
After extraction, the data was prepared for the application of natural language processing (NLP) tools specific to the Portuguese language, which can be challenging because of the language's lexical richness. With the gathered information, dashboards were built to gain insights into the COVID-19 pandemic in Portugal. It was concluded that the topics discussed on social media reflect the events related to the pandemic. In a final stage, the dashboards were evaluated by public health experts, who highlighted the potential of the results obtained. The data and dashboards will be made available to the scientific community upon request.