Using COVID 19 Healthcare Misinformation to Develop a System to Detect Future Fake News
Open Access
Author:
Denenberg, Allison
Area of Honors:
Data Sciences
Degree:
Bachelor of Science
Document Type:
Thesis
Thesis Supervisors:
Suhang Wang, Thesis Supervisor
Marc Aaron Friedenberg, Thesis Honors Advisor
Keywords:
Data Science, Machine Learning, Misinformation, Fake News, COVID-19
Abstract:
The purpose of this experimental study is to create a machine learning algorithm that can accurately classify tweets as fake or real news. For this experiment, I used CoAID (COVID-19 Healthcare Misinformation Dataset), a diverse dataset that includes fake news from websites and social platforms, along with users’ social engagement with that news. Using this dataset, I gathered tweets through Twitter’s API. The data was then processed and used to train a machine learning algorithm to automatically detect whether the tweets were real or fake news. The prevalence of misinformation posing as authentic news on social media platforms, especially Twitter, is a critical global issue. The goal of this study is to advance research on detecting fake news on social media, specifically by developing a machine learning model that can be trained to correctly detect misinformation. Because of incomplete features in the dataset, the model built on the user and social context attributes did not produce conclusive results. However, this research can serve as a foundation for further experiments that could minimize the amount of fake content people are exposed to and thereby reduce its harmful effects in the years to come.