Curating the LOGIFALLA Dataset with Mephisto and MTurk
Open Access
Author:
Lee, Reuben
Area of Honors:
Computer Science
Degree:
Bachelor of Science
Document Type:
Thesis
Thesis Supervisors:
Kenneth Huang, Thesis Supervisor
David Koslicki, Thesis Honors Advisor
Keywords:
Mephisto, Mechanical Turk, (Mis)Information Game, Crowdsourcing, Dataset, Logical Fallacies
Abstract:
Numerous datasets have been created to aid AI systems in detecting falsehoods, fabricated news, offensive language, and provocative posts in digital discussions [6]. However, despite the significance of these efforts, the identification and interpretation of the fundamental logical fallacies that undermine the rational soundness of such content have yet to be explored [6]. A significant obstacle to this exploration is the absence of standardized benchmark datasets, because annotating logical fallacies is itself a considerable challenge. In this paper, we provide a system of techniques for curating the first public benchmark dataset for common logical fallacies in online conversation, which we call LOGIFALLA. The system can be used not only to curate the LOGIFALLA dataset but can also be replicated by the NLP community for future dataset curation. It comprises Meta’s crowdsourcing tool, Mephisto; Amazon’s crowdsourcing platform, Mechanical Turk; and a simulated social media platform tool, the (Mis)Information Game, all of which are publicly available for research use [4].