Studies of Linear and Deep Neural Network (DNN) Models for Automated Short Answer Grading

Open Access
- Author:
- Tomar, Yajur
- Area of Honors:
- Computer Science
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Rui Zhang, Thesis Supervisor
- John Joseph Hannan, Thesis Honors Advisor
- Keywords:
- ASAG
- NLP
- Neural Networks
- Linear Models
- Non-linear Models
- Regular Expression Based Features
- SemEval
- AutoP
- Feature Engineering
- Abstract:
- The Automated Short Answer Grading (ASAG) task is one of the foci of the Penn State NLP Lab. It is concerned with finding automated methods to grade, and provide feedback on, students’ responses to short answer questions, based on rubrics and guidelines set by the instructor or course designer and applied by instructors, teaching assistants, or graders. Grading hundreds of short answer responses, especially in large college classes, can be difficult to complete consistently and promptly. To help tackle this issue, the Penn State NLP Lab investigates machine learning models as a means of assisting with this task. However, one of the main obstacles for ASAG is the scarcity of data, which matters especially when implementing deep learning models: deep learning models have shown strong performance in the NLP domain, but their effectiveness relies on an abundance of data. To help resolve this problem, the NLP Lab has collaborated with education researchers in STEM topics to jointly create labeled data for this task. The models developed on these data consist primarily of neural network architectures, but the baselines they are compared against include both linear models, which require feature engineering, and non-linear neural networks. This thesis explores the different linear and non-linear machine learning algorithms used as baselines for the lab’s work. One important focus was to determine whether one aspect of the feature engineering used in one of the available baselines could be re-implemented: specifically, the automatic generation of regular expression features. An attempt was made to replicate other investigators’ results by applying these engineered features with random forest models to a Kaggle dataset. In addition, the thesis describes work completed to create a new ASAG dataset for middle school physics, and reports results, on this dataset and on a benchmark dataset, for a baseline that uses LSTM networks to generate features for a logistic regression model. The contrast between engineering features manually and learning them with an LSTM illustrates the tradeoff between needing enough data to train neural networks and the effort required for feature engineering with linear models.
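To make the first baseline concrete, the sketch below shows how regular-expression features might feed a random forest grader. This is a minimal illustration, not the thesis code: the patterns here are hand-written, hypothetical stand-ins for the automatically generated regexes the abstract describes, and the toy answers and grades are invented for demonstration.

```python
# Minimal sketch: binary regex features + a scikit-learn random forest.
# The PATTERNS list is hypothetical; the thesis generates such regexes
# automatically from training data rather than writing them by hand.
import re
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PATTERNS = [re.compile(p) for p in [
    r"\bgravit\w*",          # mentions gravity/gravitational
    r"\bforce\b.*\bmass\b",  # relates force and mass
    r"\bnewton'?s\b",        # cites Newton
]]

def regex_features(answer: str) -> list:
    """One binary feature per pattern: does the pattern match the answer?"""
    text = answer.lower()
    return [int(bool(p.search(text))) for p in PATTERNS]

# Toy labeled data (1 = correct, 0 = incorrect); a real run would use a
# labeled ASAG corpus such as the Kaggle dataset mentioned above.
answers = [
    "Gravity pulls the ball down because force depends on mass.",
    "The ball moves because it wants to.",
]
grades = [1, 0]

X = np.array([regex_features(a) for a in answers])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, grades)
print(clf.predict(np.array(
    [regex_features("Newton's law: force equals mass times acceleration.")]
)))
```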
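The second baseline replaces manual feature engineering with learned features. The following sketch, again only illustrative, shows one plausible wiring of that idea: an LSTM's final hidden state serves as a fixed-length feature vector for a logistic regression. The vocabulary, dimensions, and the (untrained) LSTM are assumptions for demonstration; in the actual baseline the LSTM would be trained on the ASAG data.

```python
# Minimal sketch: LSTM hidden state as features for logistic regression.
# All names and sizes here are illustrative, not the lab's baseline.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

VOCAB = {"<unk>": 0, "gravity": 1, "force": 2, "mass": 3, "ball": 4}
EMBED_DIM, HIDDEN_DIM = 16, 32

embed = nn.Embedding(len(VOCAB), EMBED_DIM)
lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)

def lstm_features(answer: str) -> torch.Tensor:
    """Encode an answer and return the LSTM's final hidden state."""
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in answer.lower().split()]
    x = embed(torch.tensor([ids]))   # shape (1, seq_len, EMBED_DIM)
    _, (h_n, _) = lstm(x)            # h_n shape (1, 1, HIDDEN_DIM)
    return h_n.squeeze().detach()

answers = ["gravity pulls the ball", "the ball is blue"]
grades = [1, 0]

X = torch.stack([lstm_features(a) for a in answers]).numpy()
clf = LogisticRegression().fit(X, grades)
print(clf.predict(X))
```

The tradeoff the abstract names is visible in the two sketches: the first requires designing (or generating) patterns but works with little data, while the second learns its features, at the cost of needing enough labeled examples to train the LSTM.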