The Impact of Adversarial Examples on Machine Learning Models

Open Access
- Author:
- Martutartus, Sheryll
- Area of Honors:
- Computer Science
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Patrick Drew McDaniel, Thesis Supervisor
John Joseph Hannan, Thesis Honors Advisor
- Keywords:
- Machine Learning
Computer Science
Adversarial Machine Learning
Spam Detection
- Abstract:
- Spam detectors used by email providers are largely based on machine learning algorithms. While such detectors separate spam emails from legitimate "ham" emails, as with any machine learning model there is the possibility of misclassification due to model inaccuracy. Moreover, attacks on machine learning models are becoming more prevalent: adversaries craft inputs that "trick" the model into purposeful misclassification. In the context of spam detection, these attacks could cause a genuine email to be sent to the junk folder or, even more detrimentally, allow a spam email to bypass the junk folder and reach the inbox. In this paper, we explore two types of machine learning methods and their effectiveness in classifying emails. We also experiment with different attacks that generate adversarial samples to trick each of the models. Our findings demonstrate that Naive Bayes models and neural networks classify emails with high accuracy, and that an attack designed specifically for neural networks, Projected Gradient Descent, is more effective at tricking both models than an attack designed for the Bayesian model, such as ham-word injection. Lastly, this study interprets what these types of models and the transferability of attacks between them mean for the security of email use.
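
The two attacks contrasted in the abstract can be illustrated with short sketches. The code below is not from the thesis; it is a minimal illustration assuming a differentiable PyTorch classifier over continuous features (e.g. normalized bag-of-words counts) for PGD, and a precomputed list of ham-frequent words for the injection attack. All function names and parameter values are hypothetical.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.1, alpha=0.01, steps=40):
    """Projected Gradient Descent: repeatedly step in the direction that
    increases the classifier's loss, projecting back into an L-infinity
    ball of radius eps around the original input."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep features in a valid range
    return x_adv.detach()

def ham_word_injection(spam_text, ham_words, n=20):
    """Bayesian-poisoning-style attack: append words frequent in
    legitimate ("ham") mail, so that each added word pulls a Naive
    Bayes classifier's posterior toward the ham class."""
    return spam_text + " " + " ".join(ham_words[:n])
```

Note the asymmetry: PGD requires gradient access to a differentiable model, while ham-word injection only requires a guess at which words the Bayesian model associates with legitimate mail, which is one reason the transferability of each attack across model types is worth studying.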