Unveiling Gender Disparities in STEM Success: A Logistic Regression Analysis of Penn State Students

Open Access
- Author:
- Kelly, Katherine
- Area of Honors:
- Mathematics
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Laura E Cruz, Thesis Supervisor
- Aissa Wade, Thesis Honors Advisor
- Keywords:
- logistic regression
- women in STEM
- undergraduate research
- gender
- Abstract:
- The topic of women in STEM-related fields has dominated many conversations about representation in the workplace. Some researchers have turned to the gender breakdown in undergraduate STEM majors to see whether the gender differential starts in college, where many students select their career paths. I wanted to investigate whether there were significant differences among those who succeed in STEM fields at the university level, and in particular whether gender alone was a valid predictor of success. In this paper, I fit several logistic regression models based on gender and ethnicity to predict the success of women in entrance-to-major classes for STEM majors at Penn State University Park. Real data from Penn State Undergraduate Education were used to build these models. I completed the variable selection process, compared the models' performance and validity, and demonstrated whether and how the models could be used to predict the success of an undergraduate STEM major based on these demographic factors. For each entrance-to-major class, as well as for the data set overall, four logistic regression models were created: Gender predicting Success, Race/Ethnicity predicting Success, Gender and Race/Ethnicity predicting Success, and the interaction between Gender and Race/Ethnicity predicting Success. The models were compared by their McFadden R² and AIC values. The best model for each data set was selected, and its predictive performance was evaluated using an ROC curve and the corresponding AUC value. Finally, I used each selected model to predict success on the test data and calculated its accuracy.
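
The workflow described in the abstract (four candidate models, comparison by McFadden R² and AIC, then AUC and accuracy on held-out data) can be sketched roughly as follows. This is a minimal illustration and not the thesis code: the data are synthetic, and every name here (`gender`, `group`, `designs`, the coefficient values, the 1500/500 split, the 0.5 classification threshold) is an assumption made for the example. The fit uses a plain Newton-Raphson routine in NumPy so the block has no other dependencies; the thesis itself may have used different software.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic fit by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def log_likelihood(X, y, beta):
    p = np.clip(sigmoid(X @ beta), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def auc(y, scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))

# Synthetic stand-in data (hypothetical; NOT the Penn State data set).
rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, n).astype(float)   # binary indicator
group = rng.integers(0, 3, n)                  # three illustrative race/ethnicity levels
g1 = (group == 1).astype(float)                # dummy variables; level 0 is the reference
g2 = (group == 2).astype(float)
true_logit = 0.4 + 0.3 * gender - 0.5 * g1 + 0.2 * g2   # assumed coefficients
y = (rng.random(n) < sigmoid(true_logit)).astype(float)

# Design matrices for the four models named in the abstract.
ones = np.ones(n)
designs = {
    "gender": np.column_stack([ones, gender]),
    "race": np.column_stack([ones, g1, g2]),
    "gender + race": np.column_stack([ones, gender, g1, g2]),
    "gender * race": np.column_stack([ones, gender, g1, g2, gender * g1, gender * g2]),
}

train, test = slice(0, 1500), slice(1500, n)

# Intercept-only model supplies the null log-likelihood for McFadden R².
null_X = ones[train].reshape(-1, 1)
ll_null = log_likelihood(null_X, y[train], fit_logistic(null_X, y[train]))

results = {}
for name, X in designs.items():
    beta = fit_logistic(X[train], y[train])
    ll = log_likelihood(X[train], y[train], beta)
    results[name] = {
        "beta": beta,
        "mcfadden_r2": 1.0 - ll / ll_null,
        "aic": 2 * X.shape[1] - 2.0 * ll,
    }

# Select the lowest-AIC model, then score it on the held-out rows.
best_name = min(results, key=lambda k: results[k]["aic"])
probs = sigmoid(designs[best_name][test] @ results[best_name]["beta"])
test_auc = auc(y[test], probs)
test_accuracy = float(np.mean((probs >= 0.5) == (y[test] == 1)))

for name, r in results.items():
    print(f"{name:15s} McFadden R2={r['mcfadden_r2']:.4f}  AIC={r['aic']:.1f}")
print(f"best={best_name}  AUC={test_auc:.3f}  accuracy={test_accuracy:.3f}")
```

Selecting on AIC alone is one possible choice for this sketch; the abstract says the models were compared on both McFadden R² and AIC without stating how the two were weighed.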