Predicting 30-Day ICU Readmission Rates Using Machine Learning on Electronic Health Records

Open Access
- Author:
- Farooq, Sarah
- Area of Honors:
- Data Sciences
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Fenglong Ma, Thesis Supervisor
John Yen, Thesis Honors Advisor
- Keywords:
- machine learning
classification
EHR
data
electronic health records
hospital
ICU
readmission
class imbalance
missing values
imputation
rebalancing
hospital readmission
patient
medical data
XGBoost
MICE data imputation
health
health care
model
supervised learning
model performance
- Abstract:
- This thesis explores the prediction of 30-day hospital readmission for ICU patients using machine learning models trained on electronic health records (EHR). Hospital readmission is common and harms patients, both through worsened health conditions and through the financial burden of additional healthcare bills. Readmissions also strain hospitals financially: they consume medical resources, damage a hospital's reputation through repeated patient visits, and lead to penalties and lost reimbursements from sponsoring organizations when the readmissions are unplanned. Using the MIMIC-III dataset, machine learning models were developed to predict the likelihood of readmission. Because medical data is high in volume and diverse in nature, it often contains large amounts of missing values and suffers from class imbalance. As a result, the focus of this research shifted toward investigating how missing data imputation and class rebalancing techniques affect the performance of the machine learning models. The results showed that certain imputation techniques and rebalancing methods were more effective than others in improving model performance and predictive capabilities. More specifically, the XGBoost classifier, combined with MICE data imputation and random under-sampling for rebalancing, most accurately predicted whether an ICU patient would be readmitted within 30 days of their original discharge date. These findings and techniques may be applicable to other datasets in the medical domain, providing insights into how to improve the predictive accuracy of machine learning models in healthcare.
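The abstract names random under-sampling as the rebalancing step but does not show how it was implemented. The sketch below illustrates the general idea only, under stated assumptions: the function name `random_undersample`, the seed, and the toy data are hypothetical and are not taken from the thesis or from MIMIC-III. It randomly discards rows from the larger class until every class has as many rows as the rarest one.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Every class is reduced to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    kept = []
    for idxs in by_class.values():
        # Sample without replacement; the minority class is kept whole.
        kept.extend(rng.sample(idxs, target))

    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (made up): 1 = readmitted within 30 days, 0 = not readmitted.
toy_labels = [0] * 90 + [1] * 10
toy_rows = [[float(i)] for i in range(100)]

balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels, seed=42)
print(Counter(balanced_labels))
```

On this toy input, each class ends up with 10 rows. In practice, rebalancing of this kind is usually applied only to the training split, so that evaluation data keeps the original class distribution; whether the thesis did so is not stated in the abstract.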