Statistical Analysis of Spatial Networking Characteristics in Relation to COVID-19 Transmission Risks in U.S. Counties
Open Access
- Author:
- Yang, Sihan
- Area of Honors:
- Industrial Engineering
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Hui Yang, Thesis Supervisor
- Sanjay Joshi, Thesis Honors Advisor
- Keywords:
- COVID-19
- disease
- lasso regression
- modeling
- K means clustering
- Abstract:
- As coronavirus disease (COVID-19) spreads across the continental United States, more and more people’s lives are negatively affected by the pandemic. Research on the factors that influence COVID-19 transmission is essential to controlling the spread of the virus. Although several factors, including socioeconomics, geography, and mobility, have been considered in previous studies, little has been done to investigate how road characteristics and transportation profiles are connected to COVID-19 progression in different spatial regions. Individuals’ movements tend to follow connected transportation networks that consist of airports, bridges, railroads, and highways. Hence, this thesis focuses on how county-level transportation profiles and road characteristics impact the spread of COVID-19 in the U.S. Specifically, county-level transportation profiles (e.g., transportation infrastructure, total number of commuting workers) and road characteristics are considered for 80 U.S. counties. First, the Box-Cox transformation was used to standardize all the variables. Second, a lasso-penalized regression model was fitted to the dataset to select sensitive variables for K-means clustering. The lasso-penalized regression model selected 13 significant predictors, with an adjusted R squared of 37.87%. The number of bridges, the route miles of passenger railroad and rail transit, and the average number of streets per node are positively correlated with cumulative COVID-19 cases per 100,000 people. The number of business establishments and street density are negatively correlated with cumulative COVID-19 cases per 100,000 people. Then, the 80 counties were clustered into four severity levels of COVID-19 transmission based on the features selected by the lasso model. The K-means approach gives a score of 434.3399 with k = 4; that is, the 80 county samples are grouped into four clusters.
The number of business establishments, total edge length, and street density are shown to be more significant than the other predictors.
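The three-step pipeline described in the abstract (Box-Cox transformation, lasso-based variable selection, K-means clustering on the selected variables) can be sketched roughly as follows. This is a minimal illustration and not the thesis code. It makes several assumptions: the data are synthetic (80 rows, 6 made-up predictors), the Box-Cox parameter `lam` is fixed rather than estimated, the penalty strength `alpha` is arbitrary, and the reported clustering score is taken to be the within-cluster sum of squares, which the abstract does not confirm. All function and variable names are hypothetical.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def zscore(a):
    """Center each column and scale it to unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become zero."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso via cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1,
    assuming the columns of X and y are already centered.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
    return b

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns labels, centers, within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    inertia = d[np.arange(len(X)), labels].sum()
    return labels, centers, inertia

# Synthetic stand-in for the county data (assumption: 80 rows, 6 predictors).
rng = np.random.default_rng(42)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=(80, 6))

# Step 1: Box-Cox (lam = 0 is the log transform), then z-scoring.
X = zscore(box_cox(raw, lam=0.0))

# Synthetic response driven by predictors 0 and 3 only.
y = X @ np.array([1.5, 0.0, 0.0, -1.0, 0.0, 0.0]) + rng.normal(scale=0.3, size=80)
y = y - y.mean()

# Step 2: lasso keeps the predictors whose coefficients are nonzero.
beta = lasso_cd(X, y, alpha=0.2)
selected = np.flatnonzero(beta)

# Step 3: K-means with k = 4 on the selected predictors.
labels, centers, inertia = kmeans(X[:, selected], k=4)
print("selected predictors:", selected)
print("within-cluster sum of squares:", inertia)
```

In practice the Box-Cox parameter is usually estimated per variable by maximum likelihood and the lasso penalty is chosen by cross-validation; both are fixed here only to keep the sketch short.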