Predicting the Outcome of the Final Sixteen Teams in College Basketball Using Time Series Analysis and Markov Chains
Open Access
- Author:
- Miller, Christopher
- Area of Honors:
- Statistics
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Andrew John Wiesner, Thesis Supervisor
- David Hunter, Thesis Honors Advisor
- Keywords:
- Statistics
- College Basketball
- Time-Series
- Abstract:
- The culminating tournament of collegiate basketball, March Madness, brings together a field of sixty-four teams to crown a champion. There are many approaches to ranking teams and predicting the outcomes of tournament games. This paper takes a new approach, using time-series analysis to capture the effect of trends throughout the season. The underlying time-series model is based on a team’s efficiency rating, that is, a team’s ability to score and defend well against other teams, adjusted for relative strength. Two types of time-series models are considered: a dynamic autoregressive integrated moving average (ARIMA) model and a dynamic exponential smoothing (ETS) model, each fit uniquely to every team. These models are applied to nine college basketball seasons, from 2011 to 2019, using a Markov chain approach to determine winning probabilities, and are used to predict only the first two rounds of the March Madness tournament. Over the nine-year period, either the ARIMA or the ETS method predicted at least as many games correctly over the first two rounds as a baseline method that picks the favored seed in every matchup. The analysis also examines how effectively each method predicts upsets: the ARIMA method predicted upsets correctly 37.2% of the time, and the ETS method 44.2% of the time. This type of modeling is recommended primarily as a way to quantify trending teams in March Madness predictions. The ETS model slightly outperforms the ARIMA model, and both models are most effective at predicting matchups between closely seeded teams.
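As a rough illustration of how a smoothing model can capture late-season trends, the sketch below applies simple exponential smoothing, the most basic member of the ETS family, to two made-up efficiency series. It is not the thesis's method: the fixed smoothing weight `alpha`, the function names `ses_forecast` and `pick_winner`, and the ratings are all hypothetical, and a direct comparison of forecasts stands in for the Markov chain step that the thesis uses to obtain winning probabilities.

```python
from typing import Sequence


def ses_forecast(series: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast from simple exponential smoothing.

    `alpha` is an assumed, fixed smoothing weight; a fitted ETS model
    would estimate it (and possibly trend terms) separately for each team.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # Recent games get weight alpha; the running level keeps the rest.
        level = alpha * value + (1.0 - alpha) * level
    return level


def pick_winner(ratings_a: Sequence[float],
                ratings_b: Sequence[float],
                alpha: float = 0.5) -> str:
    """Pick the team with the higher smoothed rating (ties go to team A).

    Simplification: this replaces the Markov chain probability step
    with a direct comparison of the two forecasts.
    """
    forecast_a = ses_forecast(ratings_a, alpha)
    forecast_b = ses_forecast(ratings_b, alpha)
    return "A" if forecast_a >= forecast_b else "B"


# Hypothetical per-game efficiency ratings with the same season average.
improving = [0.0, 2.0, 4.0, 6.0, 8.0]
declining = [8.0, 6.0, 4.0, 2.0, 0.0]

print(ses_forecast(improving))            # 6.125
print(ses_forecast(declining))            # 1.875
print(pick_winner(improving, declining))  # A
```

Both made-up teams average 4.0 over the season, so a season-average rating cannot separate them, while the smoothed forecast favors the team that is trending upward.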