A Comparative Analysis on the Ability of Machine Learning Models to Create a Housing Price Index Using Integrated New York City Housing and School Performance Data
Open Access
Author:
Gault, William
Area of Honors:
Data Sciences
Degree:
Bachelor of Science
Document Type:
Thesis
Thesis Supervisors:
John Yen, Thesis Supervisor
John Yen, Thesis Honors Advisor
Suhang Wang, Faculty Reader
Keywords:
Data Science, Machine Learning, Feature Engineering, Linear Regression, Neural Network
Abstract:
This thesis uses data from the New York City Department of Finance and the New York City Department of Education to compare traditional methods of creating a Housing Price Index (H.P.I.) with a machine learning-based approach that incorporates the quality of the school district each property belongs to, and weighs the benefits and limitations of each. The thesis describes the key components of creating an H.P.I. with machine learning methods, including data integration and feature engineering, along with the assumptions made and the reasoning behind each decision. It also describes the techniques used to integrate the two datasets and my approach to experimenting with and comparing the prediction quality of various machine learning-based models. For each model, I compare the performance of a baseline version with that of a version that incorporates the performance of the elementary school each property is assigned to, in order to determine whether including school performance data improves the ability of different machine learning models to predict home sale prices in New York City.
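The baseline-versus-augmented comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy sales records, the district keys, the `school_score` values, the single `sqft` feature, and the use of ordinary least squares scored by in-sample RMSE. A real comparison would use the actual Department of Finance and Department of Education data and score each model on held-out sales.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def rmse(rows, y, beta):
    """Root-mean-square error of the linear predictions rows @ beta against y."""
    errs = [sum(b * x for b, x in zip(beta, r)) - t for r, t in zip(rows, y)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

# Hypothetical toy sales: (district key, floor area in 1000 sqft, sale price in $1000).
sales = [
    ("A", 0.8, 525.0), ("A", 1.2, 636.0),
    ("B", 1.0, 503.0), ("B", 1.5, 644.0),
    ("C", 0.9, 412.0), ("C", 1.4, 559.0),
]
# Hypothetical school performance score per district (the second dataset).
school_score = {"A": 0.9, "B": 0.5, "C": 0.2}

y = [price for _, _, price in sales]

# Baseline model: intercept + floor area only.
base_X = [[1.0, sqft] for _, sqft, _ in sales]

# Augmented model: integrate the school score by joining on the district key.
aug_X = [[1.0, sqft, school_score[d]] for d, sqft, _ in sales]

rmse_base = rmse(base_X, y, fit_ols(base_X, y))
rmse_aug = rmse(aug_X, y, fit_ols(aug_X, y))

print(f"baseline RMSE:  {rmse_base:.1f}")
print(f"augmented RMSE: {rmse_aug:.1f}")
```

Because the augmented model nests the baseline, its in-sample error can never be higher, so this sketch only shows the mechanics of the join and the side-by-side fit. Deciding whether school data genuinely helps requires comparing errors on sales the models have not seen.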