Topology and Data Analysis

Open Access
Zhou, Hangyu
Area of Honors:
Bachelor of Science
Document Type:
Thesis Supervisors:
  • John Roe, Thesis Supervisor
  • Sergei Tabachnikov, Honors Advisor
Keywords:
  • homology
  • topological data analysis
  • persistent homology
  • algebraic topology
Abstract:
Topological data analysis is a recently developed technique for analyzing datasets in Euclidean space. It enables us to analyze datasets that are high-dimensional, incomplete, and noisy. The motivation of topological data analysis is to study the shape of the data. To see the shape of a discrete set of data points, many algorithms require a choice of proximity parameter. However, this parameter is usually hard to choose, and additional information is needed to determine which proximity to use. The main insight of persistent homology is that we should look at all proximities together, but it is hard to transform this large amount of information into an understandable and easy-to-present form. In topological data analysis, the idea of homology solves this problem. Briefly, we assume that structures that persist over a long range of proximities are real structures of the dataset, while structures that persist only over a short range are considered noise. In this thesis, we discuss the mathematical tools necessary to understand topological data analysis, explain how a particular algorithm works, and apply the technique to some real-world data.