Statistical Analysis of Pancreatic Cancer Circulating Tumor Cell Data

Open Access
- Author:
- Gearhart, Megan Elizabeth
- Area of Honors:
- Statistics
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Naomi S Altman, Thesis Supervisor
- John Fricks, Thesis Honors Advisor
- John Fricks, Faculty Reader
- Keywords:
- Statistics
- Clustering
- Circulating Tumor Cells
- Cancer
- Abstract:
- Circulating tumor cells are one possible cause of the systemic spread of cancer throughout the body. To study these cells, single-cell RNA-seq data from pancreatic cancer cells were obtained by cancer researcher Dr. Gary Clawson at Hershey Medical Center. To find similarities in the gene expression patterns of cells of this type, three different statistical analyses were performed on single-cell data from four patients. First, the cells were clustered using hierarchical clustering, with a distance based on Spearman's correlation, to identify commonalities among the cells from all four patients. This clustering yielded a dendrogram with one large cluster containing cells from all of the patients, and a surprising lack of clustering among many cells from patient two. A second analysis was then performed on the same cells using a list of 23 biologically important genes provided by Dr. Clawson and identified from the pancreatic cancer literature. This yielded a much more mixed cluster, which suggests commonalities between the cells of different patients. These 23 genes also accounted for a large proportion of the total reads in each cell, as much as 12% in some cells. Finally, fourteen highly expressed, low-variance genes were identified from the data, most of which are known to code for proteins. One of these genes was also on the list of 23 biologically relevant genes identified by Dr. Clawson. Due to the small number of patients, the highly unequal number of cells from each patient, and the poor quality of the RNA-seq data, the results were inconclusive but promising. As more data are collected, there is potential to find biologically important results using these methods.
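
The two computations described in the abstract, clustering cells on a Spearman-based distance and measuring a gene panel's share of each cell's reads, can be illustrated with a minimal Python sketch. This is not the thesis code. The distance transform (1 minus the correlation), the average linkage, the function names, and the synthetic counts are all assumptions made for illustration; the abstract does not state which linkage or software was used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def spearman_hclust(counts, n_clusters=2, method="average"):
    """Cluster cells (columns) of a genes x cells count matrix.

    Assumption: distance = 1 - Spearman correlation between cells.
    Returns the linkage tree and one cluster label per cell.
    """
    # Spearman correlation is the Pearson correlation of the ranks.
    ranks = rankdata(counts, axis=0)
    rho = np.corrcoef(ranks, rowvar=False)
    # Convert similarity to distance and remove floating-point noise.
    dist = np.clip(1.0 - rho, 0.0, 2.0)
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method=method)
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


def panel_read_fraction(counts, panel_rows):
    """Fraction of each cell's total reads that come from the panel genes."""
    return counts[panel_rows].sum(axis=0) / counts.sum(axis=0)


def make_synthetic_counts(seed=0):
    """Hypothetical data: 50 genes x 6 cells with two opposite expression profiles."""
    rng = np.random.default_rng(seed)
    profile = np.arange(1, 51) * 10.0
    group_a = rng.poisson(profile[:, None], size=(50, 3))
    group_b = rng.poisson(profile[::-1][:, None], size=(50, 3))
    return np.hstack([group_a, group_b])


if __name__ == "__main__":
    counts = make_synthetic_counts()
    tree, labels = spearman_hclust(counts, n_clusters=2)
    print("cluster labels per cell:", labels)
    # Rows 0-4 stand in for a hypothetical gene panel.
    print("panel read fraction:", panel_read_fraction(counts, [0, 1, 2, 3, 4]))
```

The linkage tree returned by `spearman_hclust` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a dendrogram like the one the abstract describes. With real data, `counts` would be the genes-by-cells read count matrix and `panel_rows` the row indices of the 23 genes of interest.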