An Enhancer-Based Deep Learning Approach for Studying Viral Regulation

Open Access
- Author:
- Cooper, Rachel
- Area of Honors:
- Computer Science
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Shaun Mahony, Thesis Supervisor
John Joseph Hannan, Thesis Honors Advisor
- Keywords:
- Deep Learning
Bioinformatics
Gene Regulation
Enhancers
Virus
HSV-1
- Abstract:
- Although a variety of experimental techniques have been developed to identify and functionally validate enhancer regions in the human genome, no reliable underlying ‘enhancer code’ has yet been discovered for isolating these regulatory elements on a whole-genome scale across cell types. Because the active enhancer regions in a given cell type play a central role in driving cell-type-specific gene expression, an approach that uncovers the sequence features underlying these regions could provide key insights into the gene regulatory behaviors they exhibit. The predictive capacity of deep neural networks for sequence-based learning problems on the regulatory regions of the genome has been explored in a variety of settings in computational biology, and these deep learning frameworks have proven to be a particularly appealing avenue for predicting enhancer regions. Although many such neural networks have now been created for enhancer prediction, the potential applications of the trained models remain largely unexplored. In this thesis, we propose an enhancer-based deep learning application for predicting regulatory signatures on the sequences of DNA viruses that replicate in the nucleus of their human host cells. Such viruses are significantly influenced by a variety of host cellular factors in the nucleus and rely on shaping the environment of their host to create conditions conducive to their own replication. Because the active enhancers in a given cell type contribute greatly to the regulatory environment in the nucleus of the cell, we hypothesize that there is a correspondence between these active enhancer regions and the regions of DNA virus genomes that associate with the host cellular factors the viruses co-opt for their own use.
We present a deep learning framework that can be used to predict the putative active enhancer regions of a given cell type in the human genome based on the characteristic chromatin features occurring at these regions. These models are then applied to the HSV-1 genome in an exploratory analysis that predicts regulatory signatures on this DNA sequence and evaluates the potential of such an application. As viruses continue to be a leading contributor to human disease worldwide, with effective antiviral therapeutics unavailable for many of these pathogens, developing approaches for the continued study of these complex pathogen-host interactions is necessary for gaining valuable insights into these processes.