Cross-Species Prediction of Transcription Factor Binding
Open Access
Author:
Agarwala, Vandana
Area of Honors:
Computer Science
Degree:
Bachelor of Science
Document Type:
Thesis
Thesis Supervisors:
Shaun Mahony, Thesis Supervisor
John Joseph Hannan, Thesis Honors Advisor
Keywords:
Gene Regulation; Deep Learning; Transfer Learning; Domain Adaptation; Computational Biology; Bioinformatics; Machine Learning
Abstract:
Transfer learning, the application of knowledge gained in one machine learning task to a new, related task, is an attractive approach to studying gene regulation across species. Here, we apply transfer learning to study the binding motif patterns of four transcription factors (TFs) in up to seven species. We expect TF binding preferences to generalize across species, so a model trained on one species’ genome should be able to approximately predict binding in another species’ genome. However, some species-specific genomic features, such as repeat elements, prevent trained models from generalizing perfectly across species. To account for this, we propose a domain-adaptive model architecture that discourages the learning of species-specific genomic sequence features. Our results demonstrate that prediction from species-agnostic genomic features is feasible when such an architecture is used to account for domain shifts, i.e., differences in the underlying genomic background. Our results also suggest that the analysis may be more informative if evolutionary distance is taken into account in prediction.
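The abstract does not say how the architecture discourages species-specific learning. One common domain-adaptation mechanism is a gradient reversal layer placed between a shared feature extractor and a species classifier, and the minimal sketch below assumes that mechanism purely for illustration. The names `GradientReversal`, `weight`, and `shared_feature_gradient` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Illustrative gradient reversal layer (an assumed mechanism, not the thesis's)."""

    weight: float = 1.0  # hypothetical strength of the reversal

    def forward(self, features: List[float]) -> List[float]:
        # Forward pass: features go to the species classifier unchanged.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # Backward pass: the species-classifier gradient is negated and scaled,
        # so the shared features are pushed to be less species-discriminative.
        return [-self.weight * g for g in grad_output]


def shared_feature_gradient(
    grad_binding: List[float],
    grad_species: List[float],
    reversal: GradientReversal,
) -> List[float]:
    """Sum the gradients that reach the shared feature extractor from both heads."""
    reversed_species = reversal.backward(grad_species)
    return [b + s for b, s in zip(grad_binding, reversed_species)]
```

Under this assumption, the binding-prediction head updates the shared features normally, while the species head's gradient arrives with its sign flipped. Features that help tell species apart, such as repeat-element signatures, are therefore penalized rather than reinforced.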