Neural Networks, ENCODE, Domain Adaptation, Transposable Elements, Repetitive Regions
Abstract:
Transposable elements and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, these elements are difficult to characterize because sequencing reads that originate from them often map to multiple genomic locations.
Allo is an approach designed to accurately allocate multi-mapped reads within repetitive genomic regions by combining probabilistic allocation of these reads with a convolutional neural network that recognizes the read-distribution features of potential peaks. Applying Allo to 200 ENCODE ChIP-seq datasets, we identified previously unknown interactions between transcription factors and repetitive element families.
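As a rough illustration of what probabilistic allocation of a multi-mapped read can look like, the sketch below assigns a read to one of its candidate locations with probability proportional to a per-location weight. This is not Allo's actual scoring scheme: the function names are hypothetical, and the weights are assumed to come from some combination of local read evidence and a CNN peak score, which the abstract does not specify.

```python
import random

def allocation_probabilities(weights):
    """Normalize non-negative per-location weights into probabilities.

    The weights are hypothetical stand-ins for whatever evidence supports
    each candidate location (for example, nearby uniquely mapped reads
    scaled by a CNN peak score). Falls back to a uniform distribution
    when no location has any support.
    """
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def allocate_read(locations, weights, rng=random):
    """Pick one candidate location for a multi-mapped read at random,
    with probability proportional to its weight."""
    probs = allocation_probabilities(weights)
    return rng.choices(locations, weights=probs, k=1)[0]

# Example: a read with two candidate alignments, the first better supported.
candidates = ["chr1:1000", "chr7:52000"]
print(allocation_probabilities([3, 1]))  # [0.75, 0.25]
print(allocate_read(candidates, [3, 1], random.Random(0)))
```

Sampling a location, rather than always taking the highest-weight one, keeps weakly supported copies of a repeat from being systematically emptied of reads; whether Allo samples or assigns deterministically is an assumption of this sketch.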
We then explore an approach to close the performance gap in predicting transcription factor binding between mappable and unmappable regions. To account for this domain shift, we trained a domain-adaptive neural network incorporating a gradient reversal layer, which discourages the network from learning features specific to either mappable or unmappable regions.
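For readers unfamiliar with the gradient reversal layer, a minimal sketch follows, assuming PyTorch. It shows only the generic technique: the layer is the identity in the forward pass and multiplies the gradient by a negative factor in the backward pass, so a feature extractor trained through it is pushed to make a domain classifier's task harder. The names `GradientReversal`, `grad_reverse`, and `lambd` are illustrative and are not taken from the work described here.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambd on the way back."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd          # stash the reversal strength for backward
        return x.view_as(x)        # return the input unchanged

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient; lambd itself receives no gradient.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    """Convenience wrapper: place between a feature extractor and a domain classifier."""
    return GradientReversal.apply(x, lambd)

# Toy check: the gradient of sum(x) is 1 per element, so after reversal
# with lambd=0.5 each element receives a gradient of -0.5.
features = torch.tensor([1.0, 2.0], requires_grad=True)
domain_loss = grad_reverse(features, lambd=0.5).sum()
domain_loss.backward()
print(features.grad)  # tensor([-0.5000, -0.5000])
```

In a domain-adaptive setup of this kind, the task head (here, binding prediction) would receive the features directly, while the domain head (mappable versus unmappable) would receive `grad_reverse(features)`; how the two losses are weighted is a design choice not stated in the abstract.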