Development of an R package for Classifying Duplicate Gene Retention Mechanisms (CDROM) and Application to Duplicate Genes in Grasses

Open Access
Perry, Brent R
Area of Honors:
Bachelor of Science
Document Type:
Thesis Supervisors:
  • Raquel Assis, Thesis Supervisor
  • Timothy Jegla, Honors Advisor
Keywords:
  • Gene duplication
  • subfunctionalization
  • grasses
  • Oryza sativa
  • Sorghum bicolor
  • Brachypodium distachyon
  • neofunctionalization
  • specialization
  • conservation
Gene duplication is a major source of new genes and is thought to play an important role in genome evolution. Although several mechanisms have been proposed for the long-term retention of duplicate genes, their genome-wide prevalence remains unclear in most species. Assis and Bachtrog (2013) developed a phylogenetic approach for classifying these duplicate gene retention mechanisms on a genome-wide scale. In Chapter 1, we implement their approach as the R package CDROM, short for Classification of Duplicate gene RetentiOn Mechanisms (Perry and Assis 2016). CDROM is the first tool capable of classifying duplicate gene retention mechanisms on a genome-wide scale; it can be applied to many species and datasets, runs quickly, and is user-friendly. In Chapter 2, we apply CDROM to duplicate genes in three grass species: Brachypodium distachyon, Oryza sativa, and Sorghum bicolor. Our findings suggest that a variety of mechanisms may retain duplicate genes in grasses but, interestingly, also indicate that subfunctionalization may play a less significant role than hypothesized. Thus, we have developed a useful tool for studying duplicate gene retention mechanisms in a variety of species and applied it to gain novel insight into the mechanisms that retain duplicate genes in grasses over long evolutionary timescales.