Using GDDA BLAST as a statistical method to study the HAD superfamily

Open Access
Smith, Evan James
Area of Honors:
Bachelor of Science
Document Type:
Thesis Supervisors:
  • Randen Patterson, Thesis Supervisor
  • Richard Cyr, Honors Advisor
Keywords:
  • HAD
  • phylogenetic
  • evolution
  • multiple sequence alignment
  • divergence
  • convergence
Abstract:
The Haloacid Dehalogenase (HAD) superfamily contains a diverse group of phosphoryl and carbonyl transfer enzymes. The superfamily spans all superkingdoms of life and is present in many cellular compartments, catalyzing reactions on very diverse substrates, from nucleotides to saccharides to proteins. Despite this exploration of many different biochemical spaces, all members of the superfamily share two distinctive structural features: a squiggle motif and a flap motif located around the active site. Structural modifications called caps around these motifs allow more diverse interactions with substrates and increase reaction efficiency. Additionally, they provide a basis for further classification. However, classifying HAD members and establishing evolutionary trees for them is difficult because there is a large amount of sequence divergence within the superfamily, even between families with similar cap structures. A research group led by Burroughs performed a phylogenetic analysis that identified 5 distinct lineages and provided evidence for convergent structures. Their evolutionary relationships were inferred mainly from structural and functional similarities, not from statistical algorithms. Most phylogenetic studies are performed using statistical tools such as multiple sequence alignments (MSA). Unfortunately, multiple sequence alignment algorithms have failed in the past when presented with highly divergent families like HAD. We therefore sought to better resolve the evolutionary relationships in the superfamily using GDDA BLAST, a novel statistical method created with specific modifications to identify sequence similarities between related but diverged sequences. As a null hypothesis, we used GDDA BLAST assuming a single lineage for HAD. Evolutionary contradictions were found within this lineage, such as a lack of monophyly in the groups and inconsistent speciation.
This evidence led us to further investigate an evolutionary tree with multiple lineages, like that suggested by the Burroughs group. Using a hierarchical clustering method, we grouped the families into 4 distinct lineages before continuing with a neighbor-joining algorithm. The resulting phylogram contained few evolutionary inconsistencies, and those it did contain could be explained by lateral transfer or other evolutionary mechanisms. These results indicate that GDDA BLAST, when used with clustering methods, can be an effective tool for generating robust phylogenetic relationships between divergent families with convergent structural elements.
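The clustering step described above can be illustrated with a minimal sketch. The abstract does not state which linkage criterion or distance measure was used, so average linkage is assumed here, and the family labels and distances are made-up toy values, not thesis data. The function name `average_linkage` is hypothetical, and the subsequent neighbor-joining step is not shown.

```python
from itertools import combinations


def average_linkage(distances, labels, n_clusters):
    """Agglomerative clustering with average linkage, stopped at n_clusters groups.

    distances maps frozenset({a, b}) to the pairwise distance between labels a and b.
    Average linkage is an assumption; the abstract does not name a criterion.
    """
    clusters = [frozenset([label]) for label in labels]

    def cluster_distance(c1, c2):
        # Mean distance over all pairs that span the two clusters.
        total = sum(distances[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)

    return sorted(sorted(c) for c in clusters)


# Hypothetical toy input: six made-up family labels, three close pairs,
# and a large default distance between everything else.
labels = ["famA", "famB", "famC", "famD", "famE", "famF"]
close_pairs = {("famA", "famB"): 0.10,
               ("famC", "famD"): 0.15,
               ("famE", "famF"): 0.20}
distances = {frozenset(p): 0.9 for p in combinations(labels, 2)}
distances.update({frozenset(p): d for p, d in close_pairs.items()})

lineages = average_linkage(distances, labels, n_clusters=3)
print(lineages)
```

In this toy case the three close pairs end up as the three groups. In a real analysis the distances would come from the pairwise similarity scores, and each resulting group would then be passed to a tree-building step.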