RAG2

This Project
This project was undertaken by Emory students. We ran FASTA protein sequences from the human genome against the whale shark genome compiled by the Georgia Aquarium to look for orthologs, and then searched for orthologs of the same gene in other organisms.
Background Information

RAG2 encodes a protein that is involved in the initiation of V(D)J recombination during B and T cell development. This protein forms a complex with the product of the adjacent recombination activating gene 1, and this complex can form double-strand breaks by cleaving DNA at conserved recombination signal sequences. The recombination activating gene 1 component is thought to contain most of the catalytic activity, while the N-terminal of the recombination activating gene 2 component is thought to form a six-bladed propeller in the active core that serves as a binding scaffold for the tight association of the complex with DNA. A C-terminal plant homeodomain finger-like motif in this protein is necessary for interactions with chromatin components, specifically with histone H3 that is trimethylated at lysine 4.

Mutation: Mutations in RAG2 cause Omenn syndrome, a severe form of immunodeficiency that also presents with symptoms of autoimmune disease.

Methods

Whale Shark Predicted Orthologs
The human protein RAG2-201 (ENST00000618712) was used as a query sequence against the whale shark predicted proteins database through Galaxy and the Georgia Aquarium. The top hits from this search were recorded and then run as reciprocal query sequences against the human protein database through NCBI.
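The reciprocal step described above amounts to a simple check: take the best hit of the forward search, search it back against the human proteins, and see whether the original query comes out on top. The following is a minimal sketch of that logic, assuming the BLAST results have already been parsed into dictionaries that map a query ID to a list of (subject ID, e-value) pairs. The function names and the reverse-search entries are hypothetical placeholders; only the forward e-values are taken from Table 1.

```python
def best_hit(hits):
    """Return the subject ID with the lowest e-value, or None if there are no hits."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def is_reciprocal_best_hit(query_id, forward, reverse):
    """True if the best hit of query_id has query_id as its own best hit."""
    subject = best_hit(forward.get(query_id, []))
    if subject is None:
        return False
    return best_hit(reverse.get(subject, [])) == query_id


# Forward search: human RAG2-201 against the whale shark proteins (Table 1 values).
forward = {
    "RAG2-201": [("g43408.t1", 4e-05), ("g47950.t1", 1e-05), ("g38774.t1", 2e-04)],
}

# Reverse search: hypothetical placeholder in which the whale shark hit
# matches some other human protein better than it matches RAG2.
reverse = {
    "g47950.t1": [("other_human_protein", 1e-20), ("RAG2-201", 1e-03)],
}

print(is_reciprocal_best_hit("RAG2-201", forward, reverse))
```

A pair of sequences that passes this check in both directions is a stronger ortholog candidate than a one-way hit, which is why the reciprocal search was run at all.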
Other Predicted Orthologs
The human RAG2-201 sequence was searched with the NCBI BLAST tool against the NCBI databases for mouse, hippopotamus, beagle dog, fruit flies, true yeasts, and redeye piranha to identify orthologs in other species.
Phylogenetic Tree
The hit with the lowest e-Value from each species was used to create a phylogenetic tree relating whale sharks, humans, and the other species based on the RAG2-201 protein. ClustalW2 was used to generate these data.
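Choosing one representative per species can be sketched as a group-by-minimum over the hit list. This is only an illustration of the selection rule, assuming the hits are available as (species, accession, e-value) tuples; the function name and the example records are hypothetical and are not project results.

```python
def lowest_evalue_per_species(hits):
    """Keep, for each species, the (accession, e-value) pair with the smallest e-value."""
    best = {}
    for species, accession, evalue in hits:
        if species not in best or evalue < best[species][1]:
            best[species] = (accession, evalue)
    return best


# Hypothetical hit list: accessions and e-values are placeholders.
example_hits = [
    ("species_A", "hit_A1", 3e-40),
    ("species_A", "hit_A2", 1e-90),
    ("species_B", "hit_B1", 0.02),
    ("species_B", "hit_B2", 0.5),
]

representatives = lowest_evalue_per_species(example_hits)
for species, (accession, evalue) in representatives.items():
    print(species, accession, evalue)
```

The sequences selected this way would then be passed to the alignment and tree-building step.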

Analyzing the whale shark genome
Very little similarity was found between this protein and proteins present in the Whale Shark Genome.
Sequence ID    Length    % Identical Matches    e-Value
g43408.t1      39        38.46                  4e-05
g47950.t1      38        34.21                  1e-05
g38774.t1      39        43.59                  2e-04
Table 1. The top hits from the whale shark genome. The alignments are short (38 to 39 residues) and have low percent identity and comparatively high e-values, so they are poor matches and suggest that no ortholog is present.
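One way to put the numbers in Table 1 in perspective is to compare each alignment length with the length of the query. The sketch below does this arithmetic, assuming a query length of 527 amino acids for human RAG2; that length is not stated in this report and should be checked against the sequence actually used. The function names are illustrative, and coverage is approximated from alignment length alone, ignoring gaps.

```python
QUERY_LENGTH = 527  # assumed length of human RAG2 in amino acids (not given in the report)

# (sequence ID, alignment length, % identical matches, e-value) from Table 1
hits = [
    ("g43408.t1", 39, 38.46, 4e-05),
    ("g47950.t1", 38, 34.21, 1e-05),
    ("g38774.t1", 39, 43.59, 2e-04),
]


def query_coverage(alignment_length, query_length=QUERY_LENGTH):
    """Approximate percentage of the query spanned by the alignment."""
    return 100.0 * alignment_length / query_length


def identical_residues(alignment_length, percent_identity):
    """Number of identical positions implied by the length and percent identity."""
    return round(alignment_length * percent_identity / 100.0)


for seq_id, length, pident, evalue in hits:
    print(
        f"{seq_id}: coverage {query_coverage(length):.1f}%, "
        f"{identical_residues(length, pident)} identical residues, e-value {evalue:g}"
    )
```

Under that assumed query length, each hit spans well under a tenth of the protein and rests on fewer than twenty identical residues, which is consistent with reading these as local matches rather than full-length orthologs.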

However, when a reciprocal search was run using the top hit from the whale shark genome, a large number of matches were observed. This is probably because the hit used as the search query has very little similarity to the original RAG2 gene and instead bears a basic similarity to many other protein-coding genes. The results are as follows.
Accession     e-Value
pfam00651     1.73e-22
smart00225    5.70e-20
pfam077007    6.93e-17
Table 2. The reciprocal hits from the matched portion of the whale shark proteins returned very low e-values and were good matches.

Protein Domain

wrpsb.png
Fig 1. The RAG2 protein coding gene belongs to the BTB superfamily

Orthologs

As the following table shows, the mammalian species return hits with extremely low e-values and high percent identities, while the non-mammalian species return much higher e-values and, in most cases, relatively low percent identities, which suggests that the gene may be a mammalian one.
Query    Database          Accession         e-Value    % Identity
Human    Mouse             NP_033046.1       0.0        88%
Human    Hippopotamus      AAG38712.1        2e-134     91%
Human    Beagle Dog        XP_005631162.1    0.0        89%
Human    Fruit Flies       NP_524614.2       0.007      38%
Human    True Yeast        XP_004178549.1    9e-04      38%
Human    Redeye Piranha    AGW00428.1        0.56       64%
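The split between strong and weak hits in this table can be made explicit with an e-value cutoff. The following is a minimal sketch using the values listed above; the cutoff of 1e-10 and the function name are assumptions chosen for illustration, not thresholds used in the project.

```python
EVALUE_CUTOFF = 1e-10  # hypothetical threshold, chosen only for illustration

# (database, accession, e-value, % identity) from the ortholog table
ortholog_hits = [
    ("Mouse", "NP_033046.1", 0.0, 88),
    ("Hippopotamus", "AAG38712.1", 2e-134, 91),
    ("Beagle Dog", "XP_005631162.1", 0.0, 89),
    ("Fruit Flies", "NP_524614.2", 0.007, 38),
    ("True Yeast", "XP_004178549.1", 9e-04, 38),
    ("Redeye Piranha", "AGW00428.1", 0.56, 64),
]


def split_by_evalue(hits, cutoff=EVALUE_CUTOFF):
    """Separate hits into those at or below the e-value cutoff and those above it."""
    strong = [hit[0] for hit in hits if hit[2] <= cutoff]
    weak = [hit[0] for hit in hits if hit[2] > cutoff]
    return strong, weak


strong, weak = split_by_evalue(ortholog_hits)
print("Candidate orthologs:", strong)
print("Weak or chance matches:", weak)
```

Filtering on e-value rather than percent identity matters here: the redeye piranha hit has 64% identity but an e-value of 0.56, so a high identity over what is presumably a short region is not on its own evidence of orthology.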

Phylogeny
tree_upgma (1).png
Fig 2. The phylogenetic tree shows the closest relationships among the mammals and the most distant relationships between the mammals and simpler organisms such as the fruit fly and yeast.

Conclusion
Because the human protein returned such high e-values and such low percent identities against the whale shark proteins, we concluded that no orthologs were found in the whale shark. After running BLAST searches against other organisms, we concluded that the protein appears to be encoded by a gene associated with more complex immune systems, as the phylogenetic tree suggests and as can be deduced from its known role in the immune system. It is also important to mention that we obtained many very close reciprocal hits from the portion of the whale shark protein that was our best match. We cannot accept these as evidence of orthology, however, because the matching sequence was very short and may therefore be a very common part of most genomes, which would explain why it returns so many matches in humans.

References
  • Genes and mapped phenotypes. (2015, February 28). Retrieved March 30, 2015, from http://www.ncbi.nlm.nih.gov/gene/5897
  • "Genes and Mapped Phenotypes." National Center for Biotechnology Information. U.S. National Library of Medicine, n.d. Web. 14 Apr. 2015.
  • van Gent, D. C., Ramsden, D. A. & Gellert, M. The RAG1 and RAG2 proteins establish the 12/23 rule in V(D)J recombination. Cell 85, 107–113 (1996).
  • Gellert, M. V(D)J recombination: RAG proteins, repair factors, and regulation. Annu. Rev. Biochem. 71, 101–132 (2002).
  • RAG2 Gene. (n.d.). Retrieved March 30, 2015, from http://www.genecards.org/cgi-bin/carddisp.pl?gene=RAG2&search=813972eb598862afecc62b1c46635718
  • Elkin, S. K. et al. A PHD finger motif in the C terminus of RAG2 modulates recombination activity. J. Biol. Chem. 280, 28701–28710 (2005).