This Project
This web page originated as an assignment in Emory University's Biology 142 lab course. Students were assigned proteins of interest and asked to research what is known about the protein and to examine whether the newly sequenced whale shark genome had evidence of an orthologous protein.

Background Information: The TICAM1/TRIF gene is involved in innate immunity against invading pathogens. Its full name is toll-like receptor adaptor molecule 1. Toll-like receptors (TLRs) signal through two cascades: TRIF mediates the more delayed of the two, while the other depends on the MyD88 adapter. The gene is involved in the immune response, antiviral defense, and the inflammatory response. TRIF is mostly active in the spleen, and the gene is often regulated when MyD88 is deficient in the liver. The gene is located on chromosome 19 (Figure 1).
TICAM1.jpeg
Figure 1: The physical location of TICAM1 on Chromosome 19



11128121_10205678388190467_1963945125_n.jpg
Figure 2: Diagram of immune system response mediated by TRIF (TICAM1) protein.
http://lsresearch.thomsonreuters.com/static/maps/559_map.png

Methods:
Whale Shark predicted homologues:
The human TICAM1/TRIF protein sequence (ENSP00000248244) was used as the query in a BLAST search against the predicted whale shark protein database on the whaleshark.georgiaaquarium.org Galaxy server. The top predicted protein hits were then used as queries (using the full predicted sequence, not only the aligned region) in protein BLASTs against the NCBI human protein database. The whale shark predicted protein database was also searched using the elephant shark predicted TICAM1 protein sequence as the query.

Predicted orthologs
TICAM1/TRIF predicted orthologs were identified in species other than whale sharks using the NCBI BLAST server and the ClustalW program. Protein BLASTs were performed against single-species protein databases, using the human TICAM1 protein (ENSP00000248244) as the query sequence with default settings.

Phylogenetic tree
The hit with the lowest E-value for each non-whale shark species search (using the human protein as query) along with the top 4 whale shark BLAST hits were used to create a multiple sequence alignment and phylogenetic tree. ClustalW with default settings was used to create the alignment and tree.
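The selection rule described above (keep the hit with the lowest E-value for each species) can be sketched in Python. This is only an illustration, not the procedure actually run for the project: the function name `best_hit_per_species` and the tuple layout are assumptions, and the example rows are a subset of the values reported in Table 1.

```python
def best_hit_per_species(hits):
    """Return {species: (sequence_id, e_value)}, keeping the lowest E-value per species.

    `hits` is an iterable of (species, sequence_id, e_value) tuples.
    On ties, the first hit encountered is kept.
    """
    best = {}
    for species, seq_id, e_value in hits:
        if species not in best or e_value < best[species][1]:
            best[species] = (seq_id, e_value)
    return best


# Values copied from Table 1 (three whale shark hits plus two other species).
table1_hits = [
    ("Whale Shark", "g25712.t1", 5e-05),
    ("Whale Shark", "g32441.t1", 1e-04),
    ("Whale Shark", "g410001.t1", 3e-04),
    ("Cow", "XP005208986.1", 0.0),
    ("Zebra Fish", "NP001038224.1", 2e-28),
]

selected = best_hit_per_species(table1_hits)
for species, (seq_id, e_value) in selected.items():
    print(species, seq_id, e_value)
```

In the project this rule was applied only to the non-whale shark species. The whale shark rows are included here simply to show how several hits from one species collapse to a single entry.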

Protein Domains:
TICAM1/TRIF belongs to the TLR_2 superfamily. This family is known as the classical TLRs, as its members perform the basic functions of Toll-like receptors. Proteins belonging to this family are characterized by their binding to molecules of microbial origin.
Screen Shot 2015-04-13 at 8.23.24 PM.png
Figure 3: NCBI BLASTed protein family: TLR_2 superfamily


Searching for TICAM1/TRIF in the Whale Shark:
| Species | Sequence | e-value | %ID | Alignment length |
|---|---|---|---|---|
| Human | ENSP00000248244 | 0.0 | 100% | 712 |
| Whale Shark (Top Hit) | g25712.t1 | 5e-05 | 34.29% | 35 |
| Whale Shark (Second Hit) | g32441.t1 | 1e-04 | 35.21% | 71 |
| Whale Shark (Third Hit) | g410001.t1 | 3e-04 | 43.59% | 39 |
| Whale Shark (Fourth Hit) | g43334.t1 | 3e-04 | 34.78% | 46 |
| Whale Shark (Fifth Hit) | g34803.t1 | 3e-04 | 24.00% | 225 |
| Cow | XP005208986.1 | 0.0 | 65% | 559 |
| Mouse | NP778154.1 | 0.0 | 53% | 482 |
| Zebra Fish | NP001038224.1 | 2e-28 | 34% | 114 |
| Arabidopsis | NP199897.2 | 1.4 | 24% | 65 |
| Elephant Shark | calMil1_genscan_KI635908.93_1 | 1e-16 | 31.79% | 173 |
| Fruit Fly | N/A* | N/A* | N/A* | N/A* |
| Clawed Frog | XP004911105.1 | 2e-33 | 36% | 113 |
*Denotes that NCBI BLAST returned no significant results

Table 1: Significant matches from the top 5 hits in the whale shark genome and top result from each of the human, cow, arabidopsis, elephant shark, fruit fly, clawed frog, zebra fish, and mouse genomes.
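One way to read the alignment lengths in Table 1 is as a rough fraction of the query that each hit covers. The sketch below assumes the query length is 712 residues, taken from the human self-alignment row. The name `query_coverage` is illustrative, and because BLAST alignment lengths can include gap columns, the result is only an approximation of true query coverage.

```python
QUERY_LENGTH = 712  # assumed: alignment length of the human self-hit in Table 1


def query_coverage(alignment_length, query_length=QUERY_LENGTH):
    """Approximate fraction of the query spanned by an alignment."""
    return alignment_length / query_length


# (label, alignment length) pairs copied from Table 1.
rows = [
    ("Whale Shark g25712.t1", 35),
    ("Whale Shark g34803.t1", 225),
    ("Cow XP005208986.1", 559),
    ("Mouse NP778154.1", 482),
    ("Zebra Fish NP001038224.1", 114),
]

coverage = {name: query_coverage(length) for name, length in rows}
for name, fraction in coverage.items():
    print(f"{name}: {fraction:.1%} of the query aligned")
```

Under this assumption the top whale shark hit spans roughly 5% of the human protein, while the cow hit spans roughly 79%. This helps explain why a short alignment with a moderate percent identity is weak evidence of orthology.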

To search for the TICAM1/TRIF sequence in the whale shark genome, the FASTA sequence of human TICAM1/TRIF was first obtained from Ensembl. This sequence was then uploaded to the Galaxy server (whale shark genome data provided by whaleshark.georgiaaquarium.org) and used as the query against the whale shark database. Next, NCBI BLAST searches of the same sequence were run against the cow, mouse, zebra fish, and other species' databases. The top result from each of these genomes was recorded, along with the top five hits from the whale shark genome, which were then tested for orthology.

Reciprocal search:

| Whale Shark Sequence | Reciprocal Search Result ID | Protein Description | e-value | Percent Identity |
|---|---|---|---|---|
| g25712.t1 | NP891549.1 | TIR domain-containing adapter molecule 1 | 5.3 | 34% |
| g32441.t1 | NP003861.1 | ras GTPase-activating-like protein IQGAP1 | 0.030 | 40% |
| g410001.t1 | XP011527375 | PREDICTED: protein FAM83D isoform X1 | 6e-13 | 59% |
| g43334.t1 | N/A* | N/A* | N/A* | N/A* |
| g34803.t1 | NP849195.2 | neuronal PAS domain-containing protein 4 | 8e-17 | 35% |
*Denotes that NCBI BLAST returned no significant results

Table 2: Reciprocal search results using top whale shark hits for ENSP00000248244 against human genome database.

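Results from the reciprocal search show that none of the whale shark hits returned a significant match to TICAM1/TRIF in the human protein database. Only g25712.t1 returned TIR domain-containing adapter molecule 1 (the TICAM1 protein) as its best human hit, and its e-value of 5.3 is not significant; the other hits matched different proteins or returned no significant result. These results suggest a low chance that any of the whale shark hits is homologous (orthologous or paralogous) to TICAM1/TRIF.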
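The reciprocal check can be written out as a small Python sketch. The data are copied from Table 2. The e-value cutoff of 1e-5 is an assumption made for illustration (this page does not state a threshold), and `supports_orthology` is a hypothetical helper name.

```python
E_VALUE_CUTOFF = 1e-5  # assumed threshold; no cutoff is stated on this page

# Protein name of TICAM1/TRIF as it appears in Table 2.
TARGET = "TIR domain-containing adapter molecule 1"

# (whale shark sequence, best human hit description, e-value) from Table 2.
# None marks the search that returned no significant result.
reciprocal_hits = [
    ("g25712.t1", "TIR domain-containing adapter molecule 1", 5.3),
    ("g32441.t1", "ras GTPase-activating-like protein IQGAP1", 0.030),
    ("g410001.t1", "PREDICTED: protein FAM83D isoform X1", 6e-13),
    ("g43334.t1", None, None),
    ("g34803.t1", "neuronal PAS domain-containing protein 4", 8e-17),
]


def supports_orthology(description, e_value, cutoff=E_VALUE_CUTOFF):
    """True only if the best human hit is the target protein and is significant."""
    return description == TARGET and e_value is not None and e_value <= cutoff


supported = [seq for seq, desc, e in reciprocal_hits if supports_orthology(desc, e)]
print(supported)
```

With these inputs no whale shark sequence passes both conditions: g25712.t1 hits the right protein but with a non-significant e-value, and the significant hits (g410001.t1, g34803.t1) point to other proteins.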

Orthologs:
*CLUSTALW DATA GOES HERE*

Figure 4: ClustalW alignment scores for the 12 tested sequences, 5 whale shark sequences, and the most likely orthologous sequence from the human, mouse, cow, zebra fish, arabidopsis, elephant shark, clawed frog, and fruit fly genomes respectively.

Orthology of the protein sequence in the experimental species was assessed using several factors, including e-value, alignment length, and percent identity. The BLAST results showed strong similarity between the human sequence and the mouse sequence (e-value 0.0, 53% identity), which strongly suggests homology; the cow sequence in Table 1 likewise had an e-value of 0.0, with 65% identity. The other genomes have far less significant values, which suggests a need for further research on sequences such as the zebra fish. The whale shark sequences are the least similar.

Figure 4 confirms a strong chance of homology to the mouse and cow sequences, while the remaining scores show little significance (data obtained with the ClustalW multiple sequence alignment program at default settings). The scores do, however, show that the whale shark sequence most likely related to the query was the fourth top hit (g43334.t1). This could help narrow the scope of further research and clarify which factors contribute to significance, since the BLAST-based criteria ranked this sequence only fourth.
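For readers unfamiliar with alignment scores, the sketch below shows one common way a pairwise score can be computed: percent identity over the aligned positions where neither sequence has a gap. To our understanding ClustalW's pairwise scores follow a similar definition, but that is an assumption here rather than something taken from the project data. The function name and the two toy sequences are invented for illustration only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues among positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap positions are excluded from the comparison
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned sequences, invented for illustration (not project data).
toy_a = "MKT-LLVA"
toy_b = "MRTALL-A"
score = percent_identity(toy_a, toy_b)
print(f"{score:.1f}% identity over ungapped positions")
```

Because gapped positions are ignored, a short but well-matched region can score higher than a long, loosely matched one. This is one possible reason a sequence ranked fourth by BLAST e-value could rank first by alignment score.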


Phylogeny:

*PHYLOGENETIC TREE GOES HERE*

Figure 5: Phylogenetic tree generated from the ClustalW alignment scores.

Phylogeny was determined with the ClustalW program and is presented in Figure 5, which represents the alignment scores from Figure 4 as a tree. Again the mouse sequence was observed to be the most closely related, but the zebra fish, with what seemed to be a less significant e-value, was more closely related to the human sequence than the rabbit. Finally, the whale shark sequences showed the least significant relation, with g43334.t1 distinguishing itself as the most similar of the five. This could suggest that some measures of sequence similarity are not being considered within the bounds of the experiment.

Conclusion:
The most related genes belonged to the mouse and zebra fish genomes respectively, whose significance far outweighed even the best matches from the whale shark genome. The e-values of the top five whale shark results also suggested dissimilarity. These results, coupled with the ClustalW alignment scores, the phylogenetic tree, and the NCBI BLAST results, have led to the conclusion that a whale shark homolog of the TICAM1/TRIF gene is very unlikely. Apart from the cow and mouse genomes, none of the genomes returned a strong match for the TICAM1/TRIF protein. This lack of significance, together with weak e-values and the other experimental factors, has led to the conclusion that the TICAM1/TRIF gene is novel in our ancestry and likely developed long after our ancestral genomes began to diverge.
