IFI30

This Project

The purpose of this project is to annotate specific genes in the newly sequenced whale shark genome in order to contribute to the scientific study of the whale shark and the immune system.


Background Information
IFI30 is the shorthand notation for gamma-interferon-inducible lysosomal thiol reductase (GILT), an enzyme found in the lysosomes of cells, including macrophages (“IFI30 Mouse” 2015). The IFI30 gene is of particular interest for transcriptional targeting, a process that uses tissue- or tumor-specific promoters to drive gene expression in cancer cells (Rengasamy 2015). A study completed at the National University of Singapore found that the ability of IFI30 to support transcriptional targeting, specifically in gliomas (tumors of glial cells), suggests that the gene could be used in in vivo therapy (Rengasamy 2015). Furthermore, the IFI30 protein reduces disulfide bonds at low pH and is expressed in antigen-presenting cells (“Genes and Mapped Phenotypes” 2015). Disulfide bonds stabilize the structure of immunoglobulins (antibodies), which are produced in response to antigens (Liu and May 2015). IFI30 is therefore important for the normal function of antigen-presenting cells, which in turn support immune system function.



ifi30.png

Figure 1: IFI30 Interacting Proteins (“IFI30 Gene” 2015)




Protein Domains



The only protein domain in the IFI30 protein is the gamma-interferon-inducible lysosomal thiol reductase (GILT) domain. The GILT family includes two human GILT sequences and several putative eukaryotic proteins that are similar to them. The two human GILT sequences contain a C-X(2)-C motif that is also found in other members of the family. The motif is associated with the main function of the GILT domain, which is disulfide bond reduction. The domain is illustrated in the figure below.






Figure 2: Protein domain of IFI30, illustrated as a single GILT superfamily domain. (Image obtained courtesy of http://blast.ncbi.nlm.nih.gov)
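As an illustration of what the C-X(2)-C motif means in practice, the following minimal Python sketch scans a protein sequence for a cysteine, any two residues, and a second cysteine. The function name `find_cxxc` and the toy sequence are hypothetical and were chosen only for this example; the toy sequence is not the real IFI30 sequence.

```python
import re

# C-X(2)-C in PROSITE-style notation: cysteine, any two residues, cysteine.
# The lookahead lets overlapping occurrences be reported as well.
CXXC = re.compile(r"(?=(C[A-Z]{2}C))")

def find_cxxc(seq):
    """Return (1-based position, matched residues) for each C-X(2)-C motif."""
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(seq.upper())]

# Hypothetical toy sequence used only to demonstrate the search.
toy = "MKTCGHCAAWCLLCPQ"
print(find_cxxc(toy))  # [(4, 'CGHC'), (11, 'CLLC')]
```

A hit only shows that the residue pattern is present; whether it sits inside a functional GILT domain still has to be judged from a domain search such as the one shown in the figure.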






Methods
Whale Shark Protein Sequence
The human IFI30 protein sequence was found by searching for the sequence ID (ENSP00000384886) in the Ensembl database and downloading the sequence of the top hit in FASTA format. The sequence was then used as the query for a BLAST against the predicted whale shark proteome on the Galaxy server. The top predicted protein matches, determined by lowest E-value and greatest alignment length, were then themselves used as queries for a BLAST against the NCBI human proteome database to check for reciprocity. When no reciprocal matches were found, the human IFI30 sequence was BLASTed against the elephant shark proteome. The best match in the elephant shark proteome was then used as the query in another BLAST against the predicted whale shark proteome. The top predicted proteins in those results were also used as queries in a BLAST against the NCBI human proteome database to check for reciprocity.
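The reciprocity check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure actually run on the Galaxy or NCBI servers: the `Hit` type, the function names, and the `reverse_search` callable (a stand-in for running the second BLAST) are all hypothetical. The example values are taken from the first whale shark search reported on this page, with the human protein referred to simply as "IFI30".

```python
from typing import Callable, List, NamedTuple, Optional

class Hit(NamedTuple):
    subject: str        # ID or name of the matched sequence
    evalue: float
    align_len: int = 0  # 0 when the alignment length is not known

def best_hit(hits: List[Hit]) -> Optional[Hit]:
    """Lowest E-value wins; ties are broken by the longer alignment."""
    if not hits:
        return None
    return min(hits, key=lambda h: (h.evalue, -h.align_len))

def is_reciprocal(query_id: str, forward_hits: List[Hit],
                  reverse_search: Callable[[str], List[Hit]]) -> bool:
    """True if the best forward hit's own best hit is the original query."""
    fwd = best_hit(forward_hits)
    if fwd is None:
        return False
    rev = best_hit(reverse_search(fwd.subject))
    return rev is not None and rev.subject == query_id

# Forward search: human IFI30 against the predicted whale shark proteome.
forward = [Hit("g37936.t1", 9e-05, 22),
           Hit("g37078.t1", 6e-04, 40),
           Hit("g13114.t1", 6e-04, 22)]

# Hypothetical stand-in for BLASTing each whale shark protein against human.
reverse_db = {"g37936.t1": [Hit("dermatan-sulfate epimerase precursor", 5.6)]}

def reverse_search(subject_id: str) -> List[Hit]:
    return reverse_db.get(subject_id, [])

print(is_reciprocal("IFI30", forward, reverse_search))  # False
```

The sketch only tests the single best forward hit; in the work described here, several top hits were checked in the same way.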


Orthologs

The same human IFI30 sequence used previously was used as the query for BLASTs against the NCBI and Galaxy proteome databases of various species to find orthologs of the human protein. The match with the lowest E-value from each BLAST was tabulated and itself used as a query for a BLAST against the NCBI human proteome database to determine reciprocity.
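Picking the match with the lowest E-value can be automated when BLAST results are saved as tab-separated text. The sketch below assumes the standard 12-column tabular layout of BLAST+ (`-outfmt 6`); the searches described here may well have been read from the web interface instead, and the function name `top_hit` and the two example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def top_hit(tabular_text):
    """Return the row with the smallest E-value, or None if there are no hits."""
    rows = []
    for rec in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not rec or rec[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, rec))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        rows.append(row)
    if not rows:
        return None
    # Ties on E-value are broken by the higher bit score.
    return min(rows, key=lambda r: (r["evalue"], -r["bitscore"]))

# Hypothetical example rows; the IDs and numbers are invented for illustration.
example = (
    "queryA\tsubj1\t45.0\t200\t110\t3\t1\t200\t5\t204\t1e-40\t150\n"
    "queryA\tsubj2\t30.0\t80\t56\t2\t10\t89\t20\t99\t0.002\t35.5\n"
)
print(top_hit(example)["sseqid"])  # subj1
```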


Phylogenetic Tree
The best matches obtained in the previous section, together with the two whale shark predicted proteins with the smallest E-values from each of the two BLASTs (the human protein query and the elephant shark query), were added to a ClustalW multiple sequence alignment to create the phylogenetic tree. The default settings of the ClustalW website were used.
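To give a sense of the kind of information a tree is built from, the sketch below computes a simple pairwise distance (the fraction of aligned, ungapped positions that differ) from an existing alignment. This is an assumption-laden simplification: ClustalW's own distance calculation and tree-building steps may differ in detail, and the function names and the three toy aligned sequences are hypothetical.

```python
def p_distance(a, b):
    """Fraction of aligned, ungapped columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore any column where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Map every ordered pair of sequence names to its p-distance."""
    names = list(alignment)
    return {(p, q): p_distance(alignment[p], alignment[q])
            for p in names for q in names}

# Hypothetical toy alignment; these are not real IFI30 sequences.
toy_alignment = {
    "sp1": "MKTCGHC-AW",
    "sp2": "MKTCGHCLAW",
    "sp3": "MRSCAHC-AF",
}
matrix = distance_matrix(toy_alignment)
print(matrix[("sp1", "sp2")])            # 0.0
print(round(matrix[("sp1", "sp3")], 3))  # 0.444
```

Sequences with small pairwise distances end up close together in the tree, while unrelated sequences sit on long, separate branches.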


Results
Whale Shark Protein Sequence
The human IFI30 protein sequence was used as the query in a BLAST against the predicted whale shark protein database. The best three matches are tabulated below in Table 1, ranked by ascending E-value (most significant first).
| Sequence ID | E-Value | Alignment Length | Percentage of Positive Matches | Reciprocal Name | Reciprocal E-value |
| --- | --- | --- | --- | --- | --- |
| g37936.t1 | 9e-05 | 22 | 63.64 | dermatan-sulfate epimerase precursor | 5.6 |
| g37078.t1 | 6e-04 | 40 | 40.00 | PREDICTED: cytosolic carboxypeptidase 3 isoform X3 [Homo sapiens] | 2e-81 |
| g13114.t1 | 6e-04 | 22 | 63.64 | coiled-coil domain-containing protein 71 [Homo sapiens] | 2e-19 |
Table 1. Predicted whale shark protein sequences that matched human IFI30. Percentage of positive matches refers to the percentage of aligned amino acid positions that are identical or chemically similar. Reciprocal name and reciprocal E-value refer to the name and E-value of the best human match (by smallest E-value) returned by a BLAST of the NCBI human protein database with the whale shark predicted protein as the query. Since none of the whale shark proteins returned the original human IFI30 protein as its best match, and none matched human IFI30 with a particularly low E-value, none of these results can be considered orthologs.
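The percentage of positive matches is simple arithmetic: the number of positive positions divided by the alignment length. The short sketch below shows the calculation and works backward from the tabulated percentages to the implied number of positive positions. Those counts (14 and 16) are not reported on this page; they are back-calculated here, and the function names are hypothetical.

```python
def percent_positives(positives, align_len):
    """Percentage of aligned positions scored as positive, to two decimals."""
    return round(100.0 * positives / align_len, 2)

def implied_positives(percent, align_len):
    """Back-calculate the whole number of positive positions from a percentage."""
    return round(percent * align_len / 100.0)

# g37936.t1: 63.64% positives over an alignment length of 22
print(implied_positives(63.64, 22))  # 14
print(percent_positives(14, 22))     # 63.64

# g37078.t1: 40.00% positives over an alignment length of 40
print(implied_positives(40.00, 40))  # 16
print(percent_positives(16, 40))     # 40.0
```

Seen this way, the 63.64% figure for the two 22-residue alignments corresponds to only 14 similar positions, which is one reason a high percentage over a short alignment is weak evidence of orthology.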


The BLAST using the human IFI30 protein as the query returned no predicted whale shark proteins with significantly low E-values and no reciprocal best match. As a result, these proteins cannot be considered orthologs of the IFI30 protein, and more research is necessary. The next step was to use the likely ortholog of human IFI30 from a species more closely related to the whale shark than the human is. In Table 2 below, the same BLAST against the predicted whale shark proteome is repeated, except that the likely elephant shark ortholog of human IFI30 is used as the query instead of the human IFI30 protein itself.


| Sequence ID | E-Value | Alignment Length | Percentage of Positive Matches | Reciprocal Name | Reciprocal E-value |
| --- | --- | --- | --- | --- | --- |
| g46550.t1 | 4e-05 | 58 | 39.66 | PREDICTED: E3 ubiquitin-protein ligase TRIM41 isoform X1 [Homo sapiens] | 4.5 |
| g27961.t1 | 7e-05 | 56 | 48.21 | PREDICTED: ERC protein 2 isoform X3 [Homo sapiens] | 4.7 |
| g11533.t1 | 2e-04 | 17 | 64.71 | No significant similarity found | N/A |
| g41681.t1 | 4e-04 | 79 | 44.30 | indoleamine 2,3-dioxygenase 1 [Homo sapiens] | 5.6 |
Table 2. Predicted whale shark protein sequences that matched the likely elephant shark ortholog of human IFI30. Percentage of positive matches refers to the percentage of aligned amino acid positions that are identical or chemically similar. Reciprocal name and reciprocal E-value refer to the name and E-value of the best human match (by smallest E-value) returned by a BLAST of the NCBI human protein database with the whale shark predicted protein as the query. As with the human IFI30 query, no reciprocal best match or extremely low E-values were seen in the BLAST, indicating that no IFI30 ortholog can be found in this predicted proteome of the whale shark.


As in Table 1, no extremely low E-values or reciprocal best matches were found in the whale shark predicted proteome, pointing to the possibility that there is no ortholog of human IFI30 in this proteome.


Orthologs
The human IFI30 protein was used to search for orthologs in various other species, and the results are shown below in Table 3. Reciprocal best matches were found for all organisms included except yeast. One possible interpretation is that IFI30 evolved independently in several lineages, which would explain why it is found in rice (a plant) and in the animals investigated but not in yeast, which shares a more recent common ancestor with animals than rice does, although loss of the gene in the yeast lineage would produce the same pattern. Interestingly, almost every BLAST returned fewer than ten matches, of which only one was even remotely similar, indicating that this protein has a distinctive sequence with few isoforms or related proteins that accomplish similar tasks.
| Species | Name | ID | Length | E-Value | Reciprocity |
| --- | --- | --- | --- | --- | --- |
| Human | gamma-interferon-inducible lysosomal thiol reductase preproprotein | NP_006323.2 | 250 | N/A | N/A |
| Mouse | gamma-interferon-inducible lysosomal thiol reductase precursor | NP_075552.2 | 248 | 2e-97 | Yes |
| Chicken | PREDICTED: gamma-interferon-inducible-lysosomal thiol reductase isoform X2 | XP_418246.3 | 411 | 8e-65 | Yes |
| Clawed Frog | interferon, gamma-inducible protein 30 precursor | NP_001017196.1 | 256 | 4e-68 | Yes |
| Zebrafish | gamma-interferon-inducible lysosomal thiol reductase precursor | NP_001006057.1 | 255 | 9e-60 | Yes |
| Atlantic Cod | N/A | gadMor1_genscan_HE566943.15_1 | 61 | 3e-18 | Yes |
| Elephant Shark | N/A | calMil1_genscan_KI635918.205_1 | 254 | 3e-66 | Yes |
| Fruit Fly | CG41378 | NP_001104369.2 | 196 | 1e-22 | Yes |
| Rice | Os03g0295800 | NP_001049827.1 | 265 | 3e-34 | Yes |
| Yeast | Ecm29p | NP_011833.1 | 1868 | 1.3 | No |
Table 3. The best match (by smallest E-value, query coverage, and percent identity) for each species when human IFI30 is used as the query in a BLAST against that species' NCBI or Galaxy database. Length refers to the amino acid length of the protein, and reciprocity refers to whether an NCBI BLAST of the human proteome database with the listed protein as the query returns IFI30 as the best match.


Phylogeny and Phylogenetic Tree
The phylogenetic tree in Figure 3, built from all of the protein sequences, largely corresponds with phylogenetic trees constructed using homologous anatomy and genome sequencing. Interestingly, the proteins of all the species except yeast and the predicted whale shark proteins are very closely related to each other. This seems to indicate that the yeast and whale shark proteins are unrelated to, and therefore very distant from, the otherwise close-knit grouping of the related proteins of the other species. This provides further evidence that, like the yeast proteome, the predicted whale shark proteome has no ortholog of the human IFI30 protein.
Figure 3. The phylogenetic tree of the IFI30 best matches from Table 3 and the top two whale shark sequences from each of Tables 1 and 2. The best match from each species and the four best whale shark predicted proteins were aligned using ClustalW. Each point of divergence indicates the most recent common ancestor, and the length of the branches represents evolutionary distance. The proteins of all of the organisms except yeast and the whale shark cluster closely together, whereas the yeast and predicted whale shark proteins lie far from that cluster and are even farther from each other than from any of the other organisms' proteins. This seems to indicate that those proteins are unrelated to each other and to the human IFI30 protein.


Conclusion


The ortholog and sequence comparison data presented above lead to the conclusion that no ortholog of human IFI30 was identified in the predicted whale shark proteome. The protein does appear to be orthologous among the other species examined, since their best matches achieved reciprocal best matches with human IFI30, and the named orthologs in mouse, chicken, clawed frog, and zebrafish are all annotated as gamma-interferon-inducible lysosomal thiol reductase or interferon, gamma-inducible protein 30. The high E-values, differing structural components, and lack of any conserved domain in the whale shark matches all indicate a lack of similarity and support the conclusion that these proteins are most likely not orthologous to human IFI30.



References


"Genes and Mapped Phenotypes." National Center for Biotechnology Information. U.S. National Library of Medicine, 5 Apr. 2015. Web. 15 Apr. 2015.


"IFI30 (mouse)." IFI30 (mouse). Cell Signaling Technology, n.d. Web. 10 Apr. 2015.


Liu, H., and K. May. "Result Filters." National Center for Biotechnology Information. U.S. National Library of Medicine, Jan.-Feb. 2012. Web. 15 Apr. 2015.


Rengasamy, Madhumitha. "National University of Singapore." ScholarBank@NUS: Terms of Use. National University of Singapore, 16 May 2015. Web. 11 Apr. 2015.

"Basic Local Alignment Search Tool." BLAST. N.p., n.d. Web. 15 Apr. 2015.