"exempli gratia"

- for the sake of example

Let us demonstrate the Pre_GI database with an example. A novel complete genome sequence of Hyphomicrobium nitrativorans [CP006912] was obtained from the NCBI. This strain can denitrify using methanol as a carbon source (Martineau C, Villeneuve C, Mauffrey F, Villemur R. 2014. Complete genome sequence of Hyphomicrobium nitrativorans strain NL23, a denitrifying bacterium isolated from biofilm of a methanol-fed denitrification system treating seawater at the Montreal Biodome. Genome Announc. 2(1):e01165-13. doi:10.1128/genomeA.01165-13).

Hold on! This looks like an organism with potential. Let us investigate.

Browse Pre_GI Database

Let us now browse the database for this specific strain. The Browse Pre_GI Database page will aid us in this. We can browse by accession number or by description. Luckily, the browse function searches with regular expressions, which is fortunate as I keep making spelling mistakes. The browse results indicate that this specific strain is not present in the database, but another Hyphomicrobium replicon associated with water is. We will use Hyphomicrobium denitrificans ATCC 51888 chromosome, complete [NC_014313] as a starting point for our research.
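To make the idea of a regular expression search over accession and description concrete, here is a minimal sketch in Python. It is not the code behind the Pre_GI browse page; the function name browse and the two-record list are assumptions for illustration, using only replicons named in this walkthrough.

```python
import re

# Two replicon records quoted in this walkthrough. A real search runs against
# the full database, which is not reproduced here.
REPLICONS = [
    ("NC_014313", "Hyphomicrobium denitrificans ATCC 51888 chromosome, complete"),
    ("NC_007348", "Ralstonia eutropha JMP134 chromosome 2, complete sequence"),
]


def browse(pattern, records=REPLICONS):
    """Return records whose accession or description matches the regex.

    Hypothetical helper: matching is case-insensitive and unanchored.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (accession, description)
        for accession, description in records
        if regex.search(accession) or regex.search(description)
    ]


# A pattern that only pins down the start and end of the genus name, so a
# misspelled middle does not matter.
hits = browse(r"hyph\w*bium")
```

Because the pattern is unanchored, a partial accession such as NC_0 would match both records, which mirrors how a loose search term returns every replicon that contains it.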

This bacterial replicon contains 3 identified genomic islands. A graphical representation of the replicon, the SeqWord parameters and the identified genomic islands is given. Below the graph there is a hyperlink to the NCBI GenBank page for Hyphomicrobium denitrificans ATCC 51888 chromosome, complete [NC_014313]. Host lineage and general information on the host organism are included. The table below the graph contains all the information relating to each specific genomic island. This includes positional, parameter and relational information as well as access to text files.

Genomic island # 2 will be used as an example.

The CDSs contained in the genomic island are available through the hyperlink on the GI start position. This page displays information on CDS position, CDS description, ontology and BLASTp hits. It is also possible to download the genomic island and CDS sequences. The CDS description hyperlink displays all CDSs in the database with a similar description. QuickGO ontology provides gene ontology information with the aid of the EBI's QuickGO search (a fast browser for Gene Ontology terms and annotations). BLASTp hits below an e-value threshold of e-05 are displayed for all CDSs to ascertain CDS sequence similarity.

The genomic island GenBank file is available for both download (Download GI.gbk) and browser viewing (Genbank Text).

The parameter values produced by the SeqWord Genomic Island Sniffer are shown for GRV_RV (generalized relative variance / relative variance), D (pattern deviation) and PS (pattern skew). Each value is hyperlinked to a search of the database for all genomic islands with similar parameter values within a certain range.

The OUP_Neighbours results list all genomic islands in the database with which our query (Genomic island # 2) shares a compositional similarity above 75%. The proposed movement between subject and query is predicted from the difference in pattern deviation between the query and the subject. It should be noted that intermediate hosts between movements cannot be excluded.
Arrows and colours indicate the direction of movement.

Green arrows represent a likely movement from the subject to the query.

Blue arrows indicate movement from the query to the subject.

Red arrows ←→ indicate uncertainty regarding the direction of movement.
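The arrow logic can be sketched as a small Python function. The text only states that the direction is predicted from the difference in pattern deviation (D). Which sign of that difference maps to which arrow, and how small a difference counts as uncertain, are assumptions made here for illustration. The name movement_direction and the tolerance parameter are hypothetical.

```python
def movement_direction(d_query, d_subject, tolerance=0.0):
    """Classify the proposed movement between a query and a subject island.

    ASSUMPTION: the island with the lower pattern deviation (D) is treated as
    the likelier source. The walkthrough does not state this rule, only that
    the difference in D is used.
    ASSUMPTION: differences within `tolerance` are reported as uncertain; the
    range actually used by the database is not given.
    """
    difference = d_query - d_subject
    if abs(difference) <= tolerance:
        return "uncertain"          # red arrow
    if difference > 0:
        return "subject -> query"   # green arrow
    return "query -> subject"       # blue arrow
```

Whatever rule is actually applied, the result describes only the two islands being compared, so the caveat about intermediate hosts still holds.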

The cluster and sub_cluster that contain Genomic island # 2 can be reached through the hyperlinked cluster/sub_cluster identifiers, e.g. 1_1. The direction of movement is indicated in the same way as above.

Sequence similarity is assessed with BLASTn and an e-value cut-off of e-05, using Genomic island # 2 as the query against the entire database. This returns high scoring hits for our example. BLAST hits have the added utility of graphical BLAST visualization, as illustrated by the Genomic island # 1 BLASTn hit with Ralstonia eutropha JMP134 chromosome 2, complete sequence Genomic island # 4 NC_007348:2115152.
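For readers who run BLASTn themselves, an e-value cut-off like this one can be applied to tabular results with a few lines of Python. This is a sketch under stated assumptions, not a description of how Pre_GI filters hits. It assumes the standard 12-column tabular BLAST output, where the e-value is the 11th column, reads e-05 as 1e-05, and treats the cut-off as inclusive. The name filter_hits is hypothetical.

```python
def filter_hits(lines, max_evalue=1e-5):
    """Keep tabular BLAST hits whose e-value is at or below max_evalue.

    ASSUMPTION: input is the default 12-column tab-separated BLAST output
    (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend,
    sstart, send, evalue, bitscore).
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            kept.append(fields)
    return kept
```

The function accepts any iterable of lines, so an open file handle can be passed directly.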

We now have a general idea of genomic islands predicted in Hyphomicrobium strains currently housed in the database. We shall thus proceed by identifying genomic islands in the newly sequenced Hyphomicrobium nitrativorans [CP006912] and then relate them to genomic islands housed in the database.

Sounds exciting. Let's get started!

Genomic islands prediction

The SeqWord Genomic Island Sniffer (SWGIS) standalone program was used for the analysis. This automated computational tool identifies horizontally transferred genomic elements in bacterial and plasmid DNA and is freely available for download at SeqWord Project. After download and extraction, the GenBank file was placed in the SWSniffer_2013.01.25/input folder and the Python script executed. A Linux command line example is given in the link. The script does all the work: after 660.97 seconds the complete genome of 3653837 bp had been analysed and the results were available in the output folder.

But wait! There was a permission error when using the blastn option. We first need to change the permissions of the files located in the bin directory to allow execution. Changing the permissions allows the Use BLASTn option to be invoked.

Use BLASTn: if set on, the program uses the blastn algorithm and a small database of 16S rRNA sequences to check whether the selected genomic fragments contain rrn clusters. This option is available only for the scenarios "MGE" and "Ribosomal RNA". In the first scenario the predicted MGE is rejected if it contains rrn; in the second scenario a genomic fragment is selected only if it contains rrn.

Now we have 8 candidate islands, but only 7 are deemed true genomic islands. The results include all 7 genomic islands, each in a uniquely named GenBank file (e.g. CP006912_1.gbk), a FASTA file with the sequences and a graphical representation in SVG format.
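The permission fix for the bin directory can be sketched in Python as follows. This is one possible way to do it, not the procedure documented by the SeqWord Project. The function name make_executable is hypothetical, and the location of the bin directory inside the extracted archive is not stated here, so the path is left as a parameter.

```python
import stat
from pathlib import Path


def make_executable(bin_dir):
    """Add execute permission to every regular file directly inside bin_dir.

    Existing permission bits are kept; only the execute bits for user,
    group and others are added. Returns the names of the files touched.
    """
    changed = []
    for path in sorted(Path(bin_dir).iterdir()):
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            changed.append(path.name)
    return changed
```

Call it with the path to the bin directory of your own installation before rerunning the script with the Use BLASTn option switched on.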

Let us begin with a BLASTn against the database. The FASTA file generated by SWGIS in the SWSniffer_2013.01.25/output folder will aid us. We simply copy and paste the nucleotide sequence of >CP006912:1|Hyphomicrobium nitrativorans NL23, complete genome. [88247-112719], a.k.a. genomic island # 1, into the BLASTn box and select an e-value limit. Click on BLASTn and wait for the results. This is a quick and easy starting point to identify sequence similarity against the entire database.
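Instead of hunting for the right record by eye, the FASTA file can be split into islands with a short Python sketch. The header layout is inferred from the two headers quoted in this walkthrough (accession, island number, description, then coordinates in square brackets) and may not cover every SWGIS output. The names HEADER and read_islands are hypothetical.

```python
import re

# Header layout assumed from the examples in the text, e.g.
# >CP006912:1|Hyphomicrobium nitrativorans NL23, complete genome. [88247-112719]
HEADER = re.compile(
    r"^>(?P<accession>[^:]+):(?P<island>\d+)\|(?P<description>.*?)\s*"
    r"\[(?P<start>\d+)-(?P<end>\d+)\]\s*$"
)


def read_islands(lines):
    """Yield one dict per FASTA record with header fields and the sequence."""
    record = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record is not None:
                yield record
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line}")
            record = match.groupdict()
            for key in ("island", "start", "end"):
                record[key] = int(record[key])
            record["sequence"] = ""
        elif record is not None:
            record["sequence"] += line  # join wrapped sequence lines
    if record is not None:
        yield record
```

Passing an open file handle to read_islands gives every island in turn, so the sequence of a chosen island number can be copied into the BLASTn box without scrolling through the whole file.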

The next move in our onslaught for knowledge will be by compositional similarity. Again copy and paste is our friend for analysis on genomic island # 2, the island formerly known as >CP006912:2|Hyphomicrobium nitrativorans NL23, complete genome. [318281-345034]. Due to the computational intensity this may take longer than BLASTn but it is well worth the wait. After making that cup of coffee our results are ready.

Now it is time for the big guns. We will load all 7 predicted Hyphomicrobium nitrativorans [CP006912] genomic islands in concert. Sequence and compositional similarity comparisons will be done on all loaded genomic islands, but this will take some time. Patience is a virtue and will be rewarded. After another cup of coffee our results for all 7 genomic islands are ready for inspection.

Go forth and conquer the islands!