How to Use this Database

"scientia vincere tenebras"

- conquering darkness by science

Pre_GI browse

Users can search the database by accession number or strain description to view genomic islands of interest. Each strain is hyperlinked to a web page that visualizes all detected genomic islands of the selected replicon in an SVG graph. For each visualized genomic island, a table below the graph lists its genomic location, compositional similarity, sequence similarity and OU statistical parameters. All genes contained in a genomic island can be investigated through the hyperlinked start location of the genomic island in the table. The gene view includes the location in the genomic island, BLASTP hits for the specific gene, the gene annotation and the option to search the database for genes with a similar annotation in other genomic islands. Each genomic island GenBank text file is available as a separate web page, designated 'GI text', that displays the file in the browser. The statistical oligonucleotide usage parameters of each genomic island can be compared to the rest of the database to find genomic islands with similar parameters. Oligonucleotide usage pattern similarity hits above 75% for a chosen genomic island are presented as 'Neighbours'. The page also allows viewing results for individual genomic islands and the associated cluster/subcluster with which they share compositional similarity between 75% and 85%. BLAST results for the high-scoring hits obtained from all-vs-all genomic island sequence similarity searches can be accessed and visualized through the BLASTN and BLASTP options.

Pre_GI proposed donor-recipient flux

Proposed donor-recipient movement predictions are available for the oligonucleotide usage pattern hits and cluster/subcluster members of each genomic island, to aid in detecting possible fluxes of mobile genetic elements between bacterial species. Donor-recipient indicators for homologous genomic islands are presented on the 'OUP Neighbours', 'OUP MCL Cluster Neighbours' and 'OUP Sub Cluster Neighbours' pages. Direction of movement is indicated by arrows: green arrows depict movement from the subject to the query, and blue arrows depict movement from the query to the subject. Red two-headed arrows signal that the direction of movement is ambiguous. These pages use distance values (D-values) as a measure of the dissimilarity of a genomic island to its host. A D-value is calculated as 100% minus the percent oligonucleotide usage pattern similarity of the genomic island to its host. This approach can be illustrated further with LingvoCom, an additional freely available SeqWord project utility. The combination of LingvoCom and the Pre_GI database allows users to identify genomic islands and compare them with previously predicted genomic islands housed in the database.
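The D-value arithmetic and the arrow legend can be sketched in a few lines of Python. The `d_value` function and the `ARROWS` legend follow the description above. The `propose_direction` rule and its `tolerance` parameter are assumptions made purely for illustration (the island that is less dissimilar to its own host is treated as sitting on the donor side); the database may apply a different rule.

```python
def d_value(similarity_to_host_percent: float) -> float:
    """D-value = 100% minus the percent OUP similarity of an island to its host."""
    if not 0.0 <= similarity_to_host_percent <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    return 100.0 - similarity_to_host_percent


# Arrow legend as described for the Neighbours pages.
ARROWS = {
    "subject_to_query": "green",
    "query_to_subject": "blue",
    "ambiguous": "red (two-headed)",
}


def propose_direction(query_similarity: float,
                      subject_similarity: float,
                      tolerance: float = 2.0) -> str:
    """Return a key of ARROWS for a query/subject pair of islands.

    ASSUMPTION (not taken from the database documentation): the island with
    the lower D-value is considered closer to its host, so its host is the
    proposed donor. Differences within `tolerance` (a hypothetical value)
    are reported as ambiguous.
    """
    d_query = d_value(query_similarity)
    d_subject = d_value(subject_similarity)
    if abs(d_query - d_subject) <= tolerance:
        return "ambiguous"
    return "query_to_subject" if d_query < d_subject else "subject_to_query"
```

For example, an island that is 80% similar to its host has a D-value of 20.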

Pre_GI gene annotation browse

The database may also be browsed by gene annotation. All annotations in the database are listed and hyperlinked to retrieve all similar annotations and the genomic islands in which they occur.

Pre_GI locational search

Newly predicted genomic islands may be queried against the database through sequence and compositional similarity searches. This enables users to compare them with existing genomic islands and to investigate links, movement and similarity between genomic islands. The ability to compare entire genomic islands, and not only genes, enhances the functionality and applicability of the database for identifying the ontology and origin of genomic islands. Existing genomes may be queried by coordinates to determine whether a region overlaps with genomic islands present in the Pre_GI database. This may be used to indicate whether genomic islands predicted by other methods overlap with SWGIS genomic island predictions, or to test whether a gene of interest is contained in a predicted genomic island.
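A coordinate query of this kind reduces to an interval overlap test. The following is a minimal sketch, assuming 1-based inclusive coordinates on a single replicon; the `Island` record and the example identifiers are hypothetical and do not reflect the database schema.

```python
from typing import List, NamedTuple


class Island(NamedTuple):
    """Hypothetical record for a predicted genomic island on one replicon."""
    island_id: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # 1-based, inclusive (assumed)


def overlapping_islands(islands: List[Island], start: int, end: int) -> List[Island]:
    """Return the islands that share at least one position with [start, end]."""
    if start > end:
        start, end = end, start  # accept coordinates given in either order
    # Two closed intervals overlap when each one starts before the other ends.
    return [gi for gi in islands if gi.start <= end and start <= gi.end]


# Usage with made-up coordinates:
islands = [Island("GI_1", 1000, 5000), Island("GI_2", 20000, 32000)]
hits = overlapping_islands(islands, 4800, 6200)  # region overlaps GI_1 only
```

The same test answers both questions raised above: a region predicted by another method is passed as a range, and a single gene is passed as its start and end coordinates.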

Pre_GI sequence similarity

Sequence similarity searches are performed with BLASTN, using either a user-specified e-value cut-off or the default one. Results are tabulated to show the highest-scoring hits together with the subject source accessions and source descriptions. The subject genomic islands are hyperlinked so that users can inspect the hit genomic island in the Pre_GI database and scrutinize BLASTN visualizations for high-scoring hits.
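Users who run BLASTN locally can reproduce this kind of filtering on tabular output. The sketch below assumes the standard 12-column BLAST+ tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The `max_evalue` default of 1e-5 and the `limit` parameter are placeholders, not the values used by Pre_GI.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def top_hits(tabular_text: str, max_evalue: float = 1e-5, limit: int = 10) -> List[Hit]:
    """Keep hits at or below the e-value cut-off, ordered by descending bit score."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Columns 3, 11 and 12 of the tabular format: pident, evalue, bitscore.
        hit = Hit(fields[0], fields[1], float(fields[2]),
                  float(fields[10]), float(fields[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    hits.sort(key=lambda h: h.bitscore, reverse=True)
    return hits[:limit]
```

Sorting by bit score rather than e-value avoids ties among very strong hits, whose e-values are often all reported as 0.0.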

Pre_GI compositional similarity

A novel genomic island nucleotide sequence may be compared against the database entries by oligonucleotide usage pattern similarity searches. The sequence is first compared against the 420 cluster/subcluster genomic island representatives in search of shared compositional similarity above 75%. The novel genomic island is then compared to all members of each cluster/subcluster whose representative showed such similarity. The results are tabulated from highest to lowest compositional similarity, with subject hits hyperlinked for further inspection. The subject description and clusters/subclusters are displayed together with the percentage similarity. Clusters/subclusters are also hyperlinked to view all genomic islands contained in the specific grouping.
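The two-stage search described above can be sketched as follows. The `similarity` function here is a stand-in based on tetranucleotide frequencies and is not the SeqWord oligonucleotide usage pattern measure; the dictionary layout, the function names and the choice of k = 4 are likewise assumptions made only to keep the example self-contained.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

K = 4  # word length (assumed for this sketch)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def usage_profile(seq: str) -> List[float]:
    """Relative frequency of every k-mer over the alphabet ACGT."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS)  # ignores words with ambiguous bases
    if total == 0:
        raise ValueError("sequence contains no unambiguous k-mers")
    return [counts[k] / total for k in KMERS]


def similarity(a: str, b: str) -> float:
    """Stand-in percent similarity: 100 * (1 - half the L1 profile distance)."""
    pa, pb = usage_profile(a), usage_profile(b)
    return 100.0 * (1.0 - 0.5 * sum(abs(x - y) for x, y in zip(pa, pb)))


def two_stage_search(query: str,
                     representatives: Dict[str, str],
                     members: Dict[str, Dict[str, str]],
                     threshold: float = 75.0) -> List[Tuple[str, str, float]]:
    """Stage 1: screen cluster representatives. Stage 2: score cluster members."""
    hits = []
    for cluster, rep_seq in representatives.items():
        if similarity(query, rep_seq) > threshold:
            for island_id, seq in members.get(cluster, {}).items():
                hits.append((island_id, cluster, similarity(query, seq)))
    # Tabulate from highest to lowest compositional similarity.
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

Screening the representatives first means the query is scored against a few hundred sequences rather than every island in the database, and only the members of matching clusters are compared in full.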

Pre_GI genbank similarity

Users can upload and compare genomic island GenBank files produced by the SeqWord Genomic Island Sniffer. Sequence and compositional similarity results are available for up to 7 genomic island GenBank files at once. Results include the source accessions, source descriptions, genomic island locations and genomic island oligonucleotide usage parameters, together with sequence and compositional similarity hits to the database as described above. Proposed donor-recipient flux is estimated for user-submitted genomic islands. This enables the user to perform multiple comparisons across the database with all of the applications simultaneously. Oligonucleotide usage pattern similarity hits include the similarity to the query and the detected direction of flow between the genomic islands. The clusters/subclusters found to be relevant to the query, identified with the aid of precomputed representatives, are directly accessible to users. BLASTN hits provide the option to visualize high-scoring BLASTN results.