BiG-SCAPE CORASON



BiG-SCAPE / CORASON tutorial


This tutorial explains how to download datasets to run BiG-SCAPE and CORASON, and provides examples for how to use both tools.

Input dataset


Example dataset


The example dataset consists of three parts: a set of genomes (includes a single cluster from MIBiG), a set of GenBank files (selected clusters predicted from the set of genomes) and the TauD sequence from a Streptomyces genome (NRRL B-1347).
Download and execute the following script to obtain the files:

$ mkdir -p ~/bin/example # -p: no error if the directory already exists
$ curl -q https://raw.githubusercontent.com/nselem/bigscape-corason/master/scripts/data_bigscape_corason.sh > ~/bin/example/data_bigscape_corason.sh
$ chmod a+x ~/bin/example/data_bigscape_corason.sh
$ cd ~/bin/example && ~/bin/example/data_bigscape_corason.sh


Data can also be downloaded manually at: DOI

How to compile your own input dataset


In this tutorial, we will work with the above example data. If you want to use your own gene clusters as input for BiG-SCAPE and the CORASON family mode integrated within the BiG-SCAPE pipeline, you can search for publicly available genomes in antiSMASH-DB and download the desired cluster files in GenBank (.gbk) format. Alternatively, you can perform your own antiSMASH runs on the public web server or on your local system, collect the cluster GBK files and put them together in a folder that you can use as input for BiG-SCAPE. Entries from MIBiG can be added automatically by adding the `--mibig` flag to the end of your BiG-SCAPE command (see below).
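Collecting the cluster files from several antiSMASH runs can be scripted. The following is a minimal sketch, assuming that all antiSMASH result folders sit under one parent folder and that cluster files follow the antiSMASH naming patterns `*.cluster*.gbk` (older versions) or `*.region*.gbk` (version 5 and later). The function name `collect_gbks` and the folder names in the usage note are hypothetical.

```shell
# collect_gbks SRC DEST
# Copy every antiSMASH cluster GenBank file found under SRC into DEST.
# Assumption: cluster files are named *.cluster*.gbk or *.region*.gbk.
# Keep DEST outside SRC so that copied files are not found again.
collect_gbks() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    find "$src" -type f \( -name '*.cluster*.gbk' -o -name '*.region*.gbk' \) \
        -exec cp {} "$dest"/ \;
}
```

For example, `collect_gbks antismash_results my_gbks` would fill `my_gbks`, which could then be passed to BiG-SCAPE as the input folder. Files with identical names overwrite each other in this sketch, so it is worth checking that cluster file names are unique across runs.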

BiG-SCAPE example

We will now proceed with the example data. Once the data have been downloaded, run the following command in the terminal (you should now be in the `~/bin/example` folder):

$ run_bigscape gbks example_output


If everything goes well, the terminal will show something similar to:

[Figure: example terminal output of a BiG-SCAPE run]


The _input_ for BiG-SCAPE is the directory `gbks`, which contains GenBank files of biosynthetic gene clusters (BGCs) predicted by antiSMASH. The BiG-SCAPE _output_ will be stored in the directory `example_output`.
After BiG-SCAPE has finished successfully, open the `index.html` file located inside the `example_output` folder with your browser (e.g. Chrome or Firefox). The file contains an interactive offline webpage that displays the BiG-SCAPE results and allows you to explore them.
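To include MIBiG reference clusters, as mentioned in the dataset section, the `--mibig` flag is appended to the same command. The sketch below adds a small pre-flight check in front of it. The helper name `count_gbks` and the output folder name `example_output_mibig` are hypothetical, and the commented command assumes that `run_bigscape` forwards trailing flags to BiG-SCAPE.

```shell
# count_gbks DIR: print the number of .gbk files found under DIR.
count_gbks() {
    find "$1" -type f -name '*.gbk' | wc -l | tr -d ' '
}

# Example use (commented out because it needs BiG-SCAPE installed):
# [ "$(count_gbks gbks)" -gt 0 ] && run_bigscape gbks example_output_mibig --mibig
```

Checking the input folder first avoids starting a run on an empty or mistyped directory.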




To start exploring, select a class at the top of the page:
[Figure: Select class]

Now, the screen will display a network visualization of BGC families within this class.


[Figure: Choose family]


In this case, the NRPS class contains 10 BGCs organized in one gene cluster family of three members, one family of two members and five singletons.
Now select a family in this network to visualize BGCs sorted and aligned by CORASON.

[Figure: CORASON view of the selected family]

This family contains three members.
BiG-SCAPE output can also be imported into Cytoscape.
In another example, BiG-SCAPE was used to calculate BGC families in 103 complete Streptomyces genomes. The outcome of this run can be found here.
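For the Cytoscape import mentioned above, the network tables first have to be located in the output folder. The following is a minimal sketch, assuming that BiG-SCAPE writes its network tables as files ending in `.network` somewhere below the output directory. The helper name `list_network_files` is hypothetical.

```shell
# list_network_files OUTDIR: print network tables found under OUTDIR, sorted.
# Assumption: network tables are files whose names end in ".network".
list_network_files() {
    find "$1" -type f -name '*.network' | sort
}
```

Running `list_network_files example_output` would then print candidate files to load through Cytoscape's network import dialog.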


CORASON example

The general form of a CORASON command, matching the examples below, is:

~/bin/run_corason queryFile gbksDirectory referenceBGC -g

CORASON finds variation in the genomic vicinity of a reference cluster. To this end, CORASON can explore either BGCs predicted by antiSMASH or complete genomes. The results of these two approaches will differ slightly.

If you have not already downloaded the example files, download and execute the following script to obtain them, then list the contents of the directory:

$ mkdir -p ~/bin/example # -p: no error if the directory already exists
$ curl -q https://raw.githubusercontent.com/nselem/bigscape-corason/master/scripts/data_bigscape_corason.sh > ~/bin/example/data_bigscape_corason.sh
$ chmod a+x ~/bin/example/data_bigscape_corason.sh
$ cd ~/bin/example && ~/bin/example/data_bigscape_corason.sh
$ ls


The contents of `~/bin/example` after the download should look similar to the following figure.


[Figure: contents of the CORASON example directory]
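The directory contents can also be checked from the command line. The following is a minimal sketch, assuming only the three items that the commands in this tutorial rely on: the `gbks` and `genomes` directories and the `TauD.fasta` file. The function name `check_example_data` is hypothetical, and the download may contain further files that this check ignores.

```shell
# check_example_data [DIR]
# Report which of the items used in this tutorial are missing from DIR
# (default: current directory). Returns non-zero if anything is missing.
check_example_data() {
    dir=${1:-.}
    status=0
    for d in gbks genomes; do
        if [ ! -d "$dir/$d" ]; then
            echo "missing directory: $d"
            status=1
        fi
    done
    if [ ! -f "$dir/TauD.fasta" ]; then
        echo "missing file: TauD.fasta"
        status=1
    fi
    return $status
}
```

Calling `check_example_data ~/bin/example` prints nothing when all three items are present.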
The next step is to identify variations in the genomic vicinity of _tauD_ that remain similar to the reference BGC `JMGX01000001.1.cluster003.gbk`. To do so, execute CORASON by running:

~/bin/run_corason TauD.fasta gbks gbks/JMGX01000001.1.cluster003.gbk -g


The output is written to a directory named after the query file (query-output), in this case `TauD.fasta-output`. Open the resulting SVG file in Firefox (or another browser) to visualize the results. In this case, even fragmented BGCs that lack the NRPS genes are shown.

firefox TauD.fasta-output/Joined.svg
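The naming convention described above can be captured in a small helper, which is convenient when running CORASON with several query files. This is a minimal sketch, assuming that the output directory is always the query file name followed by `-output` and that the combined figure is always called `Joined.svg`, as in this example. The function name `corason_svg` is hypothetical.

```shell
# corason_svg QUERY: print the expected path of the combined SVG figure
# for a given query file, following the "<query>-output" convention.
corason_svg() {
    printf '%s-output/Joined.svg\n' "$1"
}
```

With this helper, the command above could be written as `firefox "$(corason_svg TauD.fasta)"`.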



[Figure: CORASON example output for the BGC (gbks) run]
To obtain more variations of genomic vicinities, switch the exploration to genomes instead of BGCs. In this case, more BGCs are found and differences are not limited to gene content of the BGC; differences in sequence similarity are also indicated by a color gradient.

~/bin/run_corason TauD.fasta genomes genomes/JOBW01.gbk -g
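Both runs use the same query file. If the output directory is named after the query only, as described above, the genome run would presumably write to the same `TauD.fasta-output` directory as the BGC run. The following is a minimal sketch for keeping both result sets by renaming the first one before starting the second. The function name `keep_output` and the label `bgcs` are hypothetical.

```shell
# keep_output QUERY LABEL
# Rename "<QUERY>-output" to "<QUERY>-output.<LABEL>" so that a later run
# with the same query file does not reuse the directory.
keep_output() {
    src="$1-output"
    dest="$1-output.$2"
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite: $dest" >&2
        return 1
    fi
    mv "$src" "$dest"
}
```

For example, `keep_output TauD.fasta bgcs` run after the BGC-based command would move its results to `TauD.fasta-output.bgcs` before the genome-based command is started.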


[Figure: CORASON example output for the genomes run]