
3. Using binnacle

Harihara Subrahmaniam Muralidharan edited this page Jul 2, 2021 · 3 revisions

To estimate scaffold-level coverages from the assembly graph, call Estimate_Abundances.py with the following parameters:

usage: Estimate_Abundances.py [-h] [-g ASSEMBLY] [-a COVERAGE] [-bam BAMFILE]
                              [-bed BEDFILE] [-c CONTIGS] -d DIR [-o COORDS]
                              [-w WINDOW_SIZE] [-t THRESHOLD]
                              [-n NEIGHBOR_CUTOFF] [-p POSCUTOFF]
                              [-pre PREFIX]

binnacle: A tool for binning metagenomic datasets using assembly graphs and
scaffolds generated by MetaCarvel. Estimate_Abundances.py estimates abundance
for scaffolds generated by MetaCarvel. If the coordinates computed by binnacle
are specified, the abundance of each scaffold is estimated from the contig
abundances and the coordinates. If the coordinates are not specified, binnacle
estimates the abundance from scratch. When calculating all-vs-all abundances,
please specify the coordinates (Coordinates_After_Delinking.txt) through the
"coords" parameter. The abundances can be provided as a bed file, a bam file,
or a text file describing the per-base coverage obtained by running the
genomeCoverageBed program of the bedtools suite.

optional arguments:
  -h, --help            show this help message and exit
  -g ASSEMBLY, --assembly ASSEMBLY
                        Assembly Graph generated by Metacarvel
  -a COVERAGE, --coverage COVERAGE
                        Output generated by running genomecov -d on the bed
                        file generated by MetaCarvel.
  -bam BAMFILE, --bamfile BAMFILE
                        Bam file from aligning reads to contigs
  -bed BEDFILE, --bedfile BEDFILE
                        Bed file from aligning reads to contigs. If a bed file
                        is provided, please also provide a fasta file of the contigs
  -c CONTIGS, --contigs CONTIGS
                        Contigs generated by the assembler, contigs.fasta
  -d DIR, --dir DIR     output directory for results
  -o COORDS, --coords COORDS
                        Coordinate file generated by Binnacle
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        Size of the sliding window for computing test
                        statistic to identify changepoints in coverages
                        (Default=1500)
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold to identify outliers (Default=99)
  -n NEIGHBOR_CUTOFF, --neighbor_cutoff NEIGHBOR_CUTOFF
                        Filter size to identify outliers within (Default=100)
  -p POSCUTOFF, --poscutoff POSCUTOFF
                        Position cutoff to consider delinking (Default=100)
  -pre PREFIX, --prefix PREFIX
                        Prefix to be attached to all outputs
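
The -a input and the -w/-t parameters above can be illustrated with a short sketch. It parses the three-column output of `bedtools genomecov -d` (contig, 1-based position, depth) into per-contig coverage lists, then scans a contig's coverage with a sliding window and flags positions where the mean coverage of the left and right windows differs the most. The statistic (absolute difference of window means), the nearest-rank percentile cutoff, and all function names are assumptions chosen for illustration; this is not binnacle's actual test statistic, and the -n and -p filters are not modeled.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def read_genomecov_d(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group `genomecov -d` rows (contig, 1-based position, depth) by contig."""
    depths: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        contig, pos, depth = line.rstrip("\n").split("\t")
        per_contig = depths[contig]
        # genomecov -d reports every position of a contig in order, starting at 1.
        if int(pos) != len(per_contig) + 1:
            raise ValueError(f"out-of-order position {pos} on {contig}")
        per_contig.append(int(depth))
    return dict(depths)


def window_mean_differences(depths: List[float], window: int) -> List[float]:
    """|mean of left window - mean of right window| at each position i that has
    a full window on both sides; entry k belongs to position i = k + window."""
    prefix = [0.0]  # prefix[i] == sum(depths[:i]), so each window sum is O(1)
    for d in depths:
        prefix.append(prefix[-1] + d)
    stats = []
    for i in range(window, len(depths) - window + 1):
        left = (prefix[i] - prefix[i - window]) / window
        right = (prefix[i + window] - prefix[i]) / window
        stats.append(abs(left - right))
    return stats


def candidate_change_points(depths: List[float], window: int = 1500,
                            percentile: float = 99.0) -> List[int]:
    """0-based positions whose statistic reaches the given (nearest-rank)
    percentile of all statistics on the contig; flat coverage yields none."""
    stats = window_mean_differences(depths, window)
    if not stats:
        return []  # contig shorter than two windows
    ranked = sorted(stats)
    rank = max(1, math.ceil(percentile / 100 * len(ranked)))
    cutoff = ranked[rank - 1]
    return [k + window for k, s in enumerate(stats) if s >= cutoff and s > 0]


if __name__ == "__main__":
    rows = [f"ctg1\t{p + 1}\t{d}" for p, d in enumerate([10] * 5 + [50] * 5)]
    coverage = read_genomecov_d(rows)
    print(candidate_change_points(coverage["ctg1"], window=2))  # prints [5]
```

With a real file, `with open("Si_Si.txt") as fh: coverage = read_genomecov_d(fh)` loads every contig at once; prefix sums keep each window comparison constant-time, which matters at the default window of 1500 bases.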

  1. The assembly graph generated by MetaCarvel is titled oriented.gml and can be found in the output directory specified to MetaCarvel. To compute global coordinates and detect change points, we use the per-base coverage obtained by mapping the reads of a sample to its own contigs.
  2. We use Coords_After_Delinking.txt to estimate the coverages of the scaffolds from the coverages obtained by mapping the reads of all other samples. The coordinate information is passed using the -o COORDS parameter.
  3. Binnacle outputs the per-base coverage of each scaffold and a summary of the coverage information for each scaffold, which other binning tools use as a feature.
  4. To collate the coverage information from multiple samples, call the script Collate.py:

python Collate.py -h
usage: Collate.py [-h] -d DIR [-m METHOD] [-k KEEP]

binnacle: A tool for binning metagenomic datasets using assembly graphs and
scaffolds generated by MetaCarvel. Estimate_Abundances.py estimates abundance
for scaffolds generated by MetaCarvel. The program Collate.py collects the
summary files generated by Estimate_Abundances.py

optional arguments:
  -h, --help            show this help message and exit
  -d DIR, --dir DIR     Output directory that contains the summary files
                        generated by running Estimate_Abundances.py
  -m METHOD, --method METHOD
                        Binning method to format the output to. Presently we
                        support 1. Metabat 2. Maxbin 3. Concoct 4. Binnacle
                        (Default)
  -k KEEP, --keep KEEP  Retain the summary files generated by
                        Estimate_Abundances.py. Defaults to True 
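
Collate.py -m metabat formats the per-sample summaries for MetaBAT. As a hedged illustration of what such a table contains, the sketch below builds a depth table in the layout written by MetaBAT's jgi_summarize_bam_contig_depths (contigName, contigLen, totalAvgDepth, then a mean column and a -var column per sample) from in-memory per-base scaffold coverages. The input structure, the function names, and computing totalAvgDepth as the sum of the per-sample means are assumptions; this is not Collate.py's code, and binnacle's own summary file format is not reproduced here.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def summarize(depths: Sequence[float]) -> Tuple[float, float]:
    """Mean and population variance of one scaffold's per-base coverage."""
    return float(statistics.fmean(depths)), float(statistics.pvariance(depths))


def metabat_depth_table(samples: List[str],
                        scaffolds: Dict[str, Dict[str, Sequence[float]]]) -> str:
    """Render a MetaBAT-style depth table.

    `scaffolds` maps scaffold name -> {sample name -> per-base coverage}
    (a hypothetical in-memory layout); every sample must cover every scaffold.
    """
    header = ["contigName", "contigLen", "totalAvgDepth"]
    for sample in samples:
        header += [sample, f"{sample}-var"]
    rows = ["\t".join(header)]
    for name in sorted(scaffolds):
        per_sample = scaffolds[name]
        lengths = {len(per_sample[s]) for s in samples}
        if len(lengths) != 1:
            raise ValueError(f"inconsistent coverage lengths for {name}")
        stats = [summarize(per_sample[s]) for s in samples]
        # Assumption: totalAvgDepth is the sum of the per-sample mean depths.
        total = sum(mean for mean, _ in stats)
        fields = [name, str(lengths.pop()), f"{total:.4f}"]
        for mean, var in stats:
            fields += [f"{mean:.4f}", f"{var:.4f}"]
        rows.append("\t".join(fields))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    coverage = {"scaf1": {"S1": [1, 3], "S2": [2, 2]}}
    print(metabat_depth_table(["S1", "S2"], coverage), end="")
```

Each sample contributes one mean and one variance column per scaffold, which is why the abundance files for every read sample must be collected before collating.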

A typical workflow looks like this. To estimate the abundances of the scaffolds of sample Si using the per-base coverage obtained by mapping the reads of sample Si to the contigs of sample Si, run:

python Estimate_Abundances.py -g oriented.gml -a Si_Si.txt -c Si.contigs.fasta -d <output-directory>

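Running the above command also writes the Coords_After_Delinking.txt file. To estimate the abundances of the scaffolds of sample Sj using the per-base coverage obtained by mapping the reads of sample Si to the contigs of sample Sj, pass the Coords_After_Delinking.txt generated for sample Sj (by running the command above on Sj) through -o: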

python Estimate_Abundances.py -o Coords_After_Delinking.txt -a Sj_Si.txt -c Sj.contigs.fasta -d <output-directory>

After running the above commands to generate abundances for the scaffolds of sample Sj using the reads of every sample S1, S2, ..., Sn, run Collate.py:

python Collate.py -m metabat -d <Directory-Containing-the-Scaffold-Abundances>

Make sure all the abundance files of a sample are in the same directory.
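
The workflow above can be scripted for n samples. The sketch below only builds the command lines documented on this page and prints them (a dry run by default). The per-sample paths (`<sample>/oriented.gml`, `<sample>.contigs.fasta`, `<contig-sample>_<read-sample>.txt`), the assumption that the coordinate file is written into the -d directory (this page calls it both Coords_After_Delinking.txt and Coordinates_After_Delinking.txt), and the use of -pre to keep the outputs of different read samples apart are all assumptions; adjust them to the file names your runs actually produce.

```python
import subprocess
from typing import List


def binnacle_commands(samples: List[str], outdir: str = "binnacle_out") -> List[List[str]]:
    """Command lines for the workflow on this page, one contig sample at a time."""
    cmds: List[List[str]] = []
    for sj in samples:  # sample whose scaffolds are being quantified
        sample_dir = f"{outdir}/{sj}"
        # Step 1: reads of Sj on contigs of Sj; this run also writes the coordinates.
        cmds.append(["python", "Estimate_Abundances.py",
                     "-g", f"{sj}/oriented.gml",  # hypothetical MetaCarvel output path
                     "-a", f"{sj}_{sj}.txt",
                     "-c", f"{sj}.contigs.fasta",
                     "-d", sample_dir])
        # Assumption: the coordinate file lands in the -d directory.
        coords = f"{sample_dir}/Coords_After_Delinking.txt"
        # Step 2: reads of every other sample Si on contigs of Sj, reusing Sj's coordinates.
        for si in samples:
            if si == sj:
                continue
            cmds.append(["python", "Estimate_Abundances.py",
                         "-o", coords,
                         "-a", f"{sj}_{si}.txt",
                         "-c", f"{sj}.contigs.fasta",
                         "-d", sample_dir,
                         "-pre", si])  # assumption: prefix keeps per-sample outputs apart
        # Step 3: collate every summary for Sj into one table for MetaBAT.
        cmds.append(["python", "Collate.py", "-m", "metabat", "-d", sample_dir])
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_all(binnacle_commands(["S1", "S2", "S3"]))
```

For n samples this yields n self-runs, n(n-1) cross-runs and n Collate.py calls; the self-run for Sj is ordered first because the cross-runs on Sj's contigs depend on its coordinate file.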
