3. Using binnacle
To estimate scaffold-level coverages from the assembly graph, run Estimate_Abundances.py
with the following parameters.
usage: Estimate_Abundances.py [-h] [-g ASSEMBLY] [-a COVERAGE] [-bam BAMFILE]
[-bed BEDFILE] [-c CONTIGS] -d DIR [-o COORDS]
[-w WINDOW_SIZE] [-t THRESHOLD]
[-n NEIGHBOR_CUTOFF] [-p POSCUTOFF]
[-pre PREFIX]
binnacle: A tool for binning metagenomic datasets using assembly graphs and
scaffolds generated by metacarvel. Estimate_Abundances.py estimates abundance
for scaffolds generated by MetaCarvel. If the coordinates computed by binnacle
is specified then the abundance for each scaffold is estimated based on the
contig abundances and the coordinates. If the coordinates are not specified
then binnacle etimates the abundance from scratch. While calculating all vs
all abundances please specify the coordinates(Coordinates_After_Delinking.txt)
through the "coords" parameter. The abundances can be provided as a bed file,
bam file or a text file describing the per base coverage obtained by running
the genomeCoverageBed program of the bedtools suite.
optional arguments:
-h, --help show this help message and exit
-g ASSEMBLY, --assembly ASSEMBLY
Assembly Graph generated by Metacarvel
-a COVERAGE, --coverage COVERAGE
Output generated by running genomecov -d on the bed
file generated by MetaCarvel.
-bam BAMFILE, --bamfile BAMFILE
Bam file from aligning reads to contigs
-bed BEDFILE, --bedfile BEDFILE
Bed file from aligning reads to contigs. If bed file
is provided please provide a fasta file of the contigs
-c CONTIGS, --contigs CONTIGS
Contigs generated by the assembler, contigs.fasta
-d DIR, --dir DIR output directory for results
-o COORDS, --coords COORDS
Coordinate file generated by Binnacle
-w WINDOW_SIZE, --window_size WINDOW_SIZE
Size of the sliding window for computing test
statistic to identify changepoints in coverages
(Default=1500)
-t THRESHOLD, --threshold THRESHOLD
Threshold to identify outliers (Default=99)
-n NEIGHBOR_CUTOFF, --neighbor_cutoff NEIGHBOR_CUTOFF
Filter size to identify outliers within (Defualt=100)
-p POSCUTOFF, --poscutoff POSCUTOFF
Position cutoff to consider delinking (Default=100)
-pre PREFIX, --prefix PREFIX
Prefix to be attached to all outputs
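The file passed with -a is the per-base output of genomeCoverageBed -d (bedtools genomecov -d), which writes one tab-separated row per position: sequence name, 1-based position, and depth. The sketch below only illustrates that layout by computing a mean depth per contig. The function name `mean_coverage_per_contig` and the sample rows are hypothetical, and this is not how binnacle itself reads the file.

```python
import io
from collections import defaultdict
from typing import Dict, TextIO


def mean_coverage_per_contig(handle: TextIO) -> Dict[str, float]:
    """Average the depth column of `genomecov -d` style rows for each contig.

    Each row is expected to be: <contig name> TAB <1-based position> TAB <depth>.
    With -d every position is reported, so the row count equals the contig length.
    """
    totals: Dict[str, int] = defaultdict(int)
    lengths: Dict[str, int] = defaultdict(int)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        contig, _position, depth = line.split("\t")
        totals[contig] += int(depth)
        lengths[contig] += 1
    return {contig: totals[contig] / lengths[contig] for contig in totals}


# Hypothetical rows, for illustration only.
example = io.StringIO(
    "contig_1\t1\t4\n"
    "contig_1\t2\t6\n"
    "contig_1\t3\t8\n"
    "contig_2\t1\t0\n"
    "contig_2\t2\t2\n"
)
coverage = mean_coverage_per_contig(example)
print(coverage)
```

A check like this can be a quick sanity test that the coverage file has the expected three columns before it is handed to Estimate_Abundances.py.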
- The assembly graph generated by MetaCarvel is written to the output directory given to MetaCarvel and is named oriented.gml. To compute global coordinates and detect change points, we use the per-base coverage obtained by mapping the reads of a sample to its own contigs.
- We use Coords_After_Delinking.txt to estimate scaffold coverages from the coverages obtained by mapping the reads of all other samples. The coordinate information is passed with the -o COORDS parameter.
- Binnacle outputs scaffold-level per-base coverage and a coverage summary for each scaffold, a feature that other binning tools take as input.
- To collate the coverage information from multiple samples, we call the script Collate.py:
python Collate.py -h
usage: Collate.py [-h] -d DIR [-m METHOD] [-k KEEP]
binnacle: A tool for binning metagenomic datasets using assembly graphs and
scaffolds generated by metacarvel.Estimate_Abundances.py estimates abundance
for scaffolds generated by MetaCarvel. The program Collate.py collects the
summary files generated by Estimate_Abundances.py
optional arguments:
-h, --help show this help message and exit
-d DIR, --dir DIR Output directory that contains the summary files
generated by running Estimate_Abundances.py
-m METHOD, --method METHOD
Binning method to format the output to. Presently we
support 1. Metabat 2. Maxbin 3. Concoct 4. Binnacle
(Default)
-k KEEP, --keep KEEP Retain the summary files generated by
Estimate_Abundances.py. Defaults to True
A typical workflow looks like this. To estimate scaffold abundances using the per-base coverage obtained by mapping the reads of sample Si to the contigs of sample Si:
python Estimate_Abundances.py -g oriented.gml -a Si_Si.txt -c Si.contigs.fasta -d <output-directory>
Running the above command also writes the Coords_After_Delinking.txt file.
To estimate scaffold abundances using the per-base coverage obtained by mapping the reads of sample Si to the contigs of sample Sj:
python Estimate_Abundances.py -o Coords_After_Delinking.txt -a Sj_Si.txt -c Sj.contigs.fasta -d <output-directory>
After running the above commands to generate abundances for the scaffolds of sample Sj using the reads of all the samples <S1, S2, S3, ..., Sn>, we run Collate.py.
python Collate.py -m metabat -d <Directory-Containing-the-Scaffold-Abundances>
Make sure all the abundance files of a sample are available in the same directory.