panacus is a tool for calculating statistics for GFA files. It supports GFA files with P and W lines, but requires that the graph is blunt, i.e., nodes do not overlap and, consequently, each link (L) points from the end of one segment (S) to the start of another.
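The bluntness requirement can be checked before running the tool. The following is a minimal sketch, not part of panacus: the helper name `non_blunt_links` is hypothetical, it assumes GFA 1 link lines with the overlap in the sixth tab-separated field, and it assumes that an unspecified overlap (`*`) may be treated like `0M`.

```python
def non_blunt_links(gfa_lines):
    """Return the L lines whose overlap field is neither '0M' nor '*'.

    Assumption: GFA 1 layout, i.e. L <from> <orient> <to> <orient> <overlap>.
    """
    offending = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Only link lines carry an overlap; skip everything else.
        if fields[0] != "L" or len(fields) < 6:
            continue
        overlap = fields[5]
        if overlap not in ("0M", "*"):
            offending.append(line.rstrip("\n"))
    return offending


# Toy graph: the second link overlaps by two bases, so the graph is not blunt.
example = [
    "S\t1\tACGT",
    "S\t2\tTTGA",
    "S\t3\tCC",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t-\t2M",
]
print(non_blunt_links(example))
```

An empty result means every link in the input is blunt under the assumptions above.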
panacus supports the following calculations:
- coverage histogram
- pangenome growth statistics
- path-/group-resolved coverage table
Coverage histogram: a histogram listing the number of features (nodes, edges, ...) that are visited by a certain number of paths.
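To make the definition concrete, here is a small sketch that counts node coverage over toy paths. It only illustrates the idea and is not how panacus is implemented; the function name `coverage_histogram` and the toy data are made up for this example.

```python
from collections import Counter


def coverage_histogram(paths):
    """Return hist where hist[c] is the number of nodes visited by exactly c paths.

    paths: dict mapping a path name to the list of node ids it traverses.
    """
    coverage = Counter()
    for nodes in paths.values():
        # A path counts at most once per node, even if it visits the node twice.
        coverage.update(set(nodes))
    hist = [0] * (len(paths) + 1)
    for c in coverage.values():
        hist[c] += 1
    return hist


toy_paths = {
    "hapA": ["n1", "n2", "n3"],
    "hapB": ["n1", "n2", "n4"],
    "hapC": ["n1", "n5", "n5"],
}
print(coverage_histogram(toy_paths))  # [0, 3, 1, 1]
```

Here three nodes are private to one path, one node is shared by two paths, and one node is shared by all three.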
Pangenome growth statistics: describe how many features (nodes, edges, ...) one would expect on average if the graph were built from 1...n haplotypes.
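The expected number of features for m out of n haplotypes can be derived from a coverage histogram alone: a feature covered by c paths is missing from a random subset of size m with probability C(n-c, m) / C(n, m). The sketch below applies that formula to a toy histogram; it is an illustration under this assumption, not the panacus implementation, and `expected_growth` is a hypothetical name.

```python
from math import comb


def expected_growth(hist):
    """Expected feature count for every subset size m = 1..n.

    hist[c] is the number of features visited by exactly c of the n paths.
    """
    n = len(hist) - 1
    growth = []
    for m in range(1, n + 1):
        subsets = comb(n, m)
        expected = sum(
            # Probability that at least one of the m chosen paths visits the feature.
            count * (1 - comb(n - c, m) / subsets)
            for c, count in enumerate(hist)
            if c > 0
        )
        growth.append(expected)
    return growth


# Toy histogram for n = 3 paths: three private features, one shared by two
# paths, one shared by all three.
print(expected_growth([0, 3, 1, 1]))
```

For this toy input the curve rises from about 2.67 features for one haplotype to 5 features for all three.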
To limit the number of features included in the calculation (e.g., for visualizing the core genome), pairs of coverage/quorum parameters can be used:
- coverage: include only features that are visited by at least that many paths (e.g., to filter out private nodes that are part of only one haplotype)
- quorum: fraction of haplotypes that must share a feature after the haplotype is added to the graph for the feature to be included in the output (e.g., a quorum of 1 means only features that are shared by 100% of the haplotypes, the "core genome")
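The interplay of the two parameters can be illustrated with a simplified, order-dependent sketch. It assumes that coverage is evaluated on the complete graph and quorum on the haplotypes added so far; how panacus applies them internally is not described here, and the function `ordered_counts` and the toy data are hypothetical.

```python
def ordered_counts(paths, coverage=1, quorum=0.0):
    """Count qualifying nodes after each haplotype is added, in the given order.

    paths: list of (name, node list) tuples.
    coverage: minimum number of paths (in the full graph) that visit a node.
    quorum: minimum fraction of the haplotypes added so far that share a node.
    """
    total_cov = {}
    for _, nodes in paths:
        for node in set(nodes):
            total_cov[node] = total_cov.get(node, 0) + 1
    # Coverage filter: drop nodes visited by fewer than `coverage` paths overall.
    kept = {node for node, c in total_cov.items() if c >= coverage}

    seen = dict.fromkeys(kept, 0)
    counts = []
    for k, (_, nodes) in enumerate(paths, start=1):
        for node in set(nodes) & kept:
            seen[node] += 1
        # Quorum filter: node must be present and shared by >= quorum of k haplotypes.
        counts.append(sum(1 for s in seen.values() if s >= 1 and s / k >= quorum))
    return counts


toy = [
    ("hapA", ["n1", "n2", "n3"]),
    ("hapB", ["n1", "n2", "n4"]),
    ("hapC", ["n1", "n5", "n5"]),
]
print(ordered_counts(toy))               # [3, 4, 5]
print(ordered_counts(toy, quorum=1))     # [3, 2, 1]
print(ordered_counts(toy, coverage=2))   # [2, 2, 2]
```

With a quorum of 1 the curve shrinks toward the core, while a coverage of 2 removes the private nodes from the count.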
panacus is written in Rust and requires a working Rust build system (version >= 1.74.1) for installation. See here for more details.
panacus provides a Python script for visualizing the calculated counting statistics. It requires Python >= 3.6 and the following Python libraries:
- matplotlib
- numpy
- pandas
- scikit-learn
- scipy
- seaborn
Install from Bioconda (make sure you have conda/mamba installed):
mamba install -c conda-forge -c bioconda panacus
Install from a binary release (Linux x86_64):
wget --no-check-certificate -c https://github.com/marschall-lab/panacus/releases/download/0.2.5/panacus-0.2.5_x86_64-unknown-linux-musl.tar.gz
tar -xzvf panacus-0.2.5_x86_64-unknown-linux-musl.tar.gz
# install the Python libraries necessary for panacus-visualize
pip install --user matplotlib numpy pandas scikit-learn scipy seaborn
# suggestion: add tool to path in your ~/.bashrc
export PATH="$(readlink -f panacus-0.2.5_x86_64-unknown-linux-musl/bin)":$PATH
# you are ready to go!
panacus --help
Install from a binary release (macOS aarch64):
wget --no-check-certificate -c https://github.com/marschall-lab/panacus/releases/download/0.2.5/panacus-0.2.5_aarch64-apple-darwin.tar.gz
tar -xzvf panacus-0.2.5_aarch64-apple-darwin.tar.gz
# install the Python libraries necessary for panacus-visualize
pip install --user matplotlib numpy pandas scikit-learn scipy seaborn
# suggestion: add tool to path in your ~/.bashrc
export PATH="$(readlink -f panacus-0.2.5_aarch64-apple-darwin/bin)":$PATH
# you are ready to go!
panacus --help
Build from the repository:
git clone git@github.com:marschall-lab/panacus.git
cd panacus
cargo build --release
mkdir bin
ln -s ../target/release/panacus bin/
ln -s ../scripts/panacus-visualize.py bin/panacus-visualize
# install the Python libraries necessary for panacus-visualize
pip install --user matplotlib numpy pandas scikit-learn scipy seaborn
# suggestion: add tool to path in your ~/.bashrc
export PATH="$(readlink -f bin)":$PATH
# you are ready to go!
panacus --help
$ panacus
Calculate count statistics for pangenomic data

Usage: panacus <COMMAND>

Commands:
  info                Return general graph and paths info
  histgrowth          Run hist and growth. Return the growth curve
  hist                Calculate coverage histogram
  growth              Calculate growth curve from coverage histogram
  ordered-histgrowth  Calculate growth curve based on group file order (if order is unspecified, use path order in GFA)
  table               Compute coverage table for count type
  help                Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version
Generate a simple growth plot from a GFA file:
RUST_LOG=info panacus histgrowth -t6 -q 0.1,0.5,1 -S <INPUT_GFA> > output.tsv
panacus-visualize -e output.tsv > output.pdf
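The two commands above can also be driven from Python, for example in a workflow script. This is a sketch only: the helper names `build_histgrowth_cmd` and `run_pipeline` are hypothetical, the flags are copied verbatim from the example above, and both executables are assumed to be on the PATH.

```python
import os
import subprocess


def build_histgrowth_cmd(gfa_path, threads=6, quorums=(0.1, 0.5, 1)):
    """Assemble the histgrowth command line shown in the shell example."""
    quorum_arg = ",".join(format(q, "g") for q in quorums)
    return ["panacus", "histgrowth", f"-t{threads}", "-q", quorum_arg, "-S", gfa_path]


def run_pipeline(gfa_path, tsv_path="output.tsv", pdf_path="output.pdf"):
    """Run histgrowth, then render the resulting table to a PDF."""
    env = dict(os.environ, RUST_LOG="info")  # same log level as in the shell example
    with open(tsv_path, "w") as tsv:
        subprocess.run(build_histgrowth_cmd(gfa_path), stdout=tsv, env=env, check=True)
    with open(pdf_path, "wb") as pdf:
        subprocess.run(["panacus-visualize", "-e", tsv_path], stdout=pdf, check=True)
```

Calling `run_pipeline("graph.gfa")` with a placeholder input name would write output.tsv and output.pdf to the current directory; `check=True` raises an exception if either command exits with an error.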
Examples can be found in the examples directory.
Parmigiani, L., Garrison, E., Stoye, J., Marschall, T. & Doerr, D. Panacus: fast and exact pangenome growth and core size estimation. Bioinformatics, https://doi.org/10.1093/bioinformatics/btae720 (2024).