Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: marschall-lab/panacus
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: 0.2
Choose a base ref
...
head repository: marschall-lab/panacus
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: main
Choose a head ref

Commits on May 19, 2023

  1. ignore releases

    Dany Doerr committed May 19, 2023
    Copy the full SHA
    3082468 View commit details
  2. ignore DS_Store

    Dany Doerr committed May 19, 2023
    Copy the full SHA
    5cd7e42 View commit details

Commits on May 22, 2023

  1. Update README.md

    danydoerr authored May 22, 2023
    Copy the full SHA
    279461e View commit details

Commits on May 23, 2023

  1. improve formatting

    Dany Doerr committed May 23, 2023
    Copy the full SHA
    f35c8f6 View commit details
  2. fix bug in ordered histgraph and improve interface

    Dany Doerr committed May 23, 2023
    Copy the full SHA
    dfcf684 View commit details
  3. fix bugs when estimating heaps parameters

    Dany Doerr committed May 23, 2023
    Copy the full SHA
    cea7120 View commit details
  4. fixed condition

    Dany Doerr committed May 23, 2023
    Copy the full SHA
    c7134ab View commit details

Commits on May 30, 2023

  1. implemented new storage-efficient data structure for storing group-ba…

    …sed table
    Dany Doerr committed May 30, 2023
    Copy the full SHA
    2587e26 View commit details

Commits on Jun 2, 2023

  1. Copy the full SHA
    836cf48 View commit details
  2. change r vector to usize (because it points to intervals in c/v vecto…

    …rs, which can become *really* big)
    Daniel Doerr committed Jun 2, 2023
    Copy the full SHA
    f265037 View commit details
  3. improve code regarding matplotlib warning

    Daniel Doerr committed Jun 2, 2023
    Copy the full SHA
    80d88bc View commit details

Commits on Jun 5, 2023

  1. added separate order option, so that one can use different levels for…

    … order and subset
    Dany Doerr committed Jun 5, 2023
    Copy the full SHA
    727dd2e View commit details
  2. parameterize group size

    Dany Doerr committed Jun 5, 2023
    Copy the full SHA
    6420309 View commit details

Commits on Jun 6, 2023

  1. fixed bug in subsetting with coords, added convenience option for gro…

    …upby-haplotype and groupby-sample
    Daniel Doerr committed Jun 6, 2023
    Copy the full SHA
    375e55b View commit details

Commits on Jun 7, 2023

  1. fix index issue in ordered-histgrowth and table

    Dany Doerr committed Jun 7, 2023
    Copy the full SHA
    7b8cfa1 View commit details
  2. more fixes to the calculation of ordered-histgrowth

    Dany Doerr committed Jun 7, 2023
    Copy the full SHA
    3345f27 View commit details
  3. added sync impl for wrap/64

    Daniel Doerr committed Jun 7, 2023
    Copy the full SHA
    81c179f View commit details
  4. fix index issue in edge table output

    Dany Doerr committed Jun 7, 2023
    Copy the full SHA
    73c7e80 View commit details
  5. added panacus logo

    Dany Doerr committed Jun 7, 2023
    Copy the full SHA
    6975436 View commit details
  6. added panacus logo

    Dany Doerr committed Jun 7, 2023
    Copy the full SHA
    03c2de8 View commit details
  7. fix typo

    Dany Doerr committed Jun 7, 2023
    Copy the full SHA
    4c34dfd View commit details

Commits on Jun 8, 2023

  1. fixing several bugs in subsetting bp with coordinates

    Daniel Doerr committed Jun 8, 2023
    Copy the full SHA
    a4aef74 View commit details
  2. Copy the full SHA
    4e7312e View commit details

Commits on Jun 9, 2023

  1. fix bug in checking for grouping settings

    Daniel Doerr committed Jun 9, 2023
    Copy the full SHA
    5f50eb4 View commit details
  2. exclude exclusion paths in hist/growth output

    Dany Doerr committed Jun 9, 2023
    Copy the full SHA
    610810f View commit details
  3. Copy the full SHA
    db2d38b View commit details
  4. fix inverted exclusion statement

    Dany Doerr committed Jun 9, 2023
    Copy the full SHA
    0902ecd View commit details
  5. added hprc cactus plot

    Dany Doerr committed Jun 9, 2023
    Copy the full SHA
    5916275 View commit details
  6. Copy the full SHA
    0632dea View commit details
  7. fix bug in part that flags excluded nodes

    Dany Doerr committed Jun 9, 2023
    Copy the full SHA
    25fed8a View commit details
  8. bump up version number, merge README from github, fix issues in the R…

    …EADME
    Dany Doerr committed Jun 9, 2023
    Copy the full SHA
    5494384 View commit details
  9. bump up version number, merge README from github, fix issues in the R…

    …EADME
    Dany Doerr committed Jun 9, 2023
    Copy the full SHA
    8a6817b View commit details
  10. fix typo

    Dany Doerr committed Jun 9, 2023
    Copy the full SHA
    d03733b View commit details

Commits on Jun 13, 2023

  1. fixed issue with repeated edges

    Dany Doerr committed Jun 13, 2023
    Copy the full SHA
    e982c52 View commit details
  2. cargo fmt

    Dany Doerr committed Jun 13, 2023
    Copy the full SHA
    dca5b54 View commit details
  3. make sure uncovered edge count is correct even in presence of duplica…

    …ted edges
    Dany Doerr committed Jun 13, 2023
    Copy the full SHA
    fb93f4d View commit details

Commits on Jun 15, 2023

  1. Copy the full SHA
    734e554 View commit details

Commits on Jun 16, 2023

  1. Copy the full SHA
    1b32827 View commit details

Commits on Jun 17, 2023

  1. Merge pull request #6 from AndreaGuarracino/check_both_start_and_end

    check both start and end for `coords`
    danydoerr authored Jun 17, 2023
    Copy the full SHA
    e01ae82 View commit details

Commits on Jun 18, 2023

  1. Fixes issue described in #6, this time for real

    Dany Doerr committed Jun 18, 2023
    Copy the full SHA
    62cb218 View commit details
  2. Copy the full SHA
    38046d0 View commit details

Commits on Aug 19, 2023

  1. made it possible to compute stats for all count types at once and out…

    …put both hist and growth at once
    Dany Doerr committed Aug 19, 2023
    Copy the full SHA
    7a13d04 View commit details

Commits on Aug 20, 2023

  1. ensure that output has *always* 4 header lines

    Dany Doerr committed Aug 20, 2023
    Copy the full SHA
    d7a22c1 View commit details
  2. adjusted panacus-visualize to new table format

    Dany Doerr committed Aug 20, 2023
    Copy the full SHA
    a716f01 View commit details
  3. new implementation of reading in hist tables that supports multiple h…

    …istograms
    Dany Doerr committed Aug 20, 2023
    Copy the full SHA
    984e085 View commit details

Commits on Aug 21, 2023

  1. no alpha curve when estimate_growth disabled

    Dany Doerr committed Aug 21, 2023
    Copy the full SHA
    e822e47 View commit details
  2. Copy the full SHA
    9262a86 View commit details

Commits on Aug 23, 2023

  1. implemented html output for panacus

    Dany Doerr committed Aug 23, 2023
    Copy the full SHA
    2edad1e View commit details
  2. minor notation change

    Dany Doerr committed Aug 23, 2023
    Copy the full SHA
    7b69e7b View commit details
  3. added missing html files, minified some

    Dany Doerr committed Aug 23, 2023
    Copy the full SHA
    cf1b08d View commit details
Showing with 11,402 additions and 2,938 deletions.
  1. +1 −1 .github/workflows/rust_build.yml
  2. +15 −0 .gitignore
  3. +22 −0 .pre-commit-config.yaml
  4. +30 −8 Cargo.toml
  5. +82 −50 README.md
  6. +54 −0 benches/panacus_benchmark.rs
  7. +10 −0 build.rs
  8. +282 −0 docs/chr22.hprc-v1.0-pggb.histgrowth.html
  9. BIN docs/chr22.hprc-v1.0-pggb.histgrowth.node.png
  10. BIN docs/chr22.hprc-v1.0-pggb.report.growth.disabled.highlight.png
  11. BIN docs/chr22.hprc-v1.0-pggb.report.histogram.logscale.highlight.png
  12. BIN docs/chr22.hprc-v1.1-mc-grch38.histgrowth.node.png
  13. BIN docs/ecoli50.gfa.histgrowth.png
  14. BIN docs/hprc-v1.0-mc-grch38.ordered-histgrowth.bp.png
  15. BIN docs/panacus-illustration.png
  16. +6 −0 etc/bootstrap.bundle.min.js
  17. +10 −0 etc/bootstrap.custom.scss
  18. +6 −0 etc/bootstrap.min.css
  19. +19 −0 etc/chart.js
  20. +1 −0 etc/color-modes.min.js
  21. +121 −0 etc/custom.css
  22. +154 −0 etc/hook_after.js
  23. +1 −0 etc/hook_after.min.js
  24. +61 −0 etc/lib.js
  25. +1 −0 etc/lib.min.js
  26. +12 −0 etc/make_custom_bootstrap.sh
  27. BIN etc/panacus-illustration-small.png
  28. +58 −0 etc/report_template.html
  29. +41 −0 etc/symbols.svg
  30. +46 −0 examples/html_report.md
  31. +42 −0 examples/ordered_pangenome_growth_statistics.md
  32. +25 −0 examples/pangenome_growth_ecoli.md
  33. +33 −0 examples/pangenome_growth_minigraph_cactus.md
  34. +29 −0 examples/pangenome_growth_pggb.md
  35. +3 −0 hbs/analysis_section.hbs
  36. +28 −0 hbs/analysis_tab.hbs
  37. +10 −0 hbs/bar.hbs
  38. +72 −0 hbs/report.hbs
  39. +5 −0 hbs/report_content.hbs
  40. +18 −0 hbs/table.hbs
  41. +46 −0 hbs/tree.hbs
  42. +55 −0 package.sh
  43. +158 −92 scripts/panacus-visualize.py
  44. +5 −0 set-version.sh
  45. +0 −622 src/abacus.rs
  46. +40 −0 src/analyses.rs
  47. +256 −0 src/analyses/growth.rs
  48. +160 −0 src/analyses/hist.rs
  49. +581 −0 src/analyses/info.rs
  50. +223 −0 src/analyses/ordered_histgrowth.rs
  51. +73 −0 src/analyses/table.rs
  52. +220 −0 src/analysis_parameter.rs
  53. +0 −582 src/cli.rs
  54. +7 −0 src/commands.rs
  55. +37 −0 src/commands/growth.rs
  56. +55 −0 src/commands/hist.rs
  57. +75 −0 src/commands/histgrowth.rs
  58. +45 −0 src/commands/info.rs
  59. +67 −0 src/commands/ordered_histgrowth.rs
  60. +57 −0 src/commands/report.rs
  61. +60 −0 src/commands/table.rs
  62. +0 −388 src/graph.rs
  63. +358 −0 src/graph_broker.rs
  64. +1,765 −0 src/graph_broker/abacus.rs
  65. +624 −0 src/graph_broker/graph.rs
  66. +400 −0 src/graph_broker/hist.rs
  67. +1,316 −0 src/graph_broker/util.rs
  68. +0 −264 src/hist.rs
  69. +357 −0 src/html_report.rs
  70. +977 −879 src/io.rs
  71. +1,060 −0 src/lib.rs
  72. +3 −18 src/main.rs
  73. +287 −34 src/util.rs
  74. +16 −0 test/README.md
  75. +4 −0 test/bed_chrM/exclusion.bed3
  76. +4 −0 test/bed_chrM/inclusion.bed1
  77. +7 −0 test/bed_chrM/inclusion.bed3
  78. +1 −0 test/bed_chrM/inclusion_chm13.bed1
  79. +3 −0 test/bed_chrM/inclusion_sub.bed1
  80. +25 −0 test/cdbg.gfa
  81. +364 −0 test/chrM_test.gfa
  82. +127 −0 test/integrated_test.R
  83. +5 −0 test/test_groups.txt
  84. +34 −0 tests/growth.rs
  85. +32 −0 tests/hist.rs
  86. +31 −0 tests/histgrowth.rs
  87. +50 −0 tests/info.rs
  88. +34 −0 tests/ordered_histgrowth.rs
2 changes: 1 addition & 1 deletion .github/workflows/rust_build.yml
Original file line number Diff line number Diff line change
@@ -17,4 +17,4 @@ jobs:
- uses: actions-rs/cargo@v1
with:
command: build
args: --release
args: --release
15 changes: 15 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,19 @@
**/.result/
**/*.swo
**/*.swp
**/.DS_Store
**/*.tar.gz
Cargo.lock
binary_release/
target/
etc/bootstrap.custom.css
etc/bootstrap.custom.css.map
etc/css-dist/
etc/node_modules/
etc/package-lock.json
etc/package.json
.pre-commit-config.yaml
**.old
**/*.yaml
/*.svg
/*.data
22 changes: 22 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# .pre-commit-config.yaml
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.1.0 # this is optional, use `pre-commit autoupdate` to get the latest rev!
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/backplane/pre-commit-rust-hooks
rev: v1.1.0
hooks:
- id: fmt
- id: check
- id: test
- repo: local
hooks:
- id: set-version
name: set-version
entry: bash ./set-version.sh
language: system
files: 'Cargo.toml'
pass_filenames: false
38 changes: 30 additions & 8 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,24 +1,46 @@
[package]
name = "panacus"
version = "0.2.0"
version = "0.3.0"
edition = "2018"
rust-version= "1.60"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
clap = { version = "4.0", features = [ "derive" ] }
itertools = "0.10.3"
quick-csv = "0.1"
rand = "0.8.4"
rayon = "1.5.3"
base64 = "0.21"
clap = { version = "4.4.1", features = ["derive", "wrap_help", "cargo"] }
flate2 = { version = "1.0.17", features = ["zlib-ng"], default-features = false }
handlebars = "4.3"
itertools = "0.11"
once_cell = "1.18"
quick-csv = "0.1.6"
rand = "0.8"
rayon = "1.7"
regex = "1"
rustc-hash = "1"
strum = "0.24"
strum_macros= "0.24"
strum = "0.25"
strum_macros= "0.25"
time = { version = "0.3", features = ["macros", "formatting"] }

# Logging and error management
anyhow = "1"
env_logger = "0.10"
log = "0.4"
thiserror = "1"
thousands = "0.2.0"
serde_yaml = "0.9.21"
serde = "1.0.214"
memchr = "2.6.2"

[dev-dependencies]
tempfile = "3.13"
assert_cmd = "2.0.8"
predicates = "2.1.5"
criterion = { version = "0.5", features = ["html_reports"] }

[profile.release]
debug = 1

[[bench]]
name = "panacus_benchmark"
harness = false
132 changes: 82 additions & 50 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
[![Rust Build](https://github.com/marschall-lab/panacus/actions/workflows/rust_build.yml/badge.svg)](https://github.com/marschall-lab/panacus/actions/workflows/rust_build.yml)
[![Rust Build](https://github.com/marschall-lab/panacus/actions/workflows/rust_build.yml/badge.svg)](https://github.com/marschall-lab/panacus/actions/workflows/rust_build.yml) [![Anaconda-Server Badge](https://anaconda.org/bioconda/panacus/badges/version.svg)](https://conda.anaconda.org/bioconda) [![Anaconda-Server Badge](https://anaconda.org/bioconda/panacus/badges/platforms.svg)](https://anaconda.org/bioconda/panacus) [![Anaconda-Server Badge](https://anaconda.org/bioconda/panacus/badges/license.svg)](https://anaconda.org/bioconda/panacus)

# A Counting Tool for Pangenome Graphs

`panacus` is a tool for computing counting statistics of [GFA](https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md) files. It supports `P` and
![panacus is a counting tool for pangenome graphs](docs/panacus-illustration.png?raw=true "panacus is a counting tool for pangenome graphs")

`panacus` is a tool for calculating statistics for [GFA](https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md) files. It supports GFA files with `P` and
`W` lines, but requires that the graph is `blunt`, i.e., nodes do not overlap and consequently, each link (`L`) points from the end of one segment
(`S`) to the start of another.

@@ -12,91 +14,121 @@
- pangenome growth statistics
- path-/group-resolved coverage table

## Dependencies
### Coverage Histogram
Histogram listing the number of features (nodes, edges, ...) that are visited by a certain number of paths.

### Pangenome Growth statistics
Describes how many features (nodes, edges, ...) one would expect on average if the graph was built from
1...n haplotypes.

`panacus` is written in [RUST](https://www.rust-lang.org/) and requires a working RUST build system for installation. See [here](https://www.rust-lang.org/tools/install) for more details.
To limit the amount of features that are part of the calculation (e.g. for visualizing the core genome) pairs of the coverage/quorum parameters can be used:

- clap
- itertools
- quick-csv
- rand
- rayon
- regex
- rustc-hash
- strum, strum_macros
- `coverage`: include only features in the calculation that are visited by at least that many paths (can be used e.g. to filter out private nodes, that are part of only 1 haplotype)
- `quorum`: fraction of haplotypes that must share a feature after the haplotype is added to the graph to include it in the output (e.g. a quorum of `1` means only features that are shared by `100%` of the haplotypes ("core genome"))

`panacus` provides a Python script for visualizing the calculated counting statistics and requires the following Python libraries:
## Installation
`panacus` is written in [RUST](https://www.rust-lang.org/) and requires a working RUST build system (version >= 1.74.1) for installation. See [here](https://www.rust-lang.org/tools/install) for more details.

`panacus` provides a Python script for visualizing the calculated counting statistics. It requires Python>=3.6 and the following Python libraries:
- matplotlib
- numpy
- pandas
- scikit-learn
- scipy
- seaborn

## Get panacus
### From bioconda channel

Make sure you have [conda](https://conda.io)/[mamba](https://anaconda.org/conda-forge/mamba) installed!

```shell
git clone git@github.com:marschall-lab/panacus.git
mamba install -c conda-forge -c bioconda panacus
```

## Build
### From binary release
#### Linux x86\_64
```shell
wget --no-check-certificate -c https://github.com/marschall-lab/panacus/releases/download/0.3.0/panacus-0.3.0_x86_64-unknown-linux-musl.tar.gz
tar -xzvf panacus-0.3.0_x86_64-unknown-linux-musl.tar.gz

# install the Python libraries necessary for panacus-visualize
pip install --user matplotlib numpy pandas scikit-learn scipy seaborn

# suggestion: add tool to path in your ~/.bashrc
export PATH="$(readlink -f panacus-0.3.0_x86_64-unknown-linux-musl/bin)":$PATH

# you are ready to go!
panacus --help
```

#### Mac OSX arm64
```shell
wget --no-check-certificate -c https://github.com/marschall-lab/panacus/releases/download/0.3.0/panacus-0.3.0_aarch64-apple-darwin.tar.gz
tar -xzvf panacus-0.3.0_aarch64-apple-darwin.tar.gz

# install the Python libraries necessary for panacus-visualize
pip install --user matplotlib numpy pandas scikit-learn scipy seaborn

# suggestion: add tool to path in your ~/.bashrc
export PATH="$(readlink -f panacus-0.3.0_aarch64-apple-darwin/bin)":$PATH

# you are ready to go!
panacus --help
```

### From repository
```shell
git clone git@github.com:marschall-lab/panacus.git

cd panacus
cargo build --release
```

The compiled binary can be found in directory `target/release/` and is called `panacus`.
mkdir bin
ln -s ../target/release/panacus bin/
ln -s ../scripts/panacus-visualize.py bin/panacus-visualize

# install the Python libraries necessary for panacus-visualize
pip install --user matplotlib numpy pandas scikit-learn scipy seaborn

# suggestion: add tool to path in your ~/.bashrc
export PATH="$(readlink -f bin)":$PATH

# you are ready to go!
panacus --help

```

## Run

```console
$ ./target/release/panacus
$ panacus
Calculate count statistics for pangenomic data

Usage: panacus <COMMAND>

Commands:
histgrowth Run in default mode, i.e., run hist and growth successively and output the results of the latter
hist Calculate coverage histogram from GFA file
growth Construct growth table from coverage histogram
ordered-histgrowth Compute growth table for order specified in grouping file (or, if non specified, the order of paths in the GFA file)
table Compute coverage table for count items
info Return general graph and paths info
histgrowth Run hist and growth. Return the growth curve
hist Calculate coverage histogram
growth Calculate growth curve from coverage histogram
ordered-histgrowth Calculate growth curve based on group file order (if order is unspecified, use path order in GFA)
table Compute coverage table for count type
help Print this message or the help of the given subcommand(s)

Options:
-h, --help Print help
-V, --version Print version
```

## Pangenome Growth Statistics

Here's a quick example for computing pangenome growth statistics on the HPRC v.1.0 pggb, chr 22:

1. Download and unpack the graph:
```shell
wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/pggb/chroms/chr22.hprc-v1.0-pggb.gfa.gz
gunzip chr22.hprc-v1.0-pggb.gfa.gz
```
2. Prepare file to group paths by sample:
```shell
grep '^P' chr22.hprc-v1.0-pggb.gfa | cut -f2 > chr22.hprc-v1.0-pggb.paths.txt
cut -f1 -d\# chr22.hprc-v1.0-pggb.paths.txt > chr22.hprc-v1.0-pggb.groupnames.txt
paste chr22.hprc-v1.0-pggb.paths.txt chr22.hprc-v1.0-pggb.groupnames.txt > chr22.hprc-v1.0-pggb.groups.txt
```
3. Prepare file to select subset of paths corresponding to haplotypes:
```shell
grep -ve 'grch38\|chm13' chr22.hprc-v1.0-pggb.paths.txt > chr22.hprc-v1.0-pggb.paths.haplotypes.txt
```
4. Run `panacus histgrowth` to calculate pangenome growth for nodes (default) with quorum tresholds 0, 1, 0.5, and 0.1 using up to 4 threads:
```shell
RUST_LOG=info ./target/release/panacus histgrowth -t4 -q 0,1,0.5,0.1 -g chr22.hprc-v1.0-pggb.groups.txt -s chr22.hprc-v1.0-pggb.paths.haplotypes.txt chr22.hprc-v1.0-pggb.gfa > chr22.hprc-v1.0-pggb.histgrowth.node.txt
```
5. Visualize growth curve and estimate growth parameters :
## Quickstart
Generate a simple growth plot from a GFA file:
```shell
./scripts/panacus-visualize.py -e chr22.hprc-v1.0-pggb.histgrowth.node.txt > chr22.hprc-v1.0-pggb.histgrowth.node.pdf
RUST_LOG=info panacus histgrowth -t6 -q 0.1,0.5,1 -S <INPUT_GFA> > output.tsv
panacus-visualize -e output.tsv > output.pdf
```

![ nodes in hprc-v1.0-pggb.gfa](docs/chr22.hprc-v1.0-pggb.histgrowth.node.png?raw=true "pangenome growth statistics on the HPRC v.1.0 pggb, chr 22")
## Examples
Examples can be found in the [examples directory](/examples/).

## Citation
Parmigiani, L., Garrison, E., Stoye, J., Marschall, T. & Doerr, D. Panacus: fast and exact pangenome growth and core size estimation. Bionformatics, https://doi.org/10.1093/bioinformatics/btae720 (2024).
54 changes: 54 additions & 0 deletions benches/panacus_benchmark.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use panacus::{analyses::InputRequirement, graph_broker::GraphBroker};
use std::collections::HashSet;

fn benchmark_graph_broker_hist(c: &mut Criterion) {
let gfa_file = "./benches/chrM.pan.fa.6626ff2.4030258.6a1ecc2.smooth.gfa";
let input_requirements = HashSet::from([
InputRequirement::Hist,
InputRequirement::Graph(gfa_file.to_string()),
InputRequirement::Node,
InputRequirement::Bp,
InputRequirement::Edge,
InputRequirement::PathLens,
]);
c.bench_function("graph_broker_hist", |b| {
b.iter(|| GraphBroker::from_gfa(black_box(&input_requirements)))
});
}

fn benchmark_graph_broker_hist_finish(c: &mut Criterion) {
let gfa_file = "./benches/chrM.pan.fa.6626ff2.4030258.6a1ecc2.smooth.gfa";
let input_requirements = HashSet::from([
InputRequirement::Hist,
InputRequirement::Graph(gfa_file.to_string()),
InputRequirement::Node,
InputRequirement::Bp,
InputRequirement::Edge,
InputRequirement::PathLens,
]);
let gb = GraphBroker::from_gfa(black_box(&input_requirements));
c.bench_function("graph_broker_hist_finish", |b| {
b.iter(|| black_box((&gb).clone()).finish())
});
}

fn benchmark_graph_broker_hist_node(c: &mut Criterion) {
let gfa_file = "./benches/chrM.pan.fa.6626ff2.4030258.6a1ecc2.smooth.gfa";
let input_requirements = HashSet::from([
InputRequirement::Hist,
InputRequirement::Graph(gfa_file.to_string()),
InputRequirement::Node,
]);
c.bench_function("graph_broker_hist_node", |b| {
b.iter(|| GraphBroker::from_gfa(black_box(&input_requirements)))
});
}

criterion_group!(
benches,
benchmark_graph_broker_hist,
benchmark_graph_broker_hist_finish,
benchmark_graph_broker_hist_node
);
criterion_main!(benches);
10 changes: 10 additions & 0 deletions build.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
use std::process::Command;
fn main() {
// note: add error checking yourself.
let output = Command::new("git")
.args(&["describe", "--tags"])
.output()
.unwrap();
let git_hash = String::from_utf8(output.stdout).unwrap();
println!("cargo:rustc-env=GIT_HASH={}", git_hash);
}
282 changes: 282 additions & 0 deletions docs/chr22.hprc-v1.0-pggb.histgrowth.html

Large diffs are not rendered by default.

Binary file modified docs/chr22.hprc-v1.0-pggb.histgrowth.node.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/ecoli50.gfa.histgrowth.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/panacus-illustration.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 6 additions & 0 deletions etc/bootstrap.bundle.min.js

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions etc/bootstrap.custom.scss
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
$primary: #375a7f;
$secondary: #444;
$success: #00bc8c;
$info: #3498db;
$warning: #f39c12;
$danger: #e74c3c;
$light: #adb5bd;
$dark: #303030;

@import "./node_modules/bootstrap/scss/bootstrap";
6 changes: 6 additions & 0 deletions etc/bootstrap.min.css

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions etc/chart.js

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions etc/color-modes.min.js
121 changes: 121 additions & 0 deletions etc/custom.css
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
.bi {
fill: currentColor;
}
.carousel-control-prev-icon {
background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23646668' viewBox='0 0 8 8'%3E%3Cpath d='M5.25 0l-4 4 4 4 1.5-1.5-2.5-2.5 2.5-2.5-1.5-1.5z'/%3E%3C/svg%3E") !important;
}

.carousel-control-next-icon {
background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23646668' viewBox='0 0 8 8'%3E%3Cpath d='M2.75 0l-1.5 1.5 2.5 2.5-2.5 2.5 1.5 1.5 4-4-4-4z'/%3E%3C/svg%3E") !important;
}

.icon-space {
padding-left: 0px;
}

.tree-icon {
fill: var(--bs-secondary-color);
/* stroke: var(--bs-secondary-color); */
}

.darkmode-icon {
fill: var(--bs-body-color);
stroke: var(--bs-body-color);
}

.tree-navbar {
background-color: var(--bs-secondary-bg);
}

[data-bs-theme="dark"] img {
-webkit-filter: invert(1);
filter: invert(1);
}

.version-text {
color: var(--bs-tertiary-color);
margin-top: 1.5em;
margin-bottom: -20px;
font-size: 90%;
}

ul.tree, ul.tree ul {
list-style: none;
margin: 0;
padding: 0;
}
ul.tree ul {
margin-left: 10px;
}
ul.tree li {
margin: 0;
padding: 0 7px;
line-height: 20px;
border-left:1px solid #ddd;

}
ul.tree li:last-child {
border-left:none;
}
ul.tree li:before {
position:relative;
top:-0.3em;
height:1.3em;
width:12px;
color:white;
border-bottom:1px solid #ddd;
content:"";
display:inline-block;
left:-7px;
}
ul.tree li:last-child:before {
border-left:1px solid #ddd;
}

.btn-toggle::after {
/* width: 1.25em; */
content: url("data:image/svg+xml,%3csvg xmlns='http://www.w3.org/2000/svg' width='24' height='24' viewBox='0 0 24 24'%3e%3cpath fill='rgba%28201,201,201,1%29' d='M19,13H13V19H11V13H5V11H11V5H13V11H19V13Z' /%3e%3c/svg%3e");
transition: transform .35s ease;
/* transform-origin: 50% 50%; */
display: inline-block;
float: right;
}

[data-bs-theme="dark"] .btn-toggle::after {
content: url("data:image/svg+xml,%3csvg xmlns='http://www.w3.org/2000/svg' width='24' height='24' viewBox='0 0 24 24'%3e%3cpath fill='rgba%28201,201,201,1%29' d='M19,13H13V19H11V13H5V11H11V5H13V11H19V13Z' /%3e%3c/svg%3e");
}

.btn-toggle[aria-expanded="true"] {
color: rgba(var(--bs-emphasis-color-rgb), .85);
}
.btn-toggle[aria-expanded="true"]::after {
transform: rotate(45deg);
}
.btn-toggle:hover,
.btn-toggle:focus {
color: rgba(var(--bs-emphasis-color-rgb), .85);
background-color: var(--bs-tertiary-bg);
}

.btn-nav {
display: flex;
align-items: center;
}
.btn-nav[aria-selected="true"] {
color: rgba(var(--bs-emphasis-color-rgb), .85);
}
.btn-nav:hover,
.btn-nav:focus {
color: rgba(var(--bs-emphasis-color-rgb), .85);
background-color: var(--bs-tertiary-bg);
}

.nav {
display: unset;
}
.nav-link {
display: unset;
font-size: unset;
font-weight: unset;
color: rgba(var(--bs-body-color));
}
154 changes: 154 additions & 0 deletions etc/hook_after.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
/* global bootstrap: false */
(() => {
'use strict'
const tooltipTriggerList = Array.from(document.querySelectorAll('[data-bs-toggle="tooltip"]'))
tooltipTriggerList.forEach(tooltipTriggerEl => {
new bootstrap.Tooltip(tooltipTriggerEl)
})
})()

const pluginCanvasBackgroundColor = {
id: 'customCanvasBackgroundColor',
beforeDraw: (chart, args, options) => {
const {ctx, chartArea: { top, bottom, left, right, width, height },
scales: {x, y}
} = chart;
ctx.save();
ctx.globalCompositeOperation = 'destination-over';
ctx.fillStyle = options.color || '#99ffff';
ctx.fillRect(left, top, width, height);
ctx.restore();
}
}

for (let key in objects.datasets) {
let element = objects.datasets[key];
if (element instanceof Bar) {
let h = element;
console.log('test ' + h.id);
var ctx = document.getElementById('chart-bar-' + h.id);
var myChart = new Chart(ctx, {
type: 'bar',
data: {
labels: h.labels,
datasets: [{
label: h.name,
data: h.values,
borderWidth: 1,
backgroundColor: PCOLORS[0],
borderColor: '#FFFFFF'
}]
},
options: {
scales: {
y: {
title: {
display: true,
text: h.y_label,
},
beginAtZero: true,
grid: {
color: '#FFFFFF',
}
},
x: {
title: {
display: true,
text: h.x_label,
},
grid: {
color: '#FFFFFF',
},
ticks: {
maxRotation: 90,
minRotation: 65
}
},
},
plugins: {
customCanvasBackgroundColor: {
color: '#E5E4EE',
}
}
},
plugins: [pluginCanvasBackgroundColor],
});
buildPlotDownload(myChart, h.id, fname);
if (h.log_toggle) {
buildLogToggle(myChart, "bar-" + h.id);
}
} else if (element instanceof MultiBar) {
let m = element;
console.log('multi-test ' + m.id);
var ctx = document.getElementById('chart-bar-' + m.id);
var myChart = new Chart(ctx, {
type: 'bar',
data: {
labels: m.labels,
datasets: Array.from(m.values.entries()).reverse().map(function([i, v]) {
return {
label: m.names[i],
data: v,
borderWidth: 1,
backgroundColor: PCOLORS[i % PCOLORS.length],
borderColor: '#FFFFFF'
};
}),
},
options: {
scales: {
y: {
title: {
display: true,
text: m.y_label,
},
beginAtZero: true,
grid: {
color: '#FFFFFF',
},
stacked: false,
},
x: {
title: {
display: true,
text: m.x_label,
},
grid: {
color: '#FFFFFF',
},
ticks: {
maxRotation: 90,
minRotation: 65
},
stacked: true,
},
},
plugins: {
customCanvasBackgroundColor: {
color: '#E5E4EE',
}
}
},
plugins: [pluginCanvasBackgroundColor],
});
buildPlotDownload(myChart, m.id, fname);
if (m.log_toggle) {
buildLogToggle(myChart, "bar-" + m.id);
}
}
}

console.log("tables");
for (let key in objects.tables) {
let table = objects.tables[key];
console.log(table);
buildTableDownload(table, key, key + '_' + fname);
}

// var tabs = document.querySelectorAll('button[data-bs-toggle="tab"]')
// tabs.forEach(function(tab) {
// tab.addEventListener('show.bs.tab', function (event) {
// document.querySelector(event.target.dataset.bsTarget).classList.remove('d-none');
// document.querySelector(event.relatedTarget.dataset.bsTarget).classList.add('d-none');
// });
// });
1 change: 1 addition & 0 deletions etc/hook_after.min.js
61 changes: 61 additions & 0 deletions etc/lib.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
/*!
* Panacus JS library
*/

const PCOLORS = ['#f77189', '#bb9832', '#50b131', '#36ada4', '#3ba3ec', '#e866f4'];

class Bar {
constructor(id, name, x_label, y_label, labels, values, log_toggle) {
this.id = id;
this.name = name;
this.x_label = x_label;
this.y_label = y_label;
this.labels = labels;
this.values = values;
this.log_toggle = log_toggle;
}
}

class MultiBar {
constructor(id, names, x_label, y_label, labels, values, log_toggle) {
this.id = id;
this.names = names;
this.x_label = x_label;
this.y_label = y_label;
this.labels = labels;
this.values = values;
this.log_toggle = log_toggle;
}
}

function buildPlotDownload(chart, obj, prefix) {
console.log('btn-download-plot-' + obj);
document.getElementById('btn-download-plot-' + obj).onclick = function() {
var a = document.createElement('a');
a.href = chart.toBase64Image();
a.download = prefix + '_' + obj + '.png';
a.click();
};
}

function buildTableDownload(table, id, prefix) {
console.log('btn-download-table-' + id);
document.getElementById('btn-download-table-' + id).onclick = function() {
let blob = new Blob([table], {type: 'text/plain'});
var a = document.createElement('a');
a.href = URL.createObjectURL(blob);
a.download = prefix + '_table.tsv';
a.click();
};
}

function buildLogToggle(chart, name) {
document.getElementById('btn-logscale-plot-' + name).addEventListener('change', function(event) {
if (event.currentTarget.checked) {
chart.options.scales.y.type = 'logarithmic';
} else {
chart.options.scales.y.type = 'linear';
}
chart.update();
});
}
1 change: 1 addition & 0 deletions etc/lib.min.js
12 changes: 12 additions & 0 deletions etc/make_custom_bootstrap.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
#!/bin/bash

npm install bootstrap
npm install -g sass
npm install -g css-minify
npm install -g uglify-js

sass bootstrap.custom.scss bootstrap.custom.css

css-minify -f bootstrap.custom.css
uglifyjs lib.js > lib.min.js
uglifyjs hook_after.js > hook_after.min.js
Binary file added etc/panacus-illustration-small.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
58 changes: 58 additions & 0 deletions etc/report_template.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
<!DOCTYPE html>
<html lang="en" data-bs-theme="auto">
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta charset="UTF-8">
<script>
{{{bootstrap_js}}}
{{{bootstrap_color_modes_js}}}
{{{chart_js}}}

{{{custom_lib_js}}}
</script>
<style>
{{{bootstrap_css}}}
{{{custom_css}}}
</style>
<title>panacus: {{fname}}</title>
</head>
<body>
{{{symbols_svg}}}
<div class="dropdown position-fixed bottom-0 end-0 mb-3 me-3 bd-mode-toggle">
<button class="btn btn-bd-primary py-2 dropdown-toggle d-flex align-items-center" id="bd-theme" type="button" aria-expanded="false" data-bs-toggle="dropdown" aria-label="Toggle theme (auto)"><svg class="bi my-1 theme-icon-active" width="15" height="15">
<use href="#circle-half"></use></svg> <span class="visually-hidden" id="bd-theme-text">Toggle theme</span></button>
<ul class="dropdown-menu dropdown-menu-end shadow" aria-labelledby="bd-theme-text">
<li><button type="button" class="dropdown-item d-flex align-items-center" data-bs-theme-value="light" aria-pressed="false"><svg class="bi me-2 opacity-50 theme-icon" width="15" height="15">
<use href="#sun-fill"></use></svg> Light <svg class="bi ms-auto d-none" width="15" height="15">
<use href="#check2"></use></svg></button></li>
<li><button type="button" class="dropdown-item d-flex align-items-center" data-bs-theme-value="dark" aria-pressed="false"><svg class="bi me-2 opacity-50 theme-icon" width="15" height="15">
<use href="#moon-stars-fill"></use></svg> Dark <svg class="bi ms-auto d-none" width="15" height="15">
<use href="#check2"></use></svg></button></li>
<li><button type="button" class="dropdown-item d-flex align-items-center active" data-bs-theme-value="auto" aria-pressed="true"><svg class="bi me-2 opacity-50 theme-icon" width="15" height="15">
<use href="#circle-half"></use></svg> Auto <svg class="bi ms-auto d-none" width="15" height="15">
<use href="#check2"></use></svg></button></li>
</ul>
</div>
<div class="position-fixed bottom-0 start-0 opacity-50 p-2">
<em>created on {{timestamp}} with panacus v{{version}}</em>
</div>
<main class="d-block">
<div class="d-flex justify-content-between p-3">
<img style='display:block; width:10vw;' id='base64image' alt="panacus logo" src='data:image/jpeg;base64,{{panacus_logo}}'>
<div class="p-2">
<h1>Report for <em>{{fname}}</em></h1>
</div>
<div class="opacity-50 p-0">
<!--here goes nothing //-->
</div>
</div>
<div class="container-fluid p-3">
{{{content}}}
</div>
</main>
<script>
{{{data_hook}}}
{{{hook_after_js}}}
</script>
</body>
</html>
41 changes: 41 additions & 0 deletions etc/symbols.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
46 changes: 46 additions & 0 deletions examples/html_report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Generate an HTML report for your pangenome graph!

*TIP:*
You can try this example by downloading this file and running:
````bash
cat html_report.md | sed -n '/```shell/,/```/p' | sed '/```/d' | bash
````

Instead of tab-separated tables, `panacus` supports for many commands also HTML output. The generated report page is interactive and self-contained.

1. Download and unpack the graph:
```shell
wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/pggb/chroms/chr22.hprc-v1.0-pggb.gfa.gz
gunzip chr22.hprc-v1.0-pggb.gfa.gz
```
2. Prepare file to select subset of paths corresponding to haplotypes:
```shell
grep '^P' chr22.hprc-v1.0-pggb.gfa | cut -f2 | grep -ve 'grch38\|chm13' > chr22.hprc-v1.0-pggb.paths.haplotypes.txt
```

3. Run `panacus histgrowth` with settings to output stats for all graph features (`-c all`), include coverage histogram in output (`-a`), and set
output to HTML (`-o html`):
```shell
RUST_LOG=info panacus histgrowth -t4 -l 1,2,1,1,1 -q 0,0,1,0.5,0.1 -S -s chr22.hprc-v1.0-pggb.paths.haplotypes.txt -c all -a -o html chr22.hprc-v1.0-pggb.gfa > chr22.hprc-v1.0-pggb.histgrowth.html
```

:point_right: :point_right: :point_right: **view the resulting [HTML report here](https://htmlpreview.github.io/?https://github.com/marschall-lab/panacus/blob/main/docs/chr22.hprc-v1.0-pggb.histgrowth.html)!**

![panacus report (coverage histogram) for chr22.hprc-v1.0-pggb.gfa](/docs/chr22.hprc-v1.0-pggb.report.histogram.logscale.highlight.png?raw=true "pangenome report of chr22.hprc-v1.0-pggb.gfa showing coverage histogram in logsacle")

### Figure legend
1. Navigate between coverage histograms for bp, node, and edge through tabs
2. Toggle log-scale on Y-axis
3. Download plot as PNG file
4. Download raw data as tab-separated-values (TSV) file
5. Choose between light and dark theme
6. Proceed to pangenome growth plots

![panacus report (pangenome growth) for chr22.hprc-v1.0-pggb.gfa](/docs/chr22.hprc-v1.0-pggb.report.growth.disabled.highlight.png?raw=true "pangenome report of chr22.hprc-v1.0-pggb.gfa showing pangenome growth plots with disabled curves")

### Figure legend
1. Navigate between coverage histograms for bp, node, and edge through tabs
2. Disable curves that you do not want to view by clicking on legend
3. Download plot as PNG file
4. Download raw data as tab-separated-values (TSV) file
5. Choose between light and dark theme
42 changes: 42 additions & 0 deletions examples/ordered_pangenome_growth_statistics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# Ordered Pangenome Growth Statistics

*TIP:*
You can try this example by downloading this file and running:
````bash
cat ordered_pangenome_growth_statistics.md | sed -n '/```shell/,/```/p' | sed '/```/d' | bash
````

Sometimes it is interesting to look at the pangenome growth when samples are processed in a specific order rather than considering all all possible
orders. `panacus`' capability to construct such plots is illustrated here by the example of the GRCh38-based HPRC v.1.0 minigraph-cactus graph (all
chromosomes). The example reproduces Figure 3g(left) from the publication [A draft human pangenome
reference](https://doi.org/10.1038/s41586-023-05896-x) that quantifies pangenome growth of the amount of non-reference (GRCh38) sequence of the
minigraph-cactus based human pangenome reference graph.

1. Download and unpack the graph:
```shell
wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.0-mc-grch38.gfa.gz
gunzip hprc-v1.0-mc-grch38.gfa.gz
```
2. Establish order of samples in the growth statistics:
```shell
echo 'HG03492 HG00438 HG00621 HG00673 HG02080 HG00733 HG00735 HG00741 HG01071 HG01106 HG01109 HG01123 HG01175 HG01243 HG01258 HG01358 HG01361 HG01928
HG01952 HG01978 HG02148 HG01891 HG02055 HG02109 HG02145 HG02257 HG02486 HG02559 HG02572 HG02622 HG02630 HG02717 HG02723 HG02818 HG02886 HG03098
HG03453 HG03486 HG03516 HG03540 HG03579 NA18906 NA20129 NA21309' | tr ' ' '\n' > hprc-v1.0-mc-grch38.order.samples.txt
```
3. Exclude paths from reference genome GRCh38
```shell
grep '^P' hprc-v1.0-mc-grch38.gfa | cut -f2 | grep -ie 'grch38' > hprc-v1.0-mc-grch38.exclude.grch38.txt
```
4. Run `panacus ordered-histgrowth` to calculate pangenome growth for basepairs with coverage thresholds 1,2,3, and 42 using up to 4 threads:
```shell
RUST_LOG=info panacus ordered-histgrowth -c bp -t4 -l 1,2,3,42 -S -e hprc-v1.0-mc-grch38.exclude.grch38.txt -O hprc-v1.0-mc.grch38.order.samples.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.ordered-histgrowth.bp.tsv
```
(The log will report some errors regarding missing order information of CHM13 paths. These paths will be ignored in the plot, which is the intended
behavior of this command line call)

5. Visualize growth curve and estimate growth parameters :
```shell
panacus-visualize hprc-v1.0-mc-grch38.ordered-histgrowth.bp.tsv > hprc-v1.0-mc-grch38.ordered-histgrowth.bp.pdf
```

![ordered pangenome growth of bps in hprc-v1.0-mc-grch38.gfa](/docs/hprc-v1.0-mc-grch38.ordered-histgrowth.bp.png?raw=true "pangenome growth of non-reference sequence in the HPRC v.1.0 MC GRCh38 graph")
25 changes: 25 additions & 0 deletions examples/pangenome_growth_ecoli.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# Pangenome Coverage and Growth Statistics for PGGB

*TIP:*
You can try this example by downloading this file and running:
````bash
cat pangenome_growth_ecoli.md | sed -n '/```shell/,/```/p' | sed '/```/d' | bash
````

1. Download and unpack the graph:
```shell
wget -c https://zenodo.org/record/7937947/files/ecoli50.gfa.zst
zstd -d ecoli50.gfa.zst
```

2. Run `panacus histgrowth` to calculate coverage and pangenome growth for basepairs with quorum thresholds 0, 1, 0.5, and 0.1 using up to 4 threads:
```shell
RUST_LOG=info panacus histgrowth ecoli50.gfa -c bp -q 0,1,0.5,0.1 -t 4 > ecoli50.gfa.histgrowth.tsv
```

3. Visualize coverage histogram and pangenome growth curve with estimation of growth parameters. Place the legend in the upper left:
```shell
panacus-visualize -e -l "upper left" ecoli50.gfa.histgrowth.tsv > ecoli50.gfa.histgrowth.tsv.pdf
```

![coverage histogram and pangenome growth of bps in ecoli50.gfa](/docs/ecoli50.gfa.histgrowth.png?raw=true "coverage and pangenome growth statistics on the Ecoli50 graph")
33 changes: 33 additions & 0 deletions examples/pangenome_growth_minigraph_cactus.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Pangenome Coverage and Growth Statistics for Minigraph-Cactus

*TIP:*
You can try this example by downloading this file and running:
````bash
cat pangenome_growth_minigraph_cactus.md | sed -n '/```shell/,/```/p' | sed '/```/d' | bash
````

This example shows how to ccompute coverage and pangenome growth statistics for the HPRC v.1.1 mc, chr 22:

1. Download the graph:
```shell
wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.chroms/chr22.vg
```
2. Convert to GFA (this graph is provided in VG format and requires conversion into GFA with [vg](https://github.com/vgteam/vg):
```shell
vg view --gfa chr22.vg > chr22.hprc-v1.1-mc-grch38.gfa
```
3. Prepare file to select subset of paths corresponding to haplotypes:
```shell
grep -e '^W' chr22.hprc-v1.1-mc-grch38.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' > chr22.hprc-v1.1-mc-grch38.paths.txt
grep -ve 'grch38\|chm13' chr22.hprc-v1.1-mc-grch38.paths.txt > chr22.hprc-v1.1-mc-grch38.paths.haplotypes.txt
```
4. Run `panacus histgrowth` to calculate coverage and pangenome growth for nodes (default) with coverage/quorum thresholds 1/0, 2/0, 1/1, 1/0.5, and 1/0.1 using up to 4 threads:
```shell
RUST_LOG=info panacus histgrowth -t4 -l 1,2,1,1,1 -q 0,0,1,0.5,0.1 -S -a -s chr22.hprc-v1.1-mc-grch38.paths.haplotypes.txt chr22.hprc-v1.1-mc-grch38.gfa > chr22.hprc-v1.1-mc-grch38.histgrowth.node.tsv
```
5. Visualize coverage histogram and pangenome growth curve with estimated growth parameters:
```shell
panacus-visualize -e chr22.hprc-v1.1-mc-grch38.histgrowth.node.tsv > chr22.hprc-v1.1-mc-grch38.histgrowth.node.pdf
```

![coverage histogram and pangenome growth of nodes in chr22.hprc-v1.1-mc-grch38.gfa](/docs/chr22.hprc-v1.1-mc-grch38.histgrowth.node.png?raw=true "coverage and pangenome growth statistics on the HPRC v.1.1 mc-grch38, chr 22")
29 changes: 29 additions & 0 deletions examples/pangenome_growth_pggb.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Pangenome Coverage and Growth Statistics for PGGB

*TIP:*
You can try this example by downloading this file and running:
````bash
cat pangenome_growth_pggb.md | sed -n '/```shell/,/```/p' | sed '/```/d' | bash
````

Here's a quick example for computing coverage and pangenome growth statistics on the HPRC v.1.0 pggb, chr 22:

1. Download and unpack the graph:
```shell
wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/pggb/chroms/chr22.hprc-v1.0-pggb.gfa.gz
gunzip chr22.hprc-v1.0-pggb.gfa.gz
```
2. Prepare file to select subset of paths corresponding to haplotypes:
```shell
grep '^P' chr22.hprc-v1.0-pggb.gfa | cut -f2 | grep -ve 'grch38\|chm13' > chr22.hprc-v1.0-pggb.paths.haplotypes.txt
```
3. Run `panacus histgrowth` to calculate coverage and pangenome growth for nodes (default) with coverage/quorum thresholds 1/0, 2/0, 1/1, 1/0.5, and 1/0.1 using up to 4 threads:
```shell
RUST_LOG=info panacus histgrowth -t4 -l 1,2,1,1,1 -q 0,0,1,0.5,0.1 -S -a -s chr22.hprc-v1.0-pggb.paths.haplotypes.txt chr22.hprc-v1.0-pggb.gfa > chr22.hprc-v1.0-pggb.histgrowth.node.tsv
```
4. Visualize coverage histogram and pangenome growth curve:
```shell
panacus-visualize chr22.hprc-v1.0-pggb.histgrowth.node.tsv > chr22.hprc-v1.0-pggb.histgrowth.node.pdf
```

![coverage histogram and pangenome growth of nodes in chr22.hprc-v1.0-pggb.gfa](/docs/chr22.hprc-v1.0-pggb.histgrowth.node.png?raw=true "coverage and pangenome growth statistics on the HPRC v.1.0 pggb, chr 22")
3 changes: 3 additions & 0 deletions hbs/analysis_section.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{{#each tab_content}}
{{{this}}}
{{/each}}
28 changes: 28 additions & 0 deletions hbs/analysis_tab.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
<div class="tab-pane fade" id="nav-{{id}}" role="tabpanel" aria-labelledby="nav-{{id}}">
<div class="d-flex justify-content-between align-items-end">
<div>
<p class="h5">{{analysis}}</p>
<p class="h2"><b>{{run_name}}</b></p>
<p>{{countable}}</p>
</div>
<div class="btn-group" role="group" aria-label="Basic example">
{{#if has_table}}
<button type="button" class="btn btn-outline-secondary" id="btn-download-table-{{id}}">Download table</button>
{{/if}}
{{#if has_graph}}
<button type="button" class="btn btn-outline-secondary dropdown-toggle" data-bs-toggle="dropdown" aria-expanded="false">Download plot</button>
<ul class="dropdown-menu">
<li><a class="dropdown-item" href="#">As svg</a></li>
<li><a class="dropdown-item" href="#" id="btn-download-plot-{{id}}">As png</a></li>
</ul>
{{/if}}
</div>
</div>
<div>
<br/>
{{#each items}}
{{{this}}}
<br/>
{{/each}}
</div>
</div>
10 changes: 10 additions & 0 deletions hbs/bar.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{{#if log_toggle}}
<div class="d-flex flex-row-reverse">
<div class="form-check form-switch">
<input class="form-check-input" type="checkbox" role="switch" id="btn-logscale-plot-bar-{{id}}">
<label class="form-check-label" for="btn-logscale-plot-bar-{{id}}">log-scale</label>
</div>
</div>
{{/if}}
<canvas id="chart-bar-{{id}}"></canvas>
<br/>
72 changes: 72 additions & 0 deletions hbs/report.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
<!DOCTYPE html>
<html lang="en" data-bs-theme="auto">
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta charset="UTF-8">
<script>
{{{bootstrap_js}}}
{{{bootstrap_color_modes_js}}}
{{{chart_js}}}
{{{custom_lib_js}}}
</script>
<style>
{{{bootstrap_css}}}
{{{custom_css}}}
</style>
<title>panacus: {{fname}}</title>
</head>
<body>
{{{symbols_svg}}}
<main class="d-flex flex-nowrap">
{{{tree}}}

{{{content}}}

<div class="dropdown position-fixed bottom-0 end-0 mb-3 me-3 bd-mode-toggle">
<button class="btn btn-bd-primary py-2 dropdown-toggle d-flex align-items-center"
id="bd-theme"
type="button"
aria-expanded="false"
data-bs-toggle="dropdown"
aria-label="Toggle theme (auto)">
<svg class="bi my-1 theme-icon-active tree-icon" width="1em" height="1em"><use href="#circle-half"></use></svg>
<span class="visually-hidden" id="bd-theme-text">Toggle theme</span>
</button>
<ul class="dropdown-menu dropdown-menu-end shadow" aria-labelledby="bd-theme-text">
<li>
<button type="button" class="dropdown-item d-flex align-items-center" data-bs-theme-value="light" aria-pressed="false">
<svg class="bi me-2 opacity-50 darkmode-icon" width="1em" height="1em"><use href="#sun-fill"></use></svg>
Light
<svg class="bi ms-auto d-none darkmode-icon" width="1em" height="1em"><use href="#check2"></use></svg>
</button>
</li>
<li>
<button type="button" class="dropdown-item d-flex align-items-center" data-bs-theme-value="dark" aria-pressed="false">
<svg class="bi me-2 opacity-50 darkmode-icon" width="1em" height="1em"><use href="#moon-stars-fill"></use></svg>
Dark
<svg class="bi ms-auto d-none darkmode-icon" width="1em" height="1em"><use href="#check2"></use></svg>
</button>
</li>
<li>
<button type="button" class="dropdown-item d-flex align-items-center active" data-bs-theme-value="auto" aria-pressed="true">
<svg class="bi me-2 opacity-50 darkmode-icon" width="1em" height="1em"><use href="#circle-half"></use></svg>
Auto
<svg class="bi ms-auto d-none darkmode-icon" width="1em" height="1em"><use href="#check2"></use></svg>
</button>
</li>
</ul>
</div>
</main>

<script>
var path = window.location.pathname;
var page = "panacus: " + path.split("/").pop();
document.title = page;
const fname = '{{{fname}}}';
const objects = {{{data_hook}}};
console.log(objects);
{{{hook_after_js}}}
</script>
</body>
</html>
5 changes: 5 additions & 0 deletions hbs/report_content.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
<div class="container p-5 tab-content">
{{#each this}}
{{{this}}}
{{/each}}
</div>
18 changes: 18 additions & 0 deletions hbs/table.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
<table class="table table-striped table-hover">
<thead>
<tr>
{{#each header}}
<th scope="col">{{this}}</th>
{{/each}}
</tr>
</thead>
<tbody class="table-group-divider">
{{#each values}}
<tr>
{{#each this}}
<td>{{this}}</td>
{{/each}}
</tr>
{{/each}}
</tbody>
</table>
46 changes: 46 additions & 0 deletions hbs/tree.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
<div class="flex-shrink-0 d-flex flex-column vh-100 p-3 tree-navbar" style="width: 400px;">
<a href="/" class="d-flex align-items-center pb-3 mb-3 link-body-emphasis text-decoration-none border-bottom">
<svg class="bi pe-none me-2" width="30" height="24"><use xlink:href="#bootstrap"/></svg>
<span class="fs-5 fw-semibold">Panacus</span>
</a>
<ul class="tree mb-auto nav">
{{#each analyses}}
<li>
<button class="btn icon-space btn-toggle" style="width: 95%" data-bs-toggle="collapse" data-bs-target="#{{this.id}}-analysis-collapse" aria-expanded="false" aria-controls="{{this.id}}-analysis-collapse">
<div style="float: left">
<svg class="bi pe-none me-2 tree-icon" width="30" height="24"><use xlink:href="#{{this.icon}}-icon"/></svg>
{{this.title}}
</div>
</button>
<div id="{{this.id}}-analysis-collapse" class="collapse">
<ul>
{{#each this.runs}}
<li>
<button class="btn icon-space btn-toggle" style="width: 95%" data-bs-toggle="collapse" data-bs-target="#{{this.id}}-run-collapse" aria-expanded="false" aria-controls="{{this.id}}-run-collapse">
<div style="float: left; padding-left: 7px">
{{this.title}}
</div>
</button>
<div id="{{this.id}}-run-collapse" class="collapse">
<ul>
{{#each this.countables}}
<li class="nav-item">
<button class="btn nav-link btn-nav" data-bs-toggle="tab" data-bs-target="#nav-{{this.href}}" type="button" id="{{this.id}}">{{this.title}}</button>
</li>
{{/each}}
</ul>
</div>
</li>
{{/each}}
</ul>
</div>
</li>
{{/each}}
</ul>
<ul class="list-unstyled ps-0">
<li class="border-top my-3"></li>
<li><a href="https://github.com/marschall-lab/panacus" class="btn">GitHub</a></li>
<li><a href="#" class="btn">Documentation</a></li>
<li class="version-text">Created on {{timestamp}} using<br> panacus {{version}}</li>
</ul>
</div>
55 changes: 55 additions & 0 deletions package.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
#!/usr/bin/env bash

# Best way to use this script is to run it in the form:
# -------------------------------------------------------
# PANACUS_TARGET="x86_64-unknown-linux-musl" ./package.sh
# -------------------------------------------------------
# Output is in the pkg directory

NAME="panacus"
EXEC="panacus"
VERSION="$(cargo read-manifest | jq .version | sed "s/\"//g")"
ARCH="${PANACUS_TARGET}"
# ARCH="x86_64-unknown-linux-musl"
# ARCH="x86_64-apple-darwin"
# ARCH="aarch64-apple-darwin"

echo "Packaging ${NAME}, ${EXEC}, ${VERSION}, ${ARCH}"
cargo fmt && \
cargo check && \
cargo test || exit 1

echo "Building release"
cargo build --release --target ${ARCH} || exit 1
echo "Finished building release"


if [ ! -d ./pkg ]; \
then \
mkdir ./pkg; \
fi

if [ -d ./pkg/${NAME}-${VERSION}_${ARCH} ]; \
then \
echo "Current version number already exists! Removing old files!"; \
rm -rf ./pkg/${NAME}-${VERSION}_${ARCH}; \
fi

mkdir ./pkg/${NAME}-${VERSION}_${ARCH}

cp -r ./scripts ./pkg/${NAME}-${VERSION}_${ARCH}/
cp -r ./docs ./pkg/${NAME}-${VERSION}_${ARCH}/

mkdir ./pkg/${NAME}-${VERSION}_${ARCH}/bin
cp target/${ARCH}/release/${EXEC} ./pkg/${NAME}-${VERSION}_${ARCH}/bin/
strip ./pkg/${NAME}-${VERSION}_${ARCH}/bin/${EXEC}
ln -s ../scripts/$NAME-visualize.py ./pkg/${NAME}-${VERSION}_${ARCH}/$NAME-visualize

cp LICENSE ./pkg/${NAME}-${VERSION}_${ARCH}/
bash ./set-version.sh
cp README.md ./pkg/${NAME}-${VERSION}_${ARCH}/
cp -r examples ./pkg/${NAME}-${VERSION}_${ARCH}/

cd ./pkg && tar -czf ./${NAME}-${VERSION}_${ARCH}.tar.gz ./${NAME}-${VERSION}_${ARCH}
echo "Cleaning up ./pkg/${NAME}-${VERSION}_${ARCH}"
rm -rf ./${NAME}-${VERSION}_${ARCH}
250 changes: 158 additions & 92 deletions scripts/panacus-visualize.py
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@
# third party packages
#

from matplotlib.transforms import Bbox
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
@@ -21,11 +22,43 @@
from scipy.optimize import curve_fit
import seaborn as sns

PAT_PANACUS = re.compile('^#.+panacus (\S+) (.+)')
PAT_PANACUS = re.compile(r'^#.+panacus (\S+) (.+)')
N_HEADERS = 4
SUPPORTED_FILE_FORMATS = plt.gcf().canvas.get_supported_filetypes().keys()

ids = pd.IndexSlice

def clean_multicolumn_labels(df):
'''
Replaces 'Unnamed: ...' column headers from hierarchical columns by empty
strings.
Parameters
----------
df : DataFrame
A table
Returns
-------
DataFrame
Same table (i.e., same object) as input table, but with 'Unnamed: ..'
column headers replaced by empty strings.
'''

column_header = list()
for c in df.columns:
if isinstance(c, tuple):
c = tuple((not x.startswith('Unnamed:') and x or '' for x in c))
elif isinstance(c, str) and c.startswith('Unnamed:'):
c = ''
column_header.append(c)
df.columns = pd.Index(column_header, tupleize_cols=True)
return df


def humanize_number(i, precision=0):

#assert i >= 0, f'non-negative number assumed, but received "{i}"'
#assert i >= 0, f'non-negative number assumed, but received '{i}''

order = 0
x = i
@@ -59,142 +92,175 @@ def fit_alpha(Y):
return fit(X, Y, lambda x, *y: y[0]*x**(-y[1]))


def plot_hist(df, fname, counttype, out, loc='lower left', figsize=(10, 6)):

# setup fancy plot look
sns.set_theme(style='darkgrid')
sns.set_color_codes('colorblind')
sns.set_palette('husl')
sns.despine(left=True, bottom=True)

df.plot.bar(figsize=figsize)
plt.xticks(rotation=65)
def plot_hist(df, ax, loc='lower left'):

yticks, _ = plt.yticks()
plt.yticks(yticks, calibrate_yticks_text(yticks))
df.plot.bar(ax=ax, label=df.columns[0][1])

plt.title(f'coverage histogram for #{counttype}s ({fname})')
plt.legend(loc=loc)
ax.set_xticks(ax.get_xticks())
ax.set_xticklabels(ax.get_xticks(), rotation=65)
yticks = ax.get_yticks()
ax.set_yticks(yticks)
ax.set_yticklabels(calibrate_yticks_text(yticks))

plt.tight_layout()
plt.savefig(out, format='pdf')
plt.close()
ax.set_title(f'coverage histogram for #{df.columns[0][1]}s')
ax.set_ylabel(f'#{df.columns[0][1]}s')
#ax.legend(loc=loc)
ax.get_legend().remove()

def plot_growth(df, fname, counttype, out, loc='lower left', estimate_growth=False, figsize=(10, 6)):

# setup fancy plot look
sns.set_theme(style='darkgrid')
sns.set_color_codes('colorblind')
sns.set_palette('husl')
sns.despine(left=True, bottom=True)
def plot_growth(df, axs, loc='lower left', estimate_growth=False):

# let's do it!
if estimate_growth and (df.columns.levels[1] > 1/df.shape[0]).any():
f, axs = plt.subplots(2,1, figsize=(figsize[0], 2*figsize[1]))
else:
f, ax = plt.subplots(1,1, figsize=figsize)
axs = [ax]

popts = list()
for i, (c,q) in enumerate(df.columns):
df[(c, q)].plot.bar(color=f'C{i}', label=f'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%', ax=axs[0])
if estimate_growth and q < 1/df.shape[0]:
popt, curve = fit_gamma(df[(c,q)].array)
popts.append((c, q, popt, i))
axs[0].plot(curve, '--', color='black', label=f'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%, $k_1 X^γ$ with $k_1$={humanize_number(popt[0],1)}, γ={popt[1]:.3f})')
df = df.reindex(sorted(df.columns, key=lambda x: (x[3], x[2])), axis=1)
for i, (t, ct, c, q) in enumerate(df.columns):
df[(t, ct, c, q)].plot.bar(color=f'C{i}', label=fr'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%', ax=axs[0])
if c <= 1 and q <= 1/df.shape[0]:
if estimate_growth:
popt, curve = fit_gamma(df.loc[1:, (t, ct, c,q)].array)
popts.append((c, q, popt, i))
axs[0].plot(df.loc[1:].index, curve, '--', color='black', label=fr'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%, $k_1 X^γ$ with $k_1$={humanize_number(popt[0],1)}, γ={popt[1]:.3f})')
else:
popts.append((c, q, None, i))
axs[0].set_xticklabels(axs[0].get_xticklabels(), rotation=65)

yticks = axs[0].get_yticks()
axs[0].set_yticks(yticks)
axs[0].set_yticklabels(calibrate_yticks_text(yticks))

axs[0].set_title(f'Pangenome growth ({fname})')
axs[0].set_ylabel(f'#{counttype}s')
axs[0].set_title(f'{df.columns[0][0]} plot for #{df.columns[0][1]}s')
axs[0].set_ylabel(f'#{df.columns[0][1]}s')
axs[0].set_xlabel('samples')
axs[0].legend(loc=loc)

if popts:
for c, q, _, i in popts:
x = np.zeros(df.shape[0])
x[1:] = df.loc[:df.shape[0]-1, (c, q)]
(df[(c, q)] - x).plot.bar(color=f'C{i}', label=f'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%', ax=axs[1])
popt, _ = fit_alpha((df.loc[2:, (c, q)] - x[1:]).array)
k2 = popt[0]
alpha = popt[1]
Y = k2*df.index.array**(-alpha)
axs[1].plot(Y.to_numpy(), '--', color='black', label=f'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%, $k_2 X^{{-α}}$ with $k_2$={humanize_number(k2,1)}, α={alpha:.3f})')

axs[1].set_xticklabels(axs[0].get_xticklabels(), rotation=65)
x = np.zeros(df.shape[0]-1)
x[1:] = df.loc[df.index[1]:df.index[-2], (t, ct, c, q)]
(df.loc[df.index[1]:, (t, ct, c, q)] - x).plot.bar(color=f'C{i}', label=fr'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%', ax=axs[1])
if estimate_growth:
popt, _ = fit_alpha((df.loc[df.index[2]:, (t, ct, c, q)] - x[1:]).array)
k2 = popt[0]
alpha = popt[1]
Y = k2*np.arange(1, df.shape[0]+1)**(-alpha)
axs[1].plot(Y, '--', color='black', label=fr'coverage $\geq {c}$, quorum $\geq {q*100:.0f}$%, $k_2 X^{{-α}}$ with $k_2$={humanize_number(k2,1)}, α={alpha:.3f})')

axs[1].set_xticklabels(axs[1].get_xticklabels(), rotation=65)

yticks = axs[1].get_yticks()
axs[1].set_yticks(yticks)
axs[1].set_yticklabels(calibrate_yticks_text(yticks))

axs[1].set_title('$F_{new}$')
axs[1].set_ylabel(f'#{counttype}s')
axs[1].set_title(f'$F_{{new}}$ (#{df.columns[0][1]}s)')
axs[1].set_ylabel(f'#{df.columns[0][1]}s')
axs[1].set_xlabel('samples')
axs[1].legend(loc=loc)

plt.tight_layout()
plt.savefig(out, format='pdf')
plt.close()


def get_panacus_command_counttype(data):
def get_subplot_dim(df):

header = next(data)
m = PAT_PANACUS.match(header)
growths = [x for x in df.columns.levels[0] if x.endswith('growth')]
non_cum = 0
if growths:
non_cum = df.loc[:, ids[growths, :, :, :]].columns.map(lambda c: c[2] <= 1 and c[3] <= 1/df.shape[0]).any() and len(growths) or 0

if not m:
print(f'Input file "{data.name}" has wrong header. It doesn\'t seem to be generated by panacus, exiting.', file=stderr)
exit(1)
return len(df.columns.levels[1]), len(df.columns.levels[0]) + non_cum, non_cum

command, arg_list = m.groups()
arg_list = arg_list.split(' ')
counttype = 'node'
if '-c' in arg_list:
counttype = arg_list[arg_list.index('-c')+1]
elif '--count' in arg_list:
counttype = arg_list[arg_list.index('--count')+1]

return command, counttype
def full_extent(ax, pad=0.0):
'''
Gets the full extent of a given axes including labels, axis and
titles.
'''
ax.figure.canvas.draw()
items = ax.get_xticklabels() + ax.get_yticklabels()
items += [ax, ax.title, ax.xaxis.label, ax.yaxis.label]
items += [ax, ax.title]
bbox = Bbox.union([item.get_window_extent() for item in items])
return bbox.expanded(1.0 + pad, 1.0 + pad)

def save_split_figures(ax, f, format, prefix):
for i, ax_row in enumerate(axs):
for j, ax in enumerate(ax_row):
extent = full_extent(ax).transformed(
f.dpi_scale_trans.inverted())
with open(f'{prefix}{i}_{j}.{format}', 'wb+') as out:
plt.savefig(out, bbox_inches=extent, format=format)


if __name__ == '__main__':
description='''
Visualize growth stats. PDF file will be plotted to stdout.
Visualize growth stats. Figures in given (output) format will be plotted to stdout, or optionally splitted into in individual files that start
with a given prefix.
'''
parser = ArgumentParser(formatter_class=ADHF, description=description)
parser.add_argument('stats', type=open,
help='Growth/Histogram table computed by panacus')
parser.add_argument('-e', '--estimate_growth_params', action='store_true',
help='Estimate growth parameters based on least-squares fit')
parser.add_argument('-l', '--legend_location',
choices = ['lower left', 'lower right', 'upper left', 'upper right'],
parser.add_argument('-l', '--legend_location',
choices = ['lower left', 'lower right', 'upper left', 'upper right'],
default = 'upper left',
help='Estimate growth parameters based on least-squares fit')
parser.add_argument('-s', '--figsize', nargs=2, type=int, default=[10, 6],
help='Set size of figure canvas')
parser.add_argument('-f', '--format', default='pdf' in SUPPORTED_FILE_FORMATS and 'pdf' or SUPPORTED_FILE_FORMATS[0], choices=SUPPORTED_FILE_FORMATS,
help='Specify the format of the output')
parser.add_argument('--split_subfigures', action='store_true',
help='Split output into multiple files')
parser.add_argument('--split_prefix', default='out_',
help='Prefix given to the files generated when splitting into subfigures')

args = parser.parse_args()

with open(args.stats.name) as growth:
command, counttype = get_panacus_command_counttype(growth)
df = clean_multicolumn_labels(pd.read_csv(args.stats, sep='\t', header=list(range(N_HEADERS)), index_col=[0], comment='#'))
if df.columns[0][0] not in ['hist', 'growth', 'ordered-growth']:
print('This script cannot visualize the content of this type of table, exiting.', file=stderr)
exit(1)
df.columns = df.columns.map(lambda x: (x[0], x[1], x[2] and int(x[2].replace("A", "")), x[3] and float(x[3].replace("R", ""))))

if command in ['ordered-histgrowth', 'histgrowth', 'growth']:
df = pd.read_csv(args.stats, sep='\t', header=[1,2], index_col=[0])
df.columns = df.columns.map(lambda x: (int(x[0]), float(x[1])))
df = df.reindex(sorted(df.columns, key=lambda x: (x[1], x[0])), axis=1)
with fdopen(stdout.fileno(), 'wb', closefd=False) as out:
plot_growth(df, path.basename(args.stats.name), counttype, out,
loc=args.legend_location,
estimate_growth=args.estimate_growth_params,
figsize=args.figsize)
elif command == 'hist':
df = pd.read_csv(args.stats, sep='\t', header=[1], index_col=[0])
n, m, non_cum_plots = get_subplot_dim(df)
# setup fancy plot look
sns.set_theme(style='darkgrid')
sns.set_color_codes('colorblind')
sns.set_palette('husl')
sns.despine(left=True, bottom=True)

f, axs = plt.subplots(n, m, figsize=(args.figsize[0] * m, args.figsize[1] * n))


if m == 1 and n == 1:
axs = np.array([[axs]]);
elif m == 1:
axs = axs.reshape(axs.size, 1)
elif n == 1:
axs = axs.reshape(1, axs.size)

for t in df.columns.levels[0]:
for j, c in enumerate(df.columns.levels[1]):
df_tc = df.loc[:, ids[t, c, :, :]]
if t == 'hist':
plot_hist(df_tc, axs[j, 0], loc=args.legend_location)
elif t == 'growth':
offset = 'hist' in df.columns.levels[0] and 1 or 0
axs_tc = axs[j, offset:offset+1]
if non_cum_plots:
axs_tc = axs[j, offset:offset+2]
plot_growth(df_tc, axs_tc, loc=args.legend_location, estimate_growth=args.estimate_growth_params)
elif t == 'ordered-growth':
if args.estimate_growth_params:
print(f'Cannot estimate growth using heaps law (-e parameter) when working with an ordered growth plot', file=stderr)
exit(1)
axs_tc = axs[j, -1:]
if non_cum_plots:
axs_tc = axs[j, -2:]
if df_tc.index[0] == '0' and df_tc.loc['0'].isna().all():
df_tc.drop(['0'], inplace=True)
plot_growth(df_tc, axs_tc, loc=args.legend_location, estimate_growth=False)

plt.tight_layout()
if not args.split_subfigures:
with fdopen(stdout.fileno(), 'wb', closefd=False) as out:
plot_hist(df, path.basename(args.stats.name), counttype, out,
loc=args.legend_location, figsize=args.figsize)
plt.savefig(out, format=args.format)
else:
print(f'This script cannot visualize the contents of input file {args.stats.name}, exiting.', file=stderr)
exit(1)

save_split_figures(axs, f, args.format, args.split_prefix)

plt.close()
5 changes: 5 additions & 0 deletions set-version.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
#!/usr/bin/env bash

VERSION="$(cargo read-manifest | jq .version | sed "s/\"//g")"
sed -i'' -E "s#-[0-9]+\.[0-9]+\.[0-9]_#-${VERSION}_#g" README.md
sed -i'' -E "s#/[0-9]+\.[0-9]+\.[0-9]/#/${VERSION}/#g" README.md
622 changes: 0 additions & 622 deletions src/abacus.rs

This file was deleted.

40 changes: 40 additions & 0 deletions src/analyses.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
pub mod growth;
pub mod hist;
// pub mod histgrowth;
pub mod info;
pub mod ordered_histgrowth;
pub mod table;

use std::collections::HashSet;

use crate::{
analysis_parameter::AnalysisParameter, graph_broker::GraphBroker, html_report::AnalysisSection,
};

pub trait Analysis {
fn generate_table(&mut self, gb: Option<&GraphBroker>) -> anyhow::Result<String>;
fn generate_report_section(
&mut self,
gb: Option<&GraphBroker>,
) -> anyhow::Result<Vec<AnalysisSection>>;
fn get_graph_requirements(&self) -> HashSet<InputRequirement>;
fn get_type(&self) -> String;
}

pub trait ConstructibleAnalysis: Analysis {
fn from_parameter(parameter: AnalysisParameter) -> Self;
}

#[derive(PartialEq, Eq, Debug, Clone, Hash)]
pub enum InputRequirement {
Node,
Edge,
Bp,
PathLens,
Hist,
AbacusByGroup,
Graph(String),
// Subset(String),
// Grouping(String),
// Exclude(String),
}
256 changes: 256 additions & 0 deletions src/analyses/growth.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,256 @@
use core::{panic, str};
use std::path::Path;
use std::{collections::HashSet, fs, io::BufReader};

use rayon::iter::{IntoParallelRefIterator, ParallelBridge, ParallelIterator};

use crate::analysis_parameter::AnalysisParameter;
use crate::graph_broker::{GraphBroker, Hist, ThresholdContainer};
use crate::html_report::ReportItem;
use crate::{
io::{parse_hists, write_table},
util::CountType,
};

use super::{Analysis, AnalysisSection, ConstructibleAnalysis, InputRequirement};

type Hists = Vec<Hist>;
type Growths = Vec<(CountType, Vec<Vec<f64>>)>;
type Comments = Vec<Vec<u8>>;

pub struct Growth {
parameter: AnalysisParameter,
inner: Option<InnerGrowth>,
}

impl Analysis for Growth {
fn get_type(&self) -> String {
"Growth".to_string()
}
fn generate_table(
&mut self,
dm: Option<&crate::graph_broker::GraphBroker>,
) -> anyhow::Result<String> {
log::info!("reporting hist table");

self.set_inner(dm)?;
let growths = &self.inner.as_ref().unwrap().growths;
let hist_aux = &self.inner.as_ref().unwrap().hist_aux;
let comments = &self.inner.as_ref().unwrap().comments;
let mut res = String::new();
for c in comments {
res.push_str(str::from_utf8(&c[..])?);
res.push_str("\n");
}
res.push_str(&format!(
"# {}\n",
std::env::args().collect::<Vec<String>>().join(" ")
));

let mut header_cols = vec![vec![
"panacus".to_string(),
"count".to_string(),
"coverage".to_string(),
"quorum".to_string(),
]];
let mut output_columns: Vec<Vec<f64>> = Vec::new();

let hists = match &self.inner.as_ref().unwrap().hists {
Some(h) => h.iter().collect::<Vec<_>>(),
None => dm
.expect("Growth needs either hist file or graph")
.get_hists()
.values()
.collect::<Vec<_>>(),
};

if let AnalysisParameter::Growth { add_hist, .. } = self.parameter {
if add_hist {
for h in hists {
output_columns.push(h.coverage.iter().map(|x| *x as f64).collect());
header_cols.push(vec![
"hist".to_string(),
h.count.to_string(),
String::new(),
String::new(),
])
}
}
} else {
panic!("Growth needs growth parameter");
}

for (count, g) in growths {
output_columns.extend(g.clone());
let m = hist_aux.coverage.len();
header_cols.extend(
std::iter::repeat("growth")
.take(m)
.zip(std::iter::repeat(count).take(m))
.zip(hist_aux.coverage.iter())
.zip(&hist_aux.quorum)
.map(|(((p, t), c), q)| {
vec![p.to_string(), t.to_string(), c.get_string(), q.get_string()]
}),
);
}
res.push_str(&write_table(&header_cols, &output_columns)?);
Ok(res)
}

fn generate_report_section(
&mut self,
dm: Option<&crate::graph_broker::GraphBroker>,
) -> anyhow::Result<Vec<AnalysisSection>> {
self.set_inner(dm)?;
let hist_aux = &self.inner.as_ref().unwrap().hist_aux;
let growth_labels = (0..hist_aux.coverage.len())
.map(|i| {
format!(
"coverage ≥ {}, quorum ≥ {}%",
hist_aux.coverage[i].get_string(),
hist_aux.quorum[i].get_string()
)
})
.collect::<Vec<_>>();
let table = self.generate_table(dm)?;
let table = format!("`{}`", &table);
let growths = &self.inner.as_ref().unwrap().growths;
let id_prefix = format!(
"pan-growth-{}",
self.get_run_name()
.to_lowercase()
.replace(&[' ', '|', '\\'], "-")
);
let growth_tabs = growths
.iter()
.map(|(k, v)| AnalysisSection {
id: format!("{id_prefix}-{k}"),
analysis: "Pangenome Growth".to_string(),
run_name: self.get_run_name(),
countable: k.to_string(),
table: Some(table.clone()),
items: vec![ReportItem::MultiBar {
id: format!("{id_prefix}-{k}"),
names: growth_labels.clone(),
x_label: "taxa".to_string(),
y_label: format!("#{}s", k),
labels: (1..v[0].len()).map(|i| i.to_string()).collect(),
values: v.clone(),
log_toggle: false,
}],
})
.collect();
Ok(growth_tabs)
}

// fn get_subcommand() -> Command {
// Command::new("growth")
// .about("Calculate growth curve from coverage histogram")
// .args(&[
// arg!(hist_file: <HIST_FILE> "Coverage histogram as tab-separated value (tsv) file"),
// arg!(-a --hist "Also include histogram in output"),
// Arg::new("coverage").help("Ignore all countables with a coverage lower than the specified threshold. The coverage of a countable corresponds to the number of path/walk that contain it. Repeated appearances of a countable in the same path/walk are counted as one. You can pass a comma-separated list of coverage thresholds, each one will produce a separated growth curve (e.g., --coverage 2,3). Use --quorum to set a threshold in conjunction with each coverage (e.g., --quorum 0.5,0.9)")
// .short('l').long("coverage").default_value("1"),
// Arg::new("quorum").help("Unlike the --coverage parameter, which specifies a minimum constant number of paths for all growth point m (1 <= m <= num_paths), --quorum adjust the threshold based on m. At each m, a countable is counted in the average growth if the countable is contained in at least floor(m*quorum) paths. Example: A quorum of 0.9 requires a countable to be in 90% of paths for each subset size m. At m=10, it must appear in at least 9 paths. At m=100, it must appear in at least 90 paths. A quorum of 1 (100%) requires presence in all paths of the subset, corresponding to the core. Default: 0, a countable counts if it is present in any path at each growth point. Specify multiple quorum values with a comma-separated list (e.g., --quorum 0.5,0.9). Use --coverage to set static path thresholds in conjunction with variable quorum percentages (e.g., --coverage 5,10).")
// .short('q').long("quorum").default_value("0"),
// ])
// }

fn get_graph_requirements(&self) -> HashSet<super::InputRequirement> {
if let AnalysisParameter::Growth { hist, .. } = &self.parameter {
if Path::new(&hist).exists() {
HashSet::new()
} else {
HashSet::from([InputRequirement::Hist])
}
} else {
panic!("Growth should always contain growth parameter")
}
}
}

impl ConstructibleAnalysis for Growth {
fn from_parameter(parameter: AnalysisParameter) -> Self {
Growth {
parameter,
inner: None,
}
}
}

impl Growth {
fn get_run_name(&self) -> String {
match &self.parameter {
AnalysisParameter::Growth { name, hist, .. } => {
if name.is_some() {
return name.as_ref().unwrap().to_string();
}
format!("for hist {}", hist)
}
_ => panic!("Hist analysis needs to contain hist parameter"),
}
}

fn set_inner(&mut self, gb: Option<&GraphBroker>) -> anyhow::Result<()> {
if self.inner.is_some() {
return Ok(());
}
if let AnalysisParameter::Growth {
coverage,
quorum,
hist,
..
} = &self.parameter
{
let quorum = quorum.to_owned().unwrap_or("0".to_string());
let coverage = coverage.to_owned().unwrap_or("1".to_string());
let hist_aux = ThresholdContainer::parse_params(&quorum, &coverage)?;

if gb.is_none() {
let hist_file = hist;
log::info!("loading coverage histogram from {}", hist_file);
let mut data = BufReader::new(fs::File::open(&hist_file)?);
let (coverages, comments) = parse_hists(&mut data)?;
let hists: Hists = coverages
.into_iter()
.map(|(count, coverage)| Hist { count, coverage })
.collect();
let growths: Growths = hists
.par_iter()
.map(|h| (h.count, h.calc_all_growths(&hist_aux)))
.collect();
self.inner = Some(InnerGrowth {
growths,
comments,
hist_aux,
hists: Some(hists),
});
} else {
let gb = gb.unwrap();
let growths: Growths = gb
.get_hists()
.values()
.par_bridge()
.map(|h| (h.count, h.calc_all_growths(&hist_aux)))
.collect();
self.inner = Some(InnerGrowth {
growths,
comments: Vec::new(),
hist_aux,
hists: None,
});
}
Ok(())
} else {
panic!("Growth should always contain growth parameter")
}
}
}

struct InnerGrowth {
growths: Growths,
comments: Comments,
hist_aux: ThresholdContainer,
hists: Option<Hists>,
}
160 changes: 160 additions & 0 deletions src/analyses/hist.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
use core::panic;
use std::collections::HashSet;

use crate::analysis_parameter::AnalysisParameter;
use crate::html_report::ReportItem;
use crate::{analyses::InputRequirement, io::write_table, util::CountType};

use super::{Analysis, AnalysisSection, ConstructibleAnalysis};

pub struct Hist {
parameter: AnalysisParameter,
}

impl Analysis for Hist {
fn get_type(&self) -> String {
"Hist".to_string()
}

fn generate_table(
&mut self,
gb: Option<&crate::graph_broker::GraphBroker>,
) -> anyhow::Result<String> {
log::info!("reporting hist table");
if gb.is_none() {
panic!("Hist analysis needs a graph")
}
let gb = gb.unwrap();
let mut res = String::new();
res.push_str(&format!(
"# {}\n",
std::env::args().collect::<Vec<String>>().join(" ")
));

let mut header_cols = vec![vec![
"panacus".to_string(),
"count".to_string(),
String::new(),
String::new(),
]];
let mut output_columns = Vec::new();
for h in gb.get_hists().values() {
output_columns.push(h.coverage.iter().map(|x| *x as f64).collect());
header_cols.push(vec![
"hist".to_string(),
h.count.to_string(),
String::new(),
String::new(),
])
}
res.push_str(&write_table(&header_cols, &output_columns)?);
Ok(res)
}

fn generate_report_section(
&mut self,
gb: Option<&crate::graph_broker::GraphBroker>,
) -> anyhow::Result<Vec<AnalysisSection>> {
if gb.is_none() {
panic!("Hist analysis needs a graph")
}
let gb = gb.unwrap();
let table = self.generate_table(Some(gb))?;
let table = format!("`{}`", &table);
let id_prefix = format!(
"cov-hist-{}",
self.get_run_name()
.to_lowercase()
.replace(&[' ', '|', '\\'], "-")
);
let histogram_tabs = gb
.get_hists()
.iter()
.map(|(k, v)| AnalysisSection {
id: format!("{id_prefix}-{k}"),
analysis: "Coverage Histogram".to_string(),
table: Some(table.clone()),
run_name: self.get_run_name(),
countable: k.to_string(),
items: vec![ReportItem::Bar {
id: format!("{id_prefix}-{k}"),
name: gb.get_fname(),
x_label: "taxa".to_string(),
y_label: format!("#{}s", k),
labels: (0..v.coverage.len()).map(|s| s.to_string()).collect(),
values: v.coverage.iter().map(|c| *c as f64).collect(),
log_toggle: true,
}],
})
.collect::<Vec<_>>();
Ok(histogram_tabs)
}

fn get_graph_requirements(&self) -> HashSet<super::InputRequirement> {
if let AnalysisParameter::Hist { count_type, .. } = &self.parameter {
let mut req = HashSet::from([InputRequirement::Hist]);
req.extend(Self::count_to_input_req(*count_type));
// if let Some(subset) = subset {
// req.insert(InputRequirement::Subset(subset.to_owned()));
// }
// if let Some(grouping) = grouping {
// req.insert(InputRequirement::Grouping(grouping.to_owned()));
// }
// if let Some(exclude) = exclude {
// req.insert(InputRequirement::Exclude(exclude.to_owned()));
// }
req
} else {
HashSet::new()
}
}
}

impl ConstructibleAnalysis for Hist {
fn from_parameter(parameter: AnalysisParameter) -> Self {
Self { parameter }
}
}

impl Hist {
fn count_to_input_req(count: CountType) -> HashSet<InputRequirement> {
match count {
CountType::Bp => HashSet::from([InputRequirement::Bp]),
CountType::Node => HashSet::from([InputRequirement::Node]),
CountType::Edge => HashSet::from([InputRequirement::Edge]),
CountType::All => HashSet::from([
InputRequirement::Bp,
InputRequirement::Node,
InputRequirement::Edge,
]),
}
}

fn get_run_name(&self) -> String {
match &self.parameter {
AnalysisParameter::Hist {
name,
graph,
subset,
exclude,
grouping,
..
} => {
if name.is_some() {
return name.as_ref().unwrap().to_string();
}
format!(
"{}-{}|{}\\{}",
graph,
match grouping.clone() {
Some(g) => g.to_string(),
None => "Ungrouped".to_string(),
},
subset.clone().unwrap_or_default(),
exclude.clone().unwrap_or_default()
)
}
_ => panic!("Hist analysis needs to contain hist parameter"),
}
}
}
581 changes: 581 additions & 0 deletions src/analyses/info.rs

Large diffs are not rendered by default.

223 changes: 223 additions & 0 deletions src/analyses/ordered_histgrowth.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,223 @@
use std::collections::HashSet;

use rayon::iter::{IndexedParallelIterator, IntoParallelRefIterator, ParallelIterator};

use crate::analysis_parameter::AnalysisParameter;
use crate::graph_broker::{GraphBroker, ThresholdContainer};
use crate::html_report::ReportItem;
use crate::util::CountType;
use crate::{analyses::InputRequirement, io::write_ordered_histgrowth_table};

use super::{Analysis, AnalysisSection, ConstructibleAnalysis};

type Growths = Vec<Vec<f64>>;

pub struct OrderedHistgrowth {
parameter: AnalysisParameter,
inner: Option<InnerOrderedGrowth>,
}

const MAX_WIDTH: usize = 25;

impl ConstructibleAnalysis for OrderedHistgrowth {
fn from_parameter(parameter: AnalysisParameter) -> Self {
Self {
parameter,
inner: None,
}
}
}

impl Analysis for OrderedHistgrowth {
fn get_type(&self) -> String {
"OrderedHistgrowth".to_string()
}
fn generate_table(
&mut self,
gb: Option<&crate::graph_broker::GraphBroker>,
) -> anyhow::Result<String> {
if let Some(gb) = gb {
write_ordered_histgrowth_table(
gb.get_abacus_by_group(),
&self.inner.as_ref().unwrap().hist_aux,
gb.get_node_lens(),
)
} else {
Ok("".to_string())
}
}

fn generate_report_section(
&mut self,
dm: Option<&GraphBroker>,
) -> anyhow::Result<Vec<AnalysisSection>> {
self.set_inner(dm)?;
let count = match self.parameter {
AnalysisParameter::OrderedGrowth { count_type, .. } => count_type,
_ => {
panic!("Parameter has to fit the analysis")
}
};
let hist_aux = &self.inner.as_ref().unwrap().hist_aux;
let growth_labels = (0..hist_aux.coverage.len())
.map(|i| {
format!(
"coverage ≥ {}, quorum ≥ {}%",
hist_aux.coverage[i].get_string(),
hist_aux.quorum[i].get_string()
)
})
.collect::<Vec<_>>();
let table = self.generate_table(dm)?;
let table = format!("`{}`", &table);
let growths = &self.inner.as_ref().unwrap().growths;
let id_prefix = format!(
"pan-ordered-growth-{}",
self.get_run_name()
.to_lowercase()
.replace(&[' ', '|', '\\'], "-")
);
let growth_tabs = vec![AnalysisSection {
id: format!("{id_prefix}"),
analysis: "Pangenome Growth".to_string(),
run_name: self.get_run_name(),
countable: count.to_string(),
table: Some(table.clone()),
items: vec![ReportItem::MultiBar {
id: format!("{id_prefix}"),
names: growth_labels.clone(),
x_label: "taxa".to_string(),
y_label: format!("{}s", count),
labels: (1..growths[0].len()).map(|i| i.to_string()).collect(),
values: growths.clone(),
log_toggle: false,
}],
}];
Ok(growth_tabs)
//let mut growths: Vec<Vec<f64>> = self
// .hist_aux
// .coverage
// .par_iter()
// .zip(&self.hist_aux.quorum)
// .map(|(c, q)| {
// log::info!(
// "calculating ordered growth for coverage >= {} and quorum >= {}",
// &c,
// &q
// );
// gb.get_abacus_by_group()
// .calc_growth(c, q, gb.get_node_lens())
// })
// .collect();
//// insert empty row for 0 element
//for c in &mut growths {
// c.insert(0, f64::NAN);
//}
//let table = self.generate_table(Some(gb)).expect("Can write to string");
//let k = gb.get_abacus_by_group().count;
//Ok(vec![
//])
}

fn get_graph_requirements(&self) -> HashSet<InputRequirement> {
if let AnalysisParameter::OrderedGrowth { count_type, .. } = &self.parameter {
let mut req = HashSet::from([InputRequirement::AbacusByGroup]);
req.extend(Self::count_to_input_req(*count_type));
req
} else {
HashSet::new()
}
}
}

impl OrderedHistgrowth {
fn truncate(input: &str) -> String {
let res: String = input.chars().rev().take(MAX_WIDTH).collect();
let res: String = res.chars().rev().collect();
if res.len() < input.len() {
format!("...{}", res)
} else {
res
}
}

fn count_to_input_req(count: CountType) -> HashSet<InputRequirement> {
match count {
CountType::Bp => HashSet::from([InputRequirement::Bp]),
CountType::Node => HashSet::from([InputRequirement::Node]),
CountType::Edge => HashSet::from([InputRequirement::Edge]),
CountType::All => HashSet::from([
InputRequirement::Bp,
InputRequirement::Node,
InputRequirement::Edge,
]),
}
}

fn get_run_name(&self) -> String {
match &self.parameter {
AnalysisParameter::OrderedGrowth {
name, count_type, ..
} => {
if name.is_some() {
return name.as_ref().unwrap().to_string();
}
format!(
"{}|{}",
Self::truncate(&self.inner.as_ref().unwrap().graph),
count_type
)
}
_ => panic!("Hist analysis needs to contain hist parameter"),
}
}

fn set_inner(&mut self, gb: Option<&GraphBroker>) -> anyhow::Result<()> {
if self.inner.is_some() {
return Ok(());
}

if let AnalysisParameter::OrderedGrowth {
coverage, quorum, ..
} = &self.parameter
{
let quorum = quorum.to_owned().unwrap_or("0".to_string());
let coverage = coverage.to_owned().unwrap_or("1".to_string());
let hist_aux = ThresholdContainer::parse_params(&quorum, &coverage)?;

if gb.is_none() {
panic!("OrderedHistgrowth needs a graph in order to work");
}

let growths: Vec<Vec<f64>> = hist_aux
.coverage
.par_iter()
.zip(&hist_aux.quorum)
.map(|(c, q)| {
log::info!(
"calculating ordered growth for coverage >= {} and quorum >= {}",
&c,
&q
);
gb.unwrap()
.get_abacus_by_group()
.calc_growth(c, q, gb.unwrap().get_node_lens())
})
.collect();
self.inner = Some(InnerOrderedGrowth {
growths,
hist_aux,
graph: gb.unwrap().get_fname(),
});
Ok(())
} else {
panic!("OrderedGrowth should always contain ordered-growth parameter")
}
}
}

struct InnerOrderedGrowth {
growths: Growths,
hist_aux: ThresholdContainer,
graph: String,
}
73 changes: 73 additions & 0 deletions src/analyses/table.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
use crate::{analyses::InputRequirement, analysis_parameter::AnalysisParameter, util::CountType};
use std::{collections::HashSet, io::BufWriter};

use super::{Analysis, AnalysisSection, ConstructibleAnalysis};

pub struct Table {
parameter: AnalysisParameter,
}

impl Analysis for Table {
fn generate_table(
&mut self,
gb: Option<&crate::graph_broker::GraphBroker>,
) -> anyhow::Result<String> {
if let Some(gb) = gb {
let total = match self.parameter {
AnalysisParameter::Table { total, .. } => total,
_ => {
panic!("Table analysis needs a table parameter")
}
};
let mut buf = BufWriter::new(Vec::new());
gb.write_abacus_by_group(total, &mut buf)?;
let bytes = buf.into_inner()?;
let string = String::from_utf8(bytes)?;
Ok(string)
} else {
Ok("".to_string())
}
}

fn get_type(&self) -> String {
"Table".to_string()
}

fn get_graph_requirements(&self) -> HashSet<InputRequirement> {
if let AnalysisParameter::Table { count_type, .. } = &self.parameter {
let mut req = HashSet::from([InputRequirement::AbacusByGroup]);
req.extend(Self::count_to_input_req(*count_type));
req
} else {
HashSet::new()
}
}

fn generate_report_section(
&mut self,
_dm: Option<&crate::graph_broker::GraphBroker>,
) -> anyhow::Result<Vec<AnalysisSection>> {
Ok(Vec::new())
}
}

impl ConstructibleAnalysis for Table {
fn from_parameter(parameter: crate::analysis_parameter::AnalysisParameter) -> Self {
Table { parameter }
}
}

impl Table {
fn count_to_input_req(count: CountType) -> HashSet<InputRequirement> {
match count {
CountType::Bp => HashSet::from([InputRequirement::Bp]),
CountType::Node => HashSet::from([InputRequirement::Node]),
CountType::Edge => HashSet::from([InputRequirement::Edge]),
CountType::All => HashSet::from([
InputRequirement::Bp,
InputRequirement::Node,
InputRequirement::Edge,
]),
}
}
}
220 changes: 220 additions & 0 deletions src/analysis_parameter.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,220 @@
use core::panic;
use std::fmt::Display;

use serde::{Deserialize, Serialize};

use crate::util::CountType;

#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash, Clone)]
pub enum AnalysisParameter {
Hist {
name: Option<String>,

#[serde(default)]
count_type: CountType,
graph: String,

#[serde(default = "get_true")]
display: bool,

subset: Option<String>,
exclude: Option<String>,
grouping: Option<Grouping>,
},
Growth {
name: Option<String>,

coverage: Option<String>,
quorum: Option<String>,

hist: String,

#[serde(default = "get_true")]
display: bool,

#[serde(default)]
add_hist: bool,
},
Subset {
name: String,
file: String,
},
Graph {
name: String,
file: String,
#[serde(default)]
nice: bool,
},
Table {
#[serde(default)]
count_type: CountType,
graph: String,

subset: Option<String>,
exclude: Option<String>,
grouping: Option<Grouping>,
total: bool,
order: Option<String>,
},
Info {
graph: String,
subset: Option<String>,
exclude: Option<String>,
grouping: Option<Grouping>,
},
OrderedGrowth {
name: Option<String>,

coverage: Option<String>,
quorum: Option<String>,

order: Option<String>,

#[serde(default)]
count_type: CountType,
graph: String,

#[serde(default = "get_true")]
display: bool,

subset: Option<String>,
exclude: Option<String>,
grouping: Option<Grouping>,
},
}

#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash, Clone, PartialOrd, Ord)]
pub enum Grouping {
Sample,
Haplotype,
Custom(String),
}

impl Display for Grouping {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Self::Sample => write!(f, "Group By Sample"),
Self::Haplotype => write!(f, "Group By Haplotype"),
Self::Custom(file) => write!(f, "Group By {}", file),
}
}
}

fn get_true() -> bool {
true
}

impl AnalysisParameter {
fn compare_hists(&self, other: &Self) -> std::cmp::Ordering {
match self {
AnalysisParameter::Hist {
name,
count_type,
graph,
display,
subset,
exclude,
grouping,
} => match other {
AnalysisParameter::Hist {
name: o_name,
count_type: o_count_type,
graph: o_graph,
display: o_display,
subset: o_subset,
exclude: o_exclude,
grouping: o_grouping,
} => {
if graph != o_graph {
return graph.cmp(o_graph);
} else if subset != o_subset {
return subset.cmp(o_subset);
} else if exclude != o_exclude {
return exclude.cmp(o_exclude);
} else if grouping != o_grouping {
return grouping.cmp(o_grouping);
} else if count_type != o_count_type {
return count_type.cmp(o_count_type);
} else if name != o_name {
return name.cmp(o_name);
} else if display != o_display {
return display.cmp(o_display);
} else {
return std::cmp::Ordering::Equal;
}
}
_ => panic!("Method only defined for hists"),
},
_ => panic!("Method only defined for hists"),
}
}
}

impl PartialOrd for AnalysisParameter {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}

impl Ord for AnalysisParameter {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
match self {
AnalysisParameter::Hist { .. } => match other {
o @ AnalysisParameter::Hist { .. } => self.compare_hists(o),
_ => std::cmp::Ordering::Greater,
},
AnalysisParameter::Graph { name, .. } => match other {
AnalysisParameter::Graph { name: o_name, .. } => name.cmp(o_name),
_ => std::cmp::Ordering::Less,
},
AnalysisParameter::Subset { name, .. } => match other {
AnalysisParameter::Subset { name: o_name, .. } => name.cmp(o_name),
AnalysisParameter::Graph { .. } => std::cmp::Ordering::Greater,
_ => std::cmp::Ordering::Less,
},
//AnalysisParameter::Grouping { name, .. } => match other {
// AnalysisParameter::Grouping { name: o_name, .. } => name.cmp(o_name),
// AnalysisParameter::Graph { .. } | AnalysisParameter::Subset { .. } => {
// std::cmp::Ordering::Greater
// }
// _ => std::cmp::Ordering::Less,
//},
AnalysisParameter::Info {
graph,
subset,
exclude,
grouping,
} => match other {
AnalysisParameter::Info { graph: o_graph, subset: o_subset, exclude: o_exclude, grouping: o_grouping } => {
graph.cmp(o_graph).then(subset.cmp(o_subset)).then(exclude.cmp(o_exclude)).then(grouping.cmp(o_grouping))
},
AnalysisParameter::Graph { .. }
//| AnalysisParameter::Grouping { .. }
| AnalysisParameter::Subset { .. } => std::cmp::Ordering::Greater,
_ => std::cmp::Ordering::Less,
},
AnalysisParameter::Table { .. } => match other {
AnalysisParameter::Table { .. } => std::cmp::Ordering::Equal,
AnalysisParameter::Graph { .. }
| AnalysisParameter::Subset { .. }
//| AnalysisParameter::Grouping { .. }
| AnalysisParameter::Info { .. } => std::cmp::Ordering::Greater,
_ => std::cmp::Ordering::Less,
},
AnalysisParameter::OrderedGrowth { name, .. } => match other {
AnalysisParameter::OrderedGrowth { name: o_name, .. } => name.cmp(o_name),
AnalysisParameter::Graph { .. }
| AnalysisParameter::Subset { .. }
//| AnalysisParameter::Grouping { .. }
| AnalysisParameter::Info { .. }
| AnalysisParameter::Table { .. } => std::cmp::Ordering::Greater,
_ => std::cmp::Ordering::Less,
},
AnalysisParameter::Growth { hist, .. } => match other {
AnalysisParameter::Growth { hist: o_hist, .. } => hist.cmp(o_hist),
AnalysisParameter::Hist { .. } => std::cmp::Ordering::Less,
_ => std::cmp::Ordering::Greater,
},
}
}
}
Loading