Add docs #4, clear quick annotation
antonylebechec committed Mar 4, 2024
1 parent a523988 commit f821f3b
Showing 176 changed files with 877 additions and 539 deletions.
64 changes: 38 additions & 26 deletions README.md
@@ -1,5 +1,9 @@
<style>body {text-align: justify}</style>

# HOWARD

![HOWARD Graphical User Interface](images/icon.png "HOWARD Graphical User Interface")

Highly Open and Valuable tool for Variant Annotation & Ranking for Discovery

HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations, translates files in multiple formats (e.g. vcf, tsv, parquet) and generates variant statistics.
@@ -22,13 +26,16 @@ HOWARD is multithreaded through the number of variants and by database (data-sca
- [Python](#python)
- [Docker](#docker)
- [Databases](#databases)
- [Configuration](#configuration)
- [Parameters](#parameters)
- [Tools](#tools)
- [Stats](#stats)
- [Convert](#convert)
- [Query](#query)
- [Annotation](#annotation)
- [Calculation](#calculation)
- [Prioritization](#prioritization)
- [Process](#process)
- [Docker HOWARD-CLI](#docker-howard-cli)
- [Documentation](#documentation)
- [Contact](#contact)
@@ -51,8 +58,6 @@ howard gui

![HOWARD Graphical User Interface](images/howard-gui.png "HOWARD Graphical User Interface")



## Docker

To build, set up and create a persistent CLI (a running container with all useful external tools such as [BCFTools](https://samtools.github.io/bcftools/), [snpEff](https://pcingola.github.io/SnpEff/), [Annovar](https://annovar.openbioinformatics.org/), [Exomiser](https://www.sanger.ac.uk/tool/exomiser/)), the docker-compose command builds images and launches services as containers.
@@ -74,7 +79,6 @@ docker exec -ti HOWARD-CLI bash
howard --help
```


## Databases

Multiple databases can be automatically downloaded with databases tool, such as:
@@ -105,6 +109,18 @@ Databases can be home-made generated, starting with a existing annotation file,

Each database annotation file is associated with a 'header' file ('.hdr'), in VCF header format, to describe annotations within the database.

# Configuration

The HOWARD Configuration JSON file defines the default configuration: resources (e.g. threads, memory), settings (e.g. verbosity, temporary files), default folders (e.g. for databases) and paths to external tools.

See [HOWARD Configuration JSON](docs/help.config.md) for more information.

# Parameters

The HOWARD Parameters JSON file defines the parameters used for annotations, prioritization, calculations, conversions and queries.

See [HOWARD Parameters JSON](docs/help.param.md) for more information.

# Tools

## Stats
@@ -145,25 +161,21 @@ See [HOWARD Help Query tool](docs/help.md#query-tool) for more options.
## Annotation
Annotation is mainly based on a build-in Parquet annotation method, and tools such as BCFTOOLS, Annovar, snpEff and Exomiser. It uses available databases and homemade databases. Format of databases are: Parquet/duckdb, VCF, BED, Annovar and snpEff (Annovar and snpEff databases are automatically downloaded, see howard databases tool).
- VCF annotation with Parquet and VCF databases, output as VCF format
```
howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.vcf.gz --annotations='tests/databases/annotations/hg19/dbnsfp42a.parquet,tests/databases/annotations/hg19/gnomad211_genome.parquet,tests/databases/annotations/hg19/cosmic70.vcf.gz'
```
Annotation is mainly based on a built-in Parquet annotation method, using database formats such as Parquet, duckdb, VCF, BED, TSV and JSON. External annotation tools are also available, such as BCFTOOLS, Annovar, snpEff and Exomiser. It uses available databases and homemade databases. Annovar and snpEff databases are automatically downloaded (see [HOWARD Help Databases tool](docs/help.md#databases-tool)). All annotation parameters are defined in the [HOWARD Parameters JSON](docs/help.param.md) file.
- VCF annotation with Clinvar Parquet, Annovar refGene and snpEff databases, output as TSV format
Quick annotation annotates variants by simply listing annotation databases or external tool keywords. These annotations can be combined.
```
howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --annotations='annovar:refGene,snpeff,tests/databases/annotations/hg19/clinvar_20210123.parquet'
```
- VCF annotation with all available database annotation files in Parquet format (within the database annotation folder in configuration):
> Example: VCF annotation with Parquet and VCF databases, output as VCF format
>
> ```
> howard annotation --input=tests/data/example.vcf.gz --annotations='tests/databases/annotations/current/hg19/dbnsfp42a.parquet,tests/databases/annotations/current/hg19/cosmic70.vcf.gz' --output=/tmp/example.howard.vcf.gz
> ```
```
howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --assembly='hg19' --annotations='ALL:parquet'
```
> Example: VCF annotation with external tools (Annovar refGene and snpEff databases), output as TSV format
>
> ```
> howard annotation --input=tests/data/example.vcf.gz --annotations='annovar:refGene,snpeff' --output=/tmp/example.howard.tsv
> ```
See [HOWARD Help Annotation tool](docs/help.md#annotation-tool) for more options.
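
The quick-annotation examples above pass `--annotations` as a single comma-separated string that can mix database paths and tool keywords. A minimal Python sketch of assembling such a command line is given here; the helper name `build_annotation_command` is hypothetical (it is not part of HOWARD), and it only uses the `--input`, `--annotations` and `--output` options that appear in the examples.

```python
import shlex


def build_annotation_command(input_path, output_path, annotations):
    """Return the argv list for a `howard annotation` call.

    `annotations` is a list of database paths and/or tool keywords
    (e.g. 'annovar:refGene', 'snpeff'), joined with commas as in the
    README examples. Illustrative helper, not part of HOWARD itself.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    return [
        "howard",
        "annotation",
        f"--input={input_path}",
        f"--annotations={','.join(annotations)}",
        f"--output={output_path}",
    ]


cmd = build_annotation_command(
    "tests/data/example.vcf.gz",
    "/tmp/example.howard.tsv",
    ["annovar:refGene", "snpeff"],
)
# shlex.join quotes arguments where needed so the line can be pasted into a shell
print(shlex.join(cmd))
```

Returning an argument list rather than a string lets the same value be passed directly to `subprocess.run` without shell quoting concerns.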
@@ -174,7 +186,7 @@ Calculation processes variants information to generate new information, such as:
- Identify variant types
```
howard calculation --input=tests/data/example.full.vcf --output=/tmp/example.calculation.tsv --calculations='vartype'
howard calculation --input=tests/data/example.full.vcf --calculations='vartype' --output=/tmp/example.calculation.tsv
```
- and generate a table of variant type count
@@ -186,7 +198,7 @@ howard query --input=/tmp/example.calculation.tsv --explode_infos --query='SELEC
- Calculate NOMEN by extracting hgvs from snpEff annotation and identifying default transcripts from a list
```
howard calculation --input=tests/data/example.ann.vcf.gz --output=/tmp/example.NOMEN.vcf.gz --calculations='snpeff_hgvs,NOMEN' --hgvs_field='snpeff_hgvs' --transcripts=tests/data/transcripts.tsv && gzip -dc /tmp/example.NOMEN.vcf.gz | grep "##" -v | head -n2
howard calculation --input=tests/data/example.ann.vcf.gz --calculations='snpeff_hgvs,NOMEN' --hgvs_field='snpeff_hgvs' --transcripts=tests/data/transcripts.tsv --output=/tmp/example.NOMEN.vcf.gz && gzip -dc /tmp/example.NOMEN.vcf.gz | grep "##" -v | head -n2
```
- and query NOMEN for gene 'EGFR'
@@ -204,7 +216,7 @@ Prioritization algorithm uses profiles to flag variants (as passed or filtered),
- Prioritize variants from criteria on INFO annotations for profiles 'default' and 'GERMLINE' (see 'prioritization_profiles.json'), and query variants on prioritization tags
```
howard prioritization --input=tests/data/example.vcf.gz --output=/tmp/example.prioritized.vcf.gz --prioritizations=config/prioritization_profiles.json --profiles='default,GERMLINE' --pzfields='PZFlag,PZScore,PZComment'
howard prioritization --input=tests/data/example.vcf.gz --prioritizations=config/prioritization_profiles.json --profiles='default,GERMLINE' --pzfields='PZFlag,PZScore,PZComment' --output=/tmp/example.prioritized.vcf.gz
```
- and query variants passing filters
@@ -280,20 +292,20 @@ howard process --config=config/config.json --param=config/param.json --input=tes
},
"parquet": {
"annotations": {
"tests/databases/annotations/hg19/avsnp150.parquet": {
"tests/databases/annotations/current/hg19/avsnp150.parquet": {
"INFO": null
},
"tests/databases/annotations/hg19/dbnsfp42a.parquet": {
"tests/databases/annotations/current/hg19/dbnsfp42a.parquet": {
"INFO": null
},
"tests/databases/annotations/hg19/gnomad211_genome.parquet": {
"tests/databases/annotations/current/hg19/gnomad211_genome.parquet": {
"INFO": null
}
}
},
"bcftools": {
"annotations": {
"tests/databases/annotations/hg19/cosmic70.vcf.gz": {
"tests/databases/annotations/current/hg19/cosmic70.vcf.gz": {
"INFO": null
}
}
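
The hunk above maps each database path to `{"INFO": null}` under a `parquet` or `bcftools` block. As a rough sketch, the same fragment can be generated in Python; the function name `annotation_fragment` is hypothetical, and the enclosing keys of `param.json` are not visible in the hunk, so they are deliberately left out.

```python
import json


def annotation_fragment(parquet_paths, bcftools_paths):
    """Build the 'parquet' and 'bcftools' blocks shown in the hunk.

    Each database path maps to {"INFO": None}, which json.dumps writes
    as "INFO": null. Enclosing param.json keys are not reproduced here
    because the diff does not show them.
    """
    return {
        "parquet": {"annotations": {p: {"INFO": None} for p in parquet_paths}},
        "bcftools": {"annotations": {p: {"INFO": None} for p in bcftools_paths}},
    }


# Folder layout introduced by this commit
base = "tests/databases/annotations/current/hg19"
fragment = annotation_fragment(
    [
        f"{base}/avsnp150.parquet",
        f"{base}/dbnsfp42a.parquet",
        f"{base}/gnomad211_genome.parquet",
    ],
    [f"{base}/cosmic70.vcf.gz"],
)
print(json.dumps(fragment, indent=2))
```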
8 changes: 4 additions & 4 deletions config/param.json
@@ -16,20 +16,20 @@
},
"parquet": {
"annotations": {
"tests/databases/annotations/hg19/avsnp150.parquet": {
"tests/databases/annotations/current/hg19/avsnp150.parquet": {
"INFO": null
},
"tests/databases/annotations/hg19/dbnsfp42a.parquet": {
"tests/databases/annotations/current/hg19/dbnsfp42a.parquet": {
"INFO": null
},
"tests/databases/annotations/hg19/gnomad211_genome.parquet": {
"tests/databases/annotations/current/hg19/gnomad211_genome.parquet": {
"INFO": null
}
}
},
"bcftools": {
"annotations": {
"tests/databases/annotations/hg19/cosmic70.vcf.gz": {
"tests/databases/annotations/current/hg19/cosmic70.vcf.gz": {
"INFO": null
}
}
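
This hunk moves every database path from `tests/databases/annotations/hg19/` to `tests/databases/annotations/current/hg19/`. For a locally maintained parameters file with the old layout, the same rename could be applied with a small script. The sketch below assumes the paths appear as dictionary keys, as in the fragment shown; `migrate_annotation_paths` is a hypothetical helper, not a HOWARD function.

```python
OLD_PREFIX = "tests/databases/annotations/hg19/"
NEW_PREFIX = "tests/databases/annotations/current/hg19/"


def migrate_annotation_paths(node, old=OLD_PREFIX, new=NEW_PREFIX):
    """Recursively rename dict keys that start with `old` to start with `new`.

    Values, and keys that do not match, are left untouched. Running the
    function twice is harmless because migrated keys no longer start
    with the old prefix.
    """
    if isinstance(node, dict):
        migrated = {}
        for key, value in node.items():
            if isinstance(key, str) and key.startswith(old):
                key = new + key[len(old):]
            migrated[key] = migrate_annotation_paths(value, old, new)
        return migrated
    if isinstance(node, list):
        return [migrate_annotation_paths(item, old, new) for item in node]
    return node


before = {
    "parquet": {
        "annotations": {
            "tests/databases/annotations/hg19/avsnp150.parquet": {"INFO": None}
        }
    }
}
print(migrate_annotation_paths(before))
```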
8 changes: 4 additions & 4 deletions docs/help.html
@@ -1,6 +1,6 @@
<H1>HOWARD Help</h1>
<p>HOWARD:1.0.0<br>Highly Open and Valuable tool for Variant Annotation & Ranking for Discovery<br>HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations, convert on multiple formats, query variations and generates statistics</p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard process --input=tests/data/example.vcf.gz --output=/tmp/example.annotated.vcf.gz --param=config/param.json <br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.vcf.gz --annotations='tests/databases/annotations/hg19/dbnsfp42a.parquet,tests/databases/annotations/hg19/gnomad211_genome.parquet' <br>&nbsp;&nbsp;&nbsp;howard calculation --input=tests/data/example.full.vcf --output=/tmp/example.calculation.tsv --calculations='vartype' <br>&nbsp;&nbsp;&nbsp;howard prioritization --input=tests/data/example.vcf.gz --output=/tmp/example.prioritized.vcf.gz --prioritizations=config/prioritization_profiles.json --profiles='default,GERMLINE' <br>&nbsp;&nbsp;&nbsp;howard query --input=tests/data/example.vcf.gz --explode_infos --query='SELECT "#CHROM", POS, REF, ALT, "DP", "CLNSIG", sample2, sample3 FROM variants WHERE "DP" >= 50 OR "CLNSIG" NOT NULL ORDER BY "CLNSIG" DESC, "DP" DESC' <br>&nbsp;&nbsp;&nbsp;howard stats --input=tests/data/example.vcf.gz <br>&nbsp;&nbsp;&nbsp;howard convert --input=tests/data/example.vcf.gz --output=/tmp/example.tsv --explode_infos && cat /tmp/example.tsv <br><H2>QUERY</H2>
<p>Query genetic variations in SQL format. Data can be loaded into 'variants' table from various formats (e.g. VCF, TSV, Parquet...). Using --explode_infos allow query on INFO/tag annotations. SQL query can also use external data within the request, such as a Parquet file(s). </p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard query --input=tests/data/example.vcf.gz --query="SELECT * FROM variants WHERE REF = 'A' AND POS < 100000" <br>&nbsp;&nbsp;&nbsp;howard query --input=tests/data/example.vcf.gz --explode_infos --query='SELECT "#CHROM", POS, REF, ALT, DP, CLNSIG, sample2, sample3 FROM variants WHERE DP >= 50 OR CLNSIG NOT NULL ORDER BY DP DESC' <br>&nbsp;&nbsp;&nbsp;howard query --query="SELECT \"#CHROM\", POS, REF, ALT, \"INFO/Interpro_domain\" FROM 'tests/databases/annotations/hg19/dbnsfp42a.parquet' WHERE \"INFO/Interpro_domain\" NOT NULL ORDER BY \"INFO/SiPhy_29way_logOdds_rankscore\" DESC LIMIT 10" <br>&nbsp;&nbsp;&nbsp;howard query --explode_infos --explode_infos_prefix='INFO/' --query="SELECT \"#CHROM\", POS, REF, ALT, STRING_AGG(INFO, ';') AS INFO FROM 'tests/databases/annotations/hg19/*.parquet' GROUP BY \"#CHROM\", POS, REF, ALT" --output=/tmp/full_annotation.tsv && head -n2 /tmp/full_annotation.tsv <br><H3>Main options</H3>
<p>HOWARD:1.0.0<br>Highly Open and Valuable tool for Variant Annotation & Ranking for Discovery<br>HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations, convert on multiple formats, query variations and generates statistics</p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard process --input=tests/data/example.vcf.gz --output=/tmp/example.annotated.vcf.gz --param=config/param.json <br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.vcf.gz --annotations='tests/databases/annotations/current/hg19/dbnsfp42a.parquet,tests/databases/annotations/current/hg19/gnomad211_genome.parquet' <br>&nbsp;&nbsp;&nbsp;howard calculation --input=tests/data/example.full.vcf --output=/tmp/example.calculation.tsv --calculations='vartype' <br>&nbsp;&nbsp;&nbsp;howard prioritization --input=tests/data/example.vcf.gz --output=/tmp/example.prioritized.vcf.gz --prioritizations=config/prioritization_profiles.json --profiles='default,GERMLINE' <br>&nbsp;&nbsp;&nbsp;howard query --input=tests/data/example.vcf.gz --explode_infos --query='SELECT "#CHROM", POS, REF, ALT, "DP", "CLNSIG", sample2, sample3 FROM variants WHERE "DP" >= 50 OR "CLNSIG" NOT NULL ORDER BY "CLNSIG" DESC, "DP" DESC' <br>&nbsp;&nbsp;&nbsp;howard stats --input=tests/data/example.vcf.gz <br>&nbsp;&nbsp;&nbsp;howard convert --input=tests/data/example.vcf.gz --output=/tmp/example.tsv --explode_infos && cat /tmp/example.tsv <br><H2>QUERY</H2>
<p>Query genetic variations in SQL format. Data can be loaded into 'variants' table from various formats (e.g. VCF, TSV, Parquet...). Using --explode_infos allow query on INFO/tag annotations. SQL query can also use external data within the request, such as a Parquet file(s). </p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard query --input=tests/data/example.vcf.gz --query="SELECT * FROM variants WHERE REF = 'A' AND POS < 100000" <br>&nbsp;&nbsp;&nbsp;howard query --input=tests/data/example.vcf.gz --explode_infos --query='SELECT "#CHROM", POS, REF, ALT, DP, CLNSIG, sample2, sample3 FROM variants WHERE DP >= 50 OR CLNSIG NOT NULL ORDER BY DP DESC' <br>&nbsp;&nbsp;&nbsp;howard query --query="SELECT \"#CHROM\", POS, REF, ALT, \"INFO/Interpro_domain\" FROM 'tests/databases/annotations/current/hg19/dbnsfp42a.parquet' WHERE \"INFO/Interpro_domain\" NOT NULL ORDER BY \"INFO/SiPhy_29way_logOdds_rankscore\" DESC LIMIT 10" <br>&nbsp;&nbsp;&nbsp;howard query --explode_infos --explode_infos_prefix='INFO/' --query="SELECT \"#CHROM\", POS, REF, ALT, STRING_AGG(INFO, ';') AS INFO FROM 'tests/databases/annotations/current/hg19/*.parquet' GROUP BY \"#CHROM\", POS, REF, ALT" --output=/tmp/full_annotation.tsv && head -n2 /tmp/full_annotation.tsv <br><H3>Main options</H3>
<pre>--input=&lt;input&gt;
Input file path
Format: BCF, VCF, TSV, CSV, PSV, Parquet or duckDB
@@ -125,7 +125,7 @@ <H1>HOWARD Help</h1>
default: None

</pre><H2>ANNOTATION</H2>
<p>Annotation is mainly based on a build-in Parquet annotation method, and tools such as BCFTOOLS, Annovar and snpEff. It uses available databases (see Annovar and snpEff) and homemade databases. Format of databases are: parquet, duckdb, vcf, bed, Annovar and snpEff (Annovar and snpEff databases are automatically downloaded, see howard databases tool). </p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.vcf.gz --annotations='tests/databases/annotations/hg19/avsnp150.parquet,tests/databases/annotations/hg19/dbnsfp42a.parquet,tests/databases/annotations/hg19/gnomad211_genome.parquet' <br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --assembly=hg19 --annotations='annovar:refGene,annovar:cosmic70,snpeff,tests/databases/annotations/hg19/clinvar_20210123.parquet' <br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --assembly=hg19 --annotations='ALL:parquet' <br><H3>Main options</H3>
<p>Annotation is mainly based on a build-in Parquet annotation method, and tools such as BCFTOOLS, Annovar and snpEff. It uses available databases (see Annovar and snpEff) and homemade databases. Format of databases are: parquet, duckdb, vcf, bed, Annovar and snpEff (Annovar and snpEff databases are automatically downloaded, see howard databases tool). </p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.vcf.gz --annotations='tests/databases/annotations/current/hg19/avsnp150.parquet,tests/databases/annotations/current/hg19/dbnsfp42a.parquet,tests/databases/annotations/current/hg19/gnomad211_genome.parquet' <br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --assembly=hg19 --annotations='annovar:refGene,annovar:cosmic70,snpeff,tests/databases/annotations/current/hg19/clinvar_20210123.parquet' <br>&nbsp;&nbsp;&nbsp;howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --assembly=hg19 --annotations='ALL:parquet' <br><H3>Main options</H3>
<pre>--input=&lt;input&gt; | required
Input file path
Format: BCF, VCF, TSV, CSV, PSV, Parquet or duckDB
@@ -675,7 +675,7 @@ <H1>HOWARD Help</h1>

</pre><H2>GUI</H2>
<p>Graphical User Interface tools</p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard gui <H2>HELP</H2>
<p>Help tools</p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard help --help_md=/tmp/howard.help.md --help_html=docs/help.html<br>&nbsp;&nbsp;&nbsp;howard help --help_json_input=docs/help.config.json --help_json_input_title='HOWARD Configuration' --help_md=docs/help.config.md --help_html=docs/help.config.html<br>&nbsp;&nbsp;&nbsp;howard help --help_json_input=docs/help.param.json --help_json_input_title='HOWARD Parameters' --help_md=docs/help.param.md --help_html=docs/help.param.html <H3>Main options</H3>
<p>Help tools</p>Usage examples:<br>&nbsp;&nbsp;&nbsp;howard help --help_md=docs/help.md --help_html=docs/help.html<br>&nbsp;&nbsp;&nbsp;howard help --help_json_input=docs/help.config.json --help_json_input_title='HOWARD Configuration' --help_md=docs/help.config.md --help_html=docs/help.config.html<br>&nbsp;&nbsp;&nbsp;howard help --help_json_input=docs/help.param.json --help_json_input_title='HOWARD Parameters' --help_md=docs/help.param.md --help_html=docs/help.param.html <H3>Main options</H3>
<pre>--help_md=&lt;help markdown&gt;
Help Output file in MarkDown format
