add Database object #44, fix #45 #4
antonylebechec committed Jun 1, 2023
1 parent 6f219f9 commit 78aba65
Showing 42 changed files with 1,521 additions and 57 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -64,7 +64,7 @@ ENV TOOLS=/tools
ENV DATA=/data
ENV TOOL=/tool
ENV DATABASES=/databases
-ENV YUM_INSTALL="gcc bc make wget perl-devel which zlib-devel zlib bzip2-devel bzip2 xz-devel xz ncurses-devel unzip curl-devel python39 java-11 htop"
+ENV YUM_INSTALL="gcc bc make wget perl-devel which zlib-devel zlib bzip2-devel bzip2 xz-devel xz ncurses-devel unzip curl-devel python39 java-17 htop"
ENV YUM_REMOVE="zlib-devel bzip2-devel xz-devel ncurses-devel gcc"


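The Dockerfile keeps two package lists: `YUM_INSTALL` (which now pulls `java-17` instead of `java-11`) and `YUM_REMOVE`. The lines that consume these variables are not part of this hunk. Assuming the build installs `YUM_INSTALL` and later removes `YUM_REMOVE` to drop build-only dependencies, this Python sketch computes which packages would remain in the final image. `runtime_packages` is an illustrative helper, not part of HOWARD.

```python
# Values copied from the ENV lines in the Dockerfile hunk (new side of the diff).
YUM_INSTALL = (
    "gcc bc make wget perl-devel which zlib-devel zlib bzip2-devel bzip2 "
    "xz-devel xz ncurses-devel unzip curl-devel python39 java-17 htop"
)
YUM_REMOVE = "zlib-devel bzip2-devel xz-devel ncurses-devel gcc"


def runtime_packages(install: str, remove: str) -> list[str]:
    """Packages left after the assumed install-then-remove sequence, in install order."""
    installed = install.split()
    removed = set(remove.split())
    missing = removed - set(installed)
    if missing:
        # In this sketch, removing something this step never installed is treated as a typo.
        raise ValueError(f"YUM_REMOVE lists packages not in YUM_INSTALL: {sorted(missing)}")
    return [pkg for pkg in installed if pkg not in removed]


if __name__ == "__main__":
    print(" ".join(runtime_packages(YUM_INSTALL, YUM_REMOVE)))
```

Under that assumption, the compilers and `-devel` headers used only for building are dropped, while runtime tools such as `python39` and `java-17` stay in the image.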
27 changes: 21 additions & 6 deletions README.md
@@ -52,6 +52,22 @@ docker-compose up -d

A Command Line Interface container (HOWARD-CLI) is started with host data and databases folders mounted (by default in ${HOME}/HOWARD folder)

# Databases

Databases such as Annovar and snpEff can be downloaded with databases tool.

- Download Annovar databases for assembly 'hg19':
```
howard databases --assembly='hg19' --download-annovar=/databases/annovar/current --download-annovar-files='refGene,gnomad_exome,dbnsfp42a,cosmic70,clinvar_202*,nci60'
```

- Download snpEff databases for assembly 'hg38':
```
howard databases --assembly='hg38' --download-snpeff=/databases/annovar/current
```



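The `--download-annovar-files` value above mixes exact names (`refGene`, `cosmic70`) with a shell-style wildcard (`clinvar_202*`). Quoting the whole option keeps the shell from treating `*` as a filename pattern (zsh, for example, aborts with "no matches found" when an unquoted pattern matches no file). This commit does not show how HOWARD resolves the wildcard. The sketch below assumes `fnmatch`-style matching against a hypothetical catalogue of Annovar database names; `select_databases` and `AVAILABLE` are illustrative, not part of HOWARD.

```python
from fnmatch import fnmatchcase

# Hypothetical catalogue of Annovar database names; the real list is not part of this commit.
AVAILABLE = [
    "refGene", "ensGene", "gnomad_exome", "gnomad_genome", "dbnsfp42a",
    "cosmic70", "clinvar_20210501", "clinvar_20220320", "clinvar_20190305", "nci60",
]


def select_databases(spec: str, available: list[str]) -> list[str]:
    """Expand a comma-separated list of names or shell-style patterns against `available`."""
    selected: list[str] = []
    for pattern in (p.strip() for p in spec.split(",") if p.strip()):
        matches = [name for name in available if fnmatchcase(name, pattern)]
        if not matches:
            raise ValueError(f"no database matches {pattern!r}")
        # Skip names already selected by an earlier pattern; keep catalogue order otherwise.
        selected.extend(m for m in matches if m not in selected)
    return selected


spec = "refGene,gnomad_exome,dbnsfp42a,cosmic70,clinvar_202*,nci60"
print(select_databases(spec, AVAILABLE))
```

With this catalogue, `clinvar_202*` selects the two 2020s releases and skips `clinvar_20190305`.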
# Quick HOWARD commands

## Stats
@@ -112,12 +128,12 @@ Annotation is mainly based on a build-in Parquet annotation method, and tools su

- VCF annotation with Parquet and VCF databases, output as VCF format
```
-howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.vcf.gz --annotations=tests/data/annotations/dbnsfp42a.parquet,tests/data/annotations/gnomad211_genome.parquet,tests/data/annotations/cosmic70.vcf.gz
+howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.vcf.gz --annotations='tests/data/annotations/dbnsfp42a.parquet,tests/data/annotations/gnomad211_genome.parquet,tests/data/annotations/cosmic70.vcf.gz'
```

- VCF annotation with Clinvar Parquet, Annovar refGene and snpEff databases, output as TSV format
```
-howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --annotations=annovar:refGene,snpeff,tests/data/annotations/clinvar_20210123.parquet
+howard annotation --input=tests/data/example.vcf.gz --output=/tmp/example.howard.tsv --annotations='annovar:refGene,snpeff,tests/data/annotations/clinvar_20210123.parquet'
```
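Most README changes in this commit only wrap comma-separated option values in single quotes. For the values shown, the quotes do not change what the program receives: POSIX shell quoting, which Python's `shlex.split` follows, strips them and yields the same argument vector. The quotes protect values that contain shell metacharacters such as `*`, spaces or `;`. The second part of the sketch splits the `--annotations` value into sources under an illustrative convention (`tool:database`, a local file path, or a bare tool name). `parse_annotations` is hypothetical and is not HOWARD's own parser.

```python
import shlex
from typing import Optional

# The Clinvar/Annovar/snpEff example, before and after this commit.
unquoted = ("howard annotation --input=tests/data/example.vcf.gz "
            "--annotations=annovar:refGene,snpeff,tests/data/annotations/clinvar_20210123.parquet")
quoted = ("howard annotation --input=tests/data/example.vcf.gz "
          "--annotations='annovar:refGene,snpeff,tests/data/annotations/clinvar_20210123.parquet'")

# POSIX-style tokenisation consumes the quotes, so both forms give the same argv.
print(shlex.split(unquoted) == shlex.split(quoted))


def parse_annotations(value: str) -> list[tuple[str, Optional[str]]]:
    """Hypothetical parser: split an --annotations value into (source, detail) pairs."""
    sources: list[tuple[str, Optional[str]]] = []
    for item in (part.strip() for part in value.split(",") if part.strip()):
        if ":" in item:
            tool, database = item.split(":", 1)  # e.g. annovar:refGene
            sources.append((tool, database))
        elif "/" in item or "." in item:
            sources.append(("file", item))  # local Parquet or VCF database
        else:
            sources.append((item, None))  # bare tool name, e.g. snpeff
    return sources


argv = shlex.split(quoted)
value = next(arg.split("=", 1)[1] for arg in argv if arg.startswith("--annotations="))
print(parse_annotations(value))
```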

## Calculation
@@ -126,7 +142,7 @@ Calculation processes variants information to generate new information, such as:

- Identify variant types
```
-howard calculation --input=tests/data/example.full.vcf --output=/tmp/example.calculation.tsv --calculations=vartype
+howard calculation --input=tests/data/example.full.vcf --output=/tmp/example.calculation.tsv --calculations='vartype'
```
- and generate a table of variant type count
```
@@ -135,7 +151,7 @@ howard query --input=/tmp/example.calculation.tsv --explode_infos --query='SELEC

- Calculate NOMEN by extracting hgvs from snpEff annotation and identifying default transcripts from a list
```
-howard calculation --input=tests/data/example.ann.vcf.gz --output=/tmp/example.NOMEN.vcf.gz --calculations=snpeff_hgvs,NOMEN --hgvs_field=snpeff_hgvs --transcripts=tests/data/transcripts.tsv
+howard calculation --input=tests/data/example.ann.vcf.gz --output=/tmp/example.NOMEN.vcf.gz --calculations='snpeff_hgvs,NOMEN' --hgvs_field='snpeff_hgvs' --transcripts=tests/data/transcripts.tsv
```
- and query NOMEN for gene 'EGFR'
```
@@ -148,7 +164,7 @@ Prioritization algorithm uses profiles to flag variants (as passed or filtered),

- Prioritize variants from criteria on INFO annotations for profiles 'default' and 'GERMLINE' (see 'prioritization_profiles.json'), and query variants on prioritization tags
```
-howard prioritization --input=tests/data/example.vcf.gz --output=/tmp/example.prioritized.vcf.gz --prioritizations=config/prioritization_profiles.json --profiles=default,GERMLINE --pzfields=PZFlag,PZScore,PZComment
+howard prioritization --input=tests/data/example.vcf.gz --output=/tmp/example.prioritized.vcf.gz --prioritizations=config/prioritization_profiles.json --profiles='default,GERMLINE' --pzfields='PZFlag,PZScore,PZComment'
```
- and query variants passing filters
```
@@ -251,7 +267,6 @@ howard process --config=config/config.json --param=config/param.json --input=tes
```



## Docker HOWARD-CLI

VCF annotation (Parquet, BCFTOOLS, ANNOVAR and snpEff) using HOWARD-CLI (snpEff and ANNOVAR databases will be automatically downloaded), and query list of genes with HGVS
2 changes: 1 addition & 1 deletion howard/main.py
@@ -17,7 +17,7 @@
import sys

from howard.objects.variants import Variants
-from howard.objects.annotation import Annotation
+from howard.objects.database import Database
from howard.commons import *

from howard.tools.tools import *
2 changes: 1 addition & 1 deletion howard/objects/__init__.py
@@ -1,4 +1,4 @@
__all__ = [
"variants",
-"annotation"
+"database"
]
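In this commit `annotation.py` is deleted and `"annotation"` is replaced by `"database"` in the `__all__` list of `howard/objects/__init__.py`. The two edits belong together: for a package, `from howard.objects import *` imports every submodule named in `__all__`, so a stale entry makes the star import fail. The sketch below reproduces this with a throwaway package built in a temporary directory; `star_import` and the package names `objects_new` and `objects_stale` are placeholders, not part of HOWARD.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def star_import(pkg_name: str, all_names: list[str], modules: list[str]) -> dict:
    """Create a throwaway package and return the namespace filled by `from pkg import *`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / pkg_name
        pkg_dir.mkdir()
        # __init__.py only declares __all__, like howard/objects/__init__.py
        (pkg_dir / "__init__.py").write_text(f"__all__ = {all_names!r}\n")
        for mod in modules:
            (pkg_dir / f"{mod}.py").write_text(f"NAME = {mod!r}\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            namespace: dict = {}
            exec(f"from {pkg_name} import *", namespace)
            return namespace
        finally:
            sys.path.remove(tmp)
            # Forget the throwaway modules so repeated runs start clean.
            for name in [m for m in sys.modules if m == pkg_name or m.startswith(pkg_name + ".")]:
                del sys.modules[name]


# Every name in __all__ has a matching module, as on the new side of the diff.
ns = star_import("objects_new", ["variants", "database"], ["variants", "database"])
print(sorted(k for k in ns if not k.startswith("__")))

# Stale __all__: "annotation" is still listed but annotation.py no longer exists.
try:
    star_import("objects_stale", ["variants", "annotation"], ["variants"])
except (AttributeError, ImportError) as exc:
    print("star import failed:", type(exc).__name__, exc)
```

The `except` clause accepts both exception types so the sketch does not depend on which one a given Python version raises.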
6 changes: 0 additions & 6 deletions howard/objects/annotation.py

This file was deleted.