Add CADD annotation (#266)
* Add CADD annotation

* remove cadd patch

* Update subworkflows/local/snv_annotation/main.nf

Co-authored-by: Anders Jemt <[email protected]>

* Update subworkflows/local/annotate_cadd/main.nf

Co-authored-by: Anders Jemt <[email protected]>

---------

Co-authored-by: Anders Jemt <[email protected]>
fellen31 and jemten authored Aug 9, 2024
1 parent 1b1c8aa commit 687743e
Showing 38 changed files with 1,604 additions and 50 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#252](https://github.com/genomic-medicine-sweden/nallo/pull/252) - Added a new `SCATTER_GENOME` subworkflow
- [#255](https://github.com/genomic-medicine-sweden/nallo/pull/255) - Added a new `RANK_VARIANTS` subworkflow to rank SNVs using genmod
- [#261](https://github.com/genomic-medicine-sweden/nallo/pull/261) - Added a `--skip_rank_variants` parameter to skip the rank_variants subworkflow
- [#266](https://github.com/genomic-medicine-sweden/nallo/pull/266) - Added CADD to dynamically calculate indel CADD-scores
- [#270](https://github.com/genomic-medicine-sweden/nallo/pull/270) - Added SNV phasing stats to MultiQC
- [#271](https://github.com/genomic-medicine-sweden/nallo/pull/271) - Added a `--skip_aligned_read_qc` parameter to skip the qc aligned reads subworkflow

@@ -74,6 +75,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| `--skip_repeat_wf` | `--skip_repeat_annotation` |
| | `--skip_rank_variants` |
| | `--skip_aligned_read_qc` |
| | `--cadd_resources` |
| | `--cadd_prescored` |

> [!NOTE]
> Parameter has been updated if both old and new parameter information is present.
@@ -86,6 +89,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| ----------- | ----------- | ----------- |
| deepvariant | 1.5.0 | 1.6.1 |
| htslib | 1.19.1 | 1.20 |
| cadd | | 1.6.post1 |
| gawk | | 5.3.0 |

## v0.2.0 - [2024-06-26]

1 change: 1 addition & 0 deletions assets/cadd_to_vcf_header_-1.0-.txt
@@ -0,0 +1 @@
##INFO=<ID=CADD,Number=1,Type=Float,Description="PHRED-like scaled CADD score.">
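The header line above declares the `CADD` INFO field that annotated indels receive. As a rough illustration only, the following Python sketch shows how one CADD output row could map onto that field. It assumes the six-column, tab-separated layout implied by `--columns Chrom,Pos,Ref,Alt,-,CADD` in `conf/modules/annotate_cadd.config` (fifth column skipped, sixth column written as `CADD`); the function name and the example row are hypothetical and not part of the pipeline.

```python
def cadd_row_to_info(row: str) -> tuple[str, int, str, str, str]:
    """Map one tab-separated CADD row to (chrom, pos, ref, alt, INFO entry).

    Assumed column order: Chrom, Pos, Ref, Alt, <skipped score>, PHRED score.
    """
    chrom, pos, ref, alt, _skipped, phred = row.rstrip("\n").split("\t")
    float(phred)  # raises ValueError if the score is not numeric
    return chrom, int(pos), ref, alt, f"CADD={phred}"


# Hypothetical example row (values invented for illustration).
example_row = "1\t12345\tAT\tA\t0.512\t7.25"
print(cadd_row_to_info(example_row))
```

In the pipeline itself this mapping is performed by `bcftools annotate`; the sketch only makes the column-to-field correspondence explicit.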
66 changes: 66 additions & 0 deletions conf/modules/annotate_cadd.config
@@ -0,0 +1,66 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = Conditional clause
----------------------------------------------------------------------------------------
*/

//
// CADD annotation
//

process {

    withName: '.*:ANNOTATE_CADD:.*' {
        publishDir = [
            enabled: false
        ]
    }

    withName: '.*:ANNOTATE_CADD:BCFTOOLS_VIEW' {
        ext.args = [
            '--output-type z',
            '--types indels,other'
        ].join(' ')
        ext.prefix = { "${vcf.simpleName}_indels" }
    }

    withName: '.*:ANNOTATE_CADD:CADD' {
        ext.prefix = { "${vcf.simpleName}_cadd" }
    }

    withName: '.*:ANNOTATE_CADD:TABIX_CADD' {
        ext.args = { "--force --sequence 1 --begin 2 --end 2" }
    }

    withName: '.*:ANNOTATE_CADD:ANNOTATE_INDELS' {
        ext.args = [
            '--columns Chrom,Pos,Ref,Alt,-,CADD',
            '--output-type z',
            '--write-index=tbi'
        ].join(' ')
        ext.prefix = { "${input.simpleName}_ann" }
    }

    withName: '.*:ANNOTATE_CADD:REFERENCE_TO_CADD_CHRNAMES' {
        ext.args2 = '\'{original=$1; sub("chr","",$1); print original, $1}\''
        ext.prefix = "reference_to_cadd"
        ext.suffix = "txt"
    }

    withName: '.*:ANNOTATE_CADD:CADD_TO_REFERENCE_CHRNAMES' {
        ext.args2 = '\'{original=$1; sub("chr","",$1); print $1, original}\''
        ext.prefix = "cadd_to_reference"
        ext.suffix = "txt"
    }

    withName: '.*:ANNOTATE_CADD:RENAME_CHRNAMES' {
        ext.args = '--output-type z'
    }
}
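The two `gawk` programs in this config build chromosome-name mapping files in both directions by stripping the first `chr` from each contig name. A minimal Python sketch of that behaviour follows, assuming the input is a list of contig names (for example, the first column of a reference index); the function name and the example names are hypothetical.

```python
def chrname_maps(reference_names):
    """Build (reference -> CADD) and (CADD -> reference) name pairs.

    Mirrors awk's sub("chr", "", $1), which removes only the first
    occurrence of "chr" in the name.
    """
    reference_to_cadd = []
    cadd_to_reference = []
    for original in reference_names:
        stripped = original.replace("chr", "", 1)
        reference_to_cadd.append((original, stripped))
        cadd_to_reference.append((stripped, original))
    return reference_to_cadd, cadd_to_reference


# Hypothetical contig names; names without "chr" pass through unchanged.
to_cadd, to_reference = chrname_maps(["chr1", "chrX", "2"])
for old, new in to_cadd:
    print(old, new)
```

Each pair corresponds to one line of the generated `reference_to_cadd.txt` or `cadd_to_reference.txt` file, with the two names separated by a space.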
8 changes: 8 additions & 0 deletions conf/modules/snv_annotation.config
@@ -28,6 +28,14 @@ process {
        ext.prefix = { "${meta.id}_echtvar_anno" }
    }

    withName: 'BCFTOOLS_FILLTAGS_ANNO' {
        ext.prefix = { "${meta.id}_filltags_anno" }
        ext.args = [
            '--output-type z',
            '--write-index=tbi'
        ].join(' ')
    }

    withName: '.*:SNV_ANNOTATION:ENSEMBLVEP_VEP' {
        ext.prefix = { "${meta.id}_vep" }
        ext.args = { [
6 changes: 4 additions & 2 deletions docs/usage.md
@@ -114,15 +114,17 @@ Some workflows require additional files:
- If running without `--skip_repeat_annotation`, download a JSON variant catalog (e.g. [variant_catalog_grch38.json](https://github.com/Clinical-Genomics/stranger/raw/main/stranger/resources/variant_catalog_grch38.json)) matching your reference genome to supply with `--variant_catalog`.

- If running without `--skip_snv_annotation`, download [VEP cache](https://ftp.ensembl.org/pub/release-110/variation/vep/homo_sapiens_vep_110_GRCh38.tar.gz) to supply with `--vep_cache` and prepare a samplesheet with annotation databases ([`echtvar encode`](https://github.com/brentp/echtvar)) to supply with `--snp_db`:
- If your samplesheet contains at least one affected sample (phenotype = 2), `--reduced_penetrance` (Used by GENMOD while modeling the variants. Contains a list of loci that show [reduced penetrance](https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/) in people. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/reduced_penetrance.tsv)), `--score_config_snv` (Used by GENMOD for ranking the variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/rank_model_snv.ini)) and `--variant_consequences_snv` (File containing list of SO terms listed in the order of severity from most severe to least severe for annotating genomic and mitochondrial SNVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/variant_consequences_v2.txt). You can learn more about these terms [here](https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html)) are also required.
`snp_dbs.csv`

```
sample,file
gnomad,/path/to/gnomad.v3.1.2.echtvar.popmax.v2.zip
cadd,/path/to/cadd.v1.6.hg38.zip
```

- If your samplesheet contains at least one affected sample (phenotype = 2), `--reduced_penetrance` (Used by GENMOD while modeling the variants. Contains a list of loci that show [reduced penetrance](https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/) in people. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/reduced_penetrance.tsv)), `--score_config_snv` (Used by GENMOD for ranking the variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/rank_model_snv.ini)) and `--variant_consequences_snv` (File containing list of SO terms listed in the order of severity from most severe to least severe for annotating genomic and mitochondrial SNVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/variant_consequences_v2.txt). You can learn more about these terms [here](https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html)) are also required.

- Optionally, if running without `--skip_snv_annotation`, supply a path to a folder containing CADD annotations with `--cadd_resources` and prescored indels with `--cadd_prescored`. These correspond to the `data/annotations/` and `data/prescored/` folders described [here](https://github.com/kircherlab/CADD-scripts/#manual-installation) and are used to calculate CADD scores for small indels.

- If running without `--skip_cnv_calling`, expected CN regions for your reference genome can be downloaded from [HiFiCNV GitHub](https://github.com/PacificBiosciences/HiFiCNV/tree/main/data) to supply with `--hificnv_xy`, `--hificnv_xx` (expected_cn) and `--hificnv_exclude` (excluded_regions).

- If you want to include extra samples for multi-sample calling of SVs - prepare a samplesheet with .snf files from Sniffles to supply with `--extra_snfs`:
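As a side note on the `snp_dbs.csv` samplesheet shown in the `docs/usage.md` hunk above: it is a two-column CSV with a `sample,file` header. The following Python sketch shows one way such a file could be read into a name-to-path mapping. The function name is hypothetical, and the paths are the placeholder paths from the documentation, not real files.

```python
import csv
import io


def read_snp_dbs(text: str) -> dict[str, str]:
    """Parse a `sample,file` samplesheet into {database name: path}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample"]: row["file"] for row in reader}


# Samplesheet content copied from the documentation example.
samplesheet = """sample,file
gnomad,/path/to/gnomad.v3.1.2.echtvar.popmax.v2.zip
cadd,/path/to/cadd.v1.6.hg38.zip
"""

for name, path in read_snp_dbs(samplesheet).items():
    print(name, path)
```

The pipeline does its own samplesheet parsing; this sketch is only meant to make the expected file shape concrete.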
17 changes: 17 additions & 0 deletions modules.json
@@ -5,6 +5,12 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"bcftools/annotate": {
"branch": "master",
"git_sha": "c1a7fa1c038061b344f2a41be71942061ec40d2e",
"installed_by": ["modules"],
"patch": "modules/nf-core/bcftools/annotate/bcftools-annotate.diff"
},
"bcftools/concat": {
"branch": "master",
"git_sha": "33ef773a7ea36e88323902f63662aa53c9b88988",
@@ -62,6 +68,12 @@
"git_sha": "571a5feac4c9ce0a8df0bc15b94230e7f3e8db47",
"installed_by": ["modules"]
},
"cadd": {
"branch": "master",
"git_sha": "cf3ed075695639b0a0924eb0901146df1996dc08",
"installed_by": ["modules"],
"patch": "modules/nf-core/cadd/cadd.diff"
},
"cat/fastq": {
"branch": "master",
"git_sha": "4fc983ad0b30e6e32696fa7d980c76c7bfe1c03e",
@@ -89,6 +101,11 @@
"installed_by": ["modules"],
"patch": "modules/nf-core/fastqc/fastqc.diff"
},
"gawk": {
"branch": "master",
"git_sha": "cf3ed075695639b0a0924eb0901146df1996dc08",
"installed_by": ["modules"]
},
"genmod/annotate": {
"branch": "master",
"git_sha": "1aba459a6f3528bee806403ae47bea304de26603",
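Of the three module entries added to `modules.json` in the hunks above, two carry a `patch` key and one does not. A small Python sketch, assuming only the entry shape visible in this diff, shows how the locally patched modules could be listed. The `fragment` below copies the three added entries; the function name is hypothetical.

```python
import json

# The three entries added by this commit, as they appear in the diff.
fragment = json.loads("""
{
  "bcftools/annotate": {
    "branch": "master",
    "git_sha": "c1a7fa1c038061b344f2a41be71942061ec40d2e",
    "installed_by": ["modules"],
    "patch": "modules/nf-core/bcftools/annotate/bcftools-annotate.diff"
  },
  "cadd": {
    "branch": "master",
    "git_sha": "cf3ed075695639b0a0924eb0901146df1996dc08",
    "installed_by": ["modules"],
    "patch": "modules/nf-core/cadd/cadd.diff"
  },
  "gawk": {
    "branch": "master",
    "git_sha": "cf3ed075695639b0a0924eb0901146df1996dc08",
    "installed_by": ["modules"]
  }
}
""")


def patched_modules(modules: dict) -> dict[str, str]:
    """Return {module name: patch path} for entries that have a patch."""
    return {
        name: entry["patch"]
        for name, entry in modules.items()
        if "patch" in entry
    }


for name, patch in patched_modules(fragment).items():
    print(name, patch)
```

Here `bcftools/annotate` and `cadd` are reported as patched, while `gawk` is installed unmodified.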
2 changes: 2 additions & 0 deletions modules/local/bcftools/filltags/main.nf
@@ -12,6 +12,8 @@ process BCFTOOLS_FILLTAGS {

output:
tuple val(meta), path("*.{vcf,vcf.gz,bcf,bcf.gz}"), emit: vcf
tuple val(meta), path("*.csi") , emit: csi, optional: true
tuple val(meta), path("*.tbi") , emit: tbi, optional: true
path "versions.yml" , emit: versions

when:
47 changes: 47 additions & 0 deletions modules/nf-core/bcftools/annotate/bcftools-annotate.diff

7 changes: 7 additions & 0 deletions modules/nf-core/bcftools/annotate/environment.yml

81 changes: 81 additions & 0 deletions modules/nf-core/bcftools/annotate/main.nf

67 changes: 67 additions & 0 deletions modules/nf-core/bcftools/annotate/meta.yml

4 changes: 4 additions & 0 deletions modules/nf-core/bcftools/annotate/tests/bcf.config
