Add chopper and nanoq options for longread preprocessing #692

Merged 18 commits on Nov 22, 2024
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### `Added`

+- [#692](https://github.com/nf-core/mag/pull/692) - Added Nanoq as optional longread filtering tool (added by @muabnezor)
+- [#692](https://github.com/nf-core/mag/pull/692) - Added chopper as optional longread filtering tool and/or phage lambda removal tool (added by @muabnezor)
 - [#708](https://github.com/nf-core/mag/pull/708) - Added `--exclude_unbins_from_postbinning` parameter to exclude unbinned contigs from post-binning processes, speeding up Prokka in some cases (added by @dialvarezs)

### `Changed`
@@ -17,6 +19,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### `Dependencies`

+| Tool    | Previous version | New version |
+| ------- | ---------------- | ----------- |
+| chopper |                  | 0.9.0       |
+| nanoq   |                  | 0.10.0      |

 ### `Deprecated`

 ## 3.2.1 [2024-10-30]
8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -40,6 +40,10 @@

 > Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25(7), 1043–1055. doi: 10.1101/gr.186072.114

+- [Chopper](https://doi.org/10.1093/bioinformatics/bty149)

+> De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149

 - [CONCOCT](https://doi.org/10.1038/nmeth.3103)

 > Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., Lahti, L., Loman, N. J., Andersson, A. F., & Quince, C. (2014). Binning metagenomic contigs by coverage and composition. Nature Methods, 11(11), 1144–1146. doi: 10.1038/nmeth.3103
@@ -114,6 +118,10 @@

 > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149.

+- [Nanoq](https://doi.org/10.21105/joss.02991)

+> Steinig, E., & Coin, L. (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69), 2991. doi: 10.21105/joss.02991

 - [Porechop](https://github.com/rrwick/Porechop)

 - [Porechop-abi](https://github.com/bonsai-team/Porechop_ABI)
60 changes: 57 additions & 3 deletions conf/modules.config
@@ -183,22 +183,76 @@ process {
             "--min_length ${params.longreads_min_length}",
             "--keep_percent ${params.longreads_keep_percent}",
             "--trim",
-            "--length_weight ${params.longreads_length_weight}"
+            "--length_weight ${params.longreads_length_weight}",
+            params.longreads_min_quality ? "--min_mean_q ${params.longreads_min_quality}" : '',
         ].join(' ').trim()
         publishDir = [
             path: { "${params.outdir}/QC_longreads/Filtlong" },
             mode: params.publish_dir_mode,
             pattern: "*_filtlong.fastq.gz",
-            enabled: params.save_filtlong_reads
+            enabled: params.save_filtered_longreads
         ]
         ext.prefix = { "${meta.id}_run${meta.run}_filtlong" }
     }

+    withName: NANOQ {
+        ext.args = [
+            "--min-len ${params.longreads_min_length}",
+            params.longreads_min_quality ? "--min-qual ${params.longreads_min_quality}": '',
+            "-vv"
+        ].join(' ').trim()
+        publishDir = [
+            [
+                path: { "${params.outdir}/QC_longreads/Nanoq" },
+                mode: params.publish_dir_mode,
+                pattern: "*_nanoq_filtered.fastq.gz",
+                enabled: params.save_filtered_longreads
+            ],
+            [
+                path: { "${params.outdir}/QC_longreads/Nanoq" },
+                mode: params.publish_dir_mode,
+                pattern: "*_nanoq_filtered.stats"
+            ]
+        ]
+        ext.prefix = { "${meta.id}_run${meta.run}_nanoq_filtered" }
+    }

     withName: NANOLYSE {
-        publishDir = [[path: { "${params.outdir}/QC_longreads/NanoLyse" }, mode: params.publish_dir_mode, pattern: "*.log"], [path: { "${params.outdir}/QC_longreads/NanoLyse" }, mode: params.publish_dir_mode, pattern: "*_nanolyse.fastq.gz", enabled: params.save_lambdaremoved_reads]]
+        publishDir = [
+            [
+                path: { "${params.outdir}/QC_longreads/NanoLyse" },
+                mode: params.publish_dir_mode, pattern: "*.log"
+            ],
+            [
+                path: { "${params.outdir}/QC_longreads/NanoLyse" },
+                mode: params.publish_dir_mode, pattern: "*_nanolyse.fastq.gz",
+                enabled: params.save_lambdaremoved_reads
+            ]
+        ]
         ext.prefix = { "${meta.id}_run${meta.run}_lambdafiltered" }
     }

+    withName: CHOPPER {
+        ext.args2 = [
+            params.longreads_min_quality ? "--quality ${params.longreads_min_quality}": '',
+            params.longreads_min_length ? "--minlength ${params.longreads_min_length}": ''
+        ].join(' ').trim()
+        publishDir = [
+            [
+                path: { "${params.outdir}/QC_longreads/Chopper" },
+                mode: params.publish_dir_mode,
+                pattern: "*.log"
+            ],
+            [
+                path: { "${params.outdir}/QC_longreads/Chopper" },
+                mode: params.publish_dir_mode,
+                pattern: "*_chopper.fastq.gz",
+                enabled: params.save_lambdaremoved_reads || params.save_filtered_longreads
+            ]
+        ]
+        ext.prefix = { "${meta.id}_run${meta.run}_chopper" }
+    }

     withName: NANOPLOT_RAW {
         ext.prefix = 'raw'
         ext.args = {
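A note on the `ext.args` pattern in the `NANOQ` block of this diff: when `params.longreads_min_quality` is unset, the ternary contributes an empty string in the middle of the list, so `join(' ')` leaves a doubled space between `--min-len …` and `-vv` (`trim()` only strips the ends). That is harmless on the command line. A minimal sketch of one way to drop empty entries before joining follows. It reuses the parameter names and flags from the diff, and it is an illustrative alternative rather than what this PR merged.

```groovy
process {
    withName: NANOQ {
        // Sketch only: same flags as in the merged config, but empty
        // entries are filtered out before joining, so no doubled space
        // remains when longreads_min_quality is not set.
        ext.args = [
            "--min-len ${params.longreads_min_length}",
            params.longreads_min_quality ? "--min-qual ${params.longreads_min_quality}" : '',
            "-vv"
        ].findAll { it }.join(' ')
    }
}
```

`findAll { it }` relies on Groovy truth, under which empty strings evaluate to false, so `trim()` is no longer needed.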
24 changes: 19 additions & 5 deletions docs/output.md
@@ -109,25 +109,39 @@ The pipeline uses Nanolyse to map the reads against the Lambda phage and removes

</details>

-### Filtlong and porechop
+### Long read adapter removal

-The pipeline uses filtlong and porechop to perform quality control of the long reads that are eventually provided with the TSV input file.
+The pipeline uses porechop_abi or porechop to perform adapter trimming of the long reads that are eventually provided with the TSV input file.

<details markdown="1">
<summary>Output files</summary>

 - `QC_longreads/porechop/`
   - `[sample]_[run]_porechop_trimmed.fastq.gz`: If `--longread_adaptertrimming_tool 'porechop'`, the adapter trimmed FASTQ files from porechop
   - `[sample]_[run]_porechop-abi_trimmed.fastq.gz`: If `--longread_adaptertrimming_tool 'porechop_abi'`, the adapter trimmed FASTQ files from porechop_ABI
-- `QC_longreads/filtlong/`

+</details>

+### Long read filtering

+The pipeline uses filtlong, chopper, or nanoq for quality filtering of long reads, specified with `--longread_filtering_tool <filtlong|chopper|nanoq>`. Only filtlong can filter long reads against short reads, so it is currently recommended in hybrid mode. If chopper is selected as the long read filtering tool, Lambda phage removal is also performed with chopper instead of NanoLyse.

+<details markdown="1">
+<summary>Output files</summary>

+- `QC_longreads/Filtlong/`
   - `[sample]_[run]_filtlong.fastq.gz`: The length and quality filtered reads in FASTQ from Filtlong
+- `QC_longreads/Nanoq/`
+  - `[sample]_[run]_nanoq_filtered.fastq.gz`: The length and quality filtered reads in FASTQ from Nanoq
+- `QC_longreads/Chopper/`
+  - `[sample]_[run]_chopper.fastq.gz`: The length and quality filtered, optionally phage lambda removed reads in FASTQ from Chopper

</details>

-Trimmed and filtered FASTQ output directories and files will only exist if `--save_porechop_reads` and/or `--save_filtlong_reads` (respectively) are provided to the run command .
+Trimmed and filtered FASTQ output directories and files will only exist if `--save_porechop_reads` and/or `--save_filtered_longreads` (respectively) are provided to the run command.

 No direct host read removal is performed for long reads.
-However, since within this pipeline filtlong uses a read quality based on k-mer matches to the already filtered short reads, reads not overlapping those short reads might be discarded.
+However, since within this pipeline filtlong uses a read quality based on k-mer matches to the already filtered short reads, reads not overlapping those short reads might be discarded. Note that this only applies when using filtlong as the long read filtering tool.
The lower the parameter `--longreads_length_weight`, the higher the impact of the read qualities for filtering.
For further documentation see the [filtlong online documentation](https://github.com/rrwick/Filtlong).

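As a sketch of how the options documented in this section fit together, a custom config passed to the run with `-c` might look like the following. The parameter names are taken from this PR; the file name and the values are illustrative assumptions, not recommended defaults.

```groovy
// custom.config (hypothetical file name)
params {
    longread_filtering_tool = 'nanoq' // one of: 'filtlong', 'chopper', 'nanoq'
    longreads_min_length    = 1000    // illustrative value
    longreads_min_quality   = 10      // illustrative value
    save_filtered_longreads = true    // publish the filtered FASTQ files
}
```

With these settings, and going by the `publishDir` patterns in `conf/modules.config`, the filtered reads would be expected under `QC_longreads/Nanoq/` as `[sample]_[run]_nanoq_filtered.fastq.gz`.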
10 changes: 10 additions & 0 deletions modules.json
@@ -62,6 +62,11 @@
             "git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
             "installed_by": ["modules"]
           },
+          "chopper": {
+            "branch": "master",
+            "git_sha": "22737835af2db3dd0d5b6b332e75e160d0199fae",
+            "installed_by": ["modules"]
+          },
           "concoct/concoct": {
             "branch": "master",
             "git_sha": "baa30accc6c50ea8a98662417d4f42ed18966353",
@@ -212,6 +217,11 @@
             "git_sha": "3135090b46f308a260fc9d5991d7d2f9c0785309",
             "installed_by": ["modules"]
           },
+          "nanoq": {
+            "branch": "master",
+            "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+            "installed_by": ["modules"]
+          },
           "porechop/abi": {
             "branch": "master",
             "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
5 changes: 5 additions & 0 deletions modules/nf-core/chopper/environment.yml

56 changes: 56 additions & 0 deletions modules/nf-core/chopper/main.nf

64 changes: 64 additions & 0 deletions modules/nf-core/chopper/meta.yml
