Version 2.0 (#1)
- Use Dorado instead of Guppy (with option to use it without local installation).
- Include software versions and basecalling model in MultiQC report.
dialvarezs authored Feb 26, 2024
1 parent 9467458 commit f3706ab
Showing 13 changed files with 410 additions and 170 deletions.
38 changes: 31 additions & 7 deletions README.md
@@ -1,24 +1,48 @@
# ONT Basecalling / Demux Pipeline

Small pipeline to perform basecalling and demultiplexing (optional) of ONT data, collect QC metrics and generate a MultiQC report.
It uses Guppy for basecalling and demultiplexing.
It uses Dorado for basecalling and demultiplexing.

## Requirements

- [Nextflow](https://www.nextflow.io/) (>= 22.04)
- [Apptainer](https://apptainer.org/) / Singularity
- Guppy GPU (>= 6.4.6). Not distributed with the pipeline; it has to be downloaded from the [ONT community](https://community.nanoporetech.com/)
- Dorado (0.5.3 tested). It can be used via container, or installed locally from https://github.com/nanoporetech/dorado.

## Usage

- Clone this repository
- If you want to demultiplex: create a `samples.csv` file with at least the `barcode` and `sample` columns. The `barcode` column should contain the barcode used for demultiplexing (with the leading zero, e.g. `barcode01`), and the `sample` column should contain the sample name (this name will be used in the report and as the name of the FASTQ file).
- Make a copy of `params.default.yml` and modify it according to your needs. Remember to point the `sample_data` parameter to the file created in the previous step.
- **If you want to demultiplex:** create a `samples.csv` file with at least the `barcode` and `sample` columns. The `barcode` column should contain the barcode used for demultiplexing (with the leading zero, e.g. `barcode01`), and the `sample` column should contain the sample name (this name will be used in the report and as the name of the FASTQ file).
- Copy `params.example.yml` (for example to `./my_params.yml`) and modify it according to your needs. Remember to point the `sample_data` parameter to the file created in the previous step.
- Run the pipeline, passing your params file to the `-params-file` option:

```
nextflow run ont-basecalling-demultiplexing/ -params-file my_params.yml
```
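
The sample sheet described in the steps above can be checked before launching a run. The following is a minimal Python sketch and not part of the pipeline: `validate_samples` and `BARCODE_PATTERN` are hypothetical names, and the two-digit barcode pattern is an assumption based on the `barcode01` example.

```python
import csv
import io
import re

# Assumption: barcodes look like "barcode01" (two digits with a leading zero).
BARCODE_PATTERN = re.compile(r"barcode\d{2}")


def validate_samples(csv_text):
    """Return a list of problems found in the sample sheet text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"barcode", "sample"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        barcode = (row["barcode"] or "").strip()
        sample = (row["sample"] or "").strip()
        if not BARCODE_PATTERN.fullmatch(barcode):
            problems.append(f"line {line_number}: unexpected barcode '{barcode}'")
        if not sample:
            problems.append(f"line {line_number}: empty sample name")
        if barcode in seen:
            problems.append(f"line {line_number}: duplicated barcode '{barcode}'")
        seen.add(barcode)
    return problems
```

Passing the contents of `samples.csv` to `validate_samples` returns an empty list when the required columns are present and every row looks well formed.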

## Parameters

| Parameter | Required | Default | Description |
| ---------------------------------------- | -------- | ------------------------------------ | ----------------------------------------------------------------------------------------------------------------------- |
| `experiment_name` | False | - | Name of the experiment, used for final reports (title and filename). |
| `data_dir` | True | - | Path to the folder containing the POD5 files. |
| `sample_data` | True | `input/samples.csv` | Path to the CSV file containing the sample data (required if demultiplexing). |
| `output_dir` | False | `demultiplex_results` | Path to the folder where the results will be saved. |
| `fastq_output` | False | `true` | If `true`, the pipeline will generate FASTQ files; otherwise it will generate UBAM files. |
| `qscore_filter` | False | `10` | Minimum QScore for the "pass" data, used for demultiplexing. |
| `dorado_basecalling_model` | False | `[email protected]` | Model used for basecalling. |
| `dorado_basecalling_extra_config` | False | - | Extra configuration for Dorado basecalling. |
| `dorado_basecalling_gpus` | False | `1` | Number of GPUs to use for basecalling. |
| `skip_demultiplexing` | False | `false` | If `true`, the pipeline will not perform demultiplexing. |
| `dorado_demux_kit` | False | `EXP-NBD196` | Kit used for demultiplexing. |
| `dorado_demux_both_ends` | False | `false` | If `true`, the pipeline will demultiplex using barcodes from both sides (5' and 3'). |
| `dorado_demux_extra_config` | False | - | Extra configuration for Dorado demultiplexing. |
| `dorado_demux_cpus` | False | `16` | Number of CPUs to use for demultiplexing. |
| `use_dorado_container` | False | `true` | If `true`, the pipeline will use Dorado via container (~3.5GB download). If `false`, it will expect to find it locally. |
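
How the defaults in the table combine with the values in a params file can be sketched in Python. This is only an illustration, since Nextflow performs the merge itself: `DEFAULTS` copies a subset of the table, `resolve_params` is a hypothetical helper, and the `sample_data` check is an interpretation of the "required if demultiplexing" note.

```python
# Illustrative only: a subset of the defaults listed in the table above.
DEFAULTS = {
    "experiment_name": "",
    "data_dir": None,
    "sample_data": "input/samples.csv",
    "output_dir": "demultiplex_results",
    "fastq_output": True,
    "qscore_filter": 10,
    "dorado_basecalling_gpus": 1,
    "skip_demultiplexing": False,
    "dorado_demux_kit": "EXP-NBD196",
    "dorado_demux_both_ends": False,
    "dorado_demux_cpus": 16,
    "use_dorado_container": True,
}


def resolve_params(user_params):
    """Overlay user-supplied values on the defaults and check required ones."""
    params = {**DEFAULTS, **user_params}
    if not params["data_dir"]:
        raise ValueError("data_dir is required")
    # Assumption: sample_data only matters when demultiplexing is enabled.
    if not params["skip_demultiplexing"] and not params["sample_data"]:
        raise ValueError("sample_data is required when demultiplexing")
    return params
```

For example, `resolve_params({"data_dir": "input/pod5/", "qscore_filter": 12})` keeps every default except the two values supplied.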

## Considerations
- The pipeline is designed to run on a SLURM cluster, but should run on local machines as well.

- The pipeline can be run on SLURM clusters by passing `-profile slurm`.
- Basecalling and demultiplexing are performed in separate steps to allow better control of the resources used by each process, and to avoid redoing the whole basecalling if demultiplexing fails, the wrong kit was specified, etc.
- The basecalling process uses GPU, so make sure to have one available. The SLURM job will be submitted with `--gres=gpu:X` option (with `X` as 1 by default).
- Demultiplexing doesn't use GPU.
- The basecalling process uses GPU, so make sure to have one available. If using SLURM, the job will be submitted with the `--gres=gpu:X` option.
- The demultiplexing step uses only CPU, not GPU.
17 changes: 17 additions & 0 deletions conf/containers.config
@@ -0,0 +1,17 @@
// containers
process {
    withLabel: linux { container = 'ubuntu:22.04' }
    withLabel: fastqc { container = 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0' }
    withLabel: nanoplot { container = 'quay.io/biocontainers/nanoplot:1.42.0--pyhdfd78af_0' }
    withLabel: multiqc { container = 'quay.io/biocontainers/multiqc:1.19--pyhdfd78af_0' }
    withLabel: pigz { container = 'ghcr.io/dialvarezs/containers/utils:latest' }
    withLabel: pycoqc { container = 'quay.io/biocontainers/pycoqc:2.5.2--py_0' }
    withLabel: samtools { container = 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_0' }

    withLabel: dorado {
        container = params.use_dorado_container
            ? 'ghcr.io/dialvarezs/containers/dorado:0.5.3'
            : null
        containerOptions = '--nv'
    }
}
17 changes: 17 additions & 0 deletions conf/params.config
@@ -0,0 +1,17 @@
params {
experiment_name = ''
data_dir = null
sample_data = 'input/samples.csv'
output_dir = 'demultiplex_results/'
fastq_output = true
qscore_filter = 10
dorado_basecalling_model = '[email protected]'
dorado_basecalling_extra_config = ''
dorado_basecalling_gpus = 1
skip_demultiplexing = false
dorado_demux_kit = 'EXP-NBD196'
dorado_demux_both_ends = false
dorado_demux_extra_config = ''
dorado_demux_cpus = 16
use_dorado_container = true
}
18 changes: 18 additions & 0 deletions conf/profiles.config
@@ -0,0 +1,18 @@
profiles {
    apptainer {
        apptainer {
            enabled = true
            autoMounts = true
        }
    }
    slurm {
        process {
            executor = 'slurm'
            module = 'apptainer'

            withLabel: dorado {
                module = params.use_dorado_container ? null : 'dorado'
            }
        }
    }
}
24 changes: 15 additions & 9 deletions main.nf
@@ -1,17 +1,15 @@
#!/usr/bin/env nextflow
include { addDefaultParamValues; pathCheck } from './lib/groovy/utils.gvy'

// load default parameters from YAML
addDefaultParamValues(params, "${workflow.projectDir}/params.default.yml")


include { BasecallingAndDemux } from './subworkflows/basecalling_demux.nf'
include { QualityCheck } from './subworkflows/quality_check.nf'
include { GenerateReports } from './subworkflows/reports.nf'
include { CollectVersions } from './subworkflows/versions.nf'

include { pathCheck } from './lib/groovy/utils.gvy'


// check and prepare input channels
data_dir = pathCheck(params.data_dir, isDirectory = true)
multiqc_config = pathCheck("${workflow.projectDir}/conf/multiqc_config.yaml")
multiqc_config = pathCheck("${workflow.projectDir}/tool_conf/multiqc_config.yaml")

if (params.skip_demultiplexing) {
sample_names = channel.fromList([])
@@ -25,10 +23,18 @@ if (params.skip_demultiplexing) {

workflow {
BasecallingAndDemux(sample_names, data_dir)

QualityCheck(
BasecallingAndDemux.out.sequences,
BasecallingAndDemux.out.sequencing_summary,
BasecallingAndDemux.out.barcoding_summary,
BasecallingAndDemux.out.sequencing_summary
)

CollectVersions()

GenerateReports(
QualityCheck.out.software_reports,
CollectVersions.out.software_versions,
CollectVersions.out.model_versions,
multiqc_config
)
}
24 changes: 3 additions & 21 deletions nextflow.config
@@ -4,24 +4,6 @@ process {
errorStrategy = 'finish'
}

singularity {
enabled = true
autoMounts = true
}

process {
executor = 'slurm'
module = 'apptainer'

withLabel: guppy { module = 'guppy' }
}

// containers
process {
withLabel: linux { container = 'ubuntu:22.04' }
withLabel: pigz { container = 'ghcr.io/dialvarezs/containers/pigz:2.7' }
withLabel: fastqc { container = 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0' }
withLabel: nanoplot { container = 'quay.io/biocontainers/nanoplot:1.41.3--pyhdfd78af_0' }
withLabel: multiqc { container = 'quay.io/biocontainers/multiqc:1.14--pyhdfd78af_0' }
withLabel: pycoqc { container = 'quay.io/biocontainers/pycoqc:2.5.2--py_0' }
}
includeConfig 'conf/params.config'
includeConfig 'conf/profiles.config'
includeConfig 'conf/containers.config'
13 changes: 0 additions & 13 deletions params.default.yml

This file was deleted.

15 changes: 15 additions & 0 deletions params.example.yml
@@ -0,0 +1,15 @@
experiment_name: ''
data_dir: input/pod5/
sample_data: input/samples.csv
output_dir: demultiplex_results/
fastq_output: true
qscore_filter: 10
dorado_basecalling_model: [email protected]
dorado_basecalling_extra_config: ''
dorado_basecalling_gpus: 1
skip_demultiplexing: false
dorado_demux_kit: EXP-NBD196
dorado_demux_both_ends: false
dorado_demux_extra_config: ''
dorado_demux_cpus: 16
use_dorado_container: true