Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Use Dorado instead of Guppy (with option to use it without local installation).
- Include software versions and basecalling model in MultiQC report.
1 parent 9467458
commit f3706ab
Showing 13 changed files with 410 additions and 170 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,48 @@ | ||
# ONT Basecalling / Demux Pipeline | ||
|
||
Small pipeline to perform basecalling and demultiplexing (optional) of ONT data, collect QC metrics and generate a MultiQC report. | ||
It uses Guppy for basecalling and demultiplexing. | ||
It uses Dorado for basecalling and demultiplexing. | ||
|
||
## Requirements | ||
|
||
- [Nextflow](https://www.nextflow.io/) (>= 22.04) | ||
- [Apptainer](https://apptainer.org/) / Singularity | ||
- Guppy GPU (>= 6.4.6). Not distributed with the pipeline; it has to be downloaded from the [ONT community](https://community.nanoporetech.com/) | ||
- Dorado (0.5.3 tested). It can be used via container, or installed locally from https://github.com/nanoporetech/dorado. | ||
|
||
## Usage | ||
|
||
- Clone this repository | ||
- If you want to demultiplex: create a `samples.csv` file with at least the `barcode` and `sample` columns. The `barcode` column should contain the barcode used for demultiplexing (with the leading zero, e.g. `barcode01`), and the `sample` column should contain the sample name (this name will be used in the report and as the name of the FASTQ file). | ||
- Make a copy of `params.default.yml` and modify it according to your needs. Remember to point the `sample_data` parameter to the file created in the previous step. | ||
- **If you want to demultiplex:** create a `samples.csv` file with at least the `barcode` and `sample` columns. The `barcode` column should contain the barcode used for demultiplexing (with the leading zero, e.g. `barcode01`), and the `sample` column should contain the sample name (this name will be used in the report and as the name of the FASTQ file). | ||
- Copy `params.example.yml` (for example to `./my_params.yml`) and modify it according to your needs. Remember to point the `sample_data` parameter to the file created in the previous step. | ||
- Run the pipeline, passing your params file to the `-params-file` option: | ||
|
||
``` | ||
nextflow run ont-basecalling-demultiplexing/ -params-file my_params.yml | ||
``` | ||
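The `samples.csv` file described in the steps above could be generated and checked with a short script before launching the pipeline. This is a minimal sketch and not part of the pipeline: the helper names `write_sample_sheet` and `validate_sample_sheet` are hypothetical, and it assumes that barcode names follow the `barcodeNN` pattern of the `barcode01` example.

```python
import csv
import re

# Assumption: barcode names are "barcode" followed by two or more digits (e.g. barcode01).
BARCODE_PATTERN = re.compile(r"^barcode\d{2,}$")


def write_sample_sheet(path, samples):
    """Write a CSV with the two columns the README asks for: barcode and sample."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["barcode", "sample"])
        writer.writeheader()
        for barcode, sample in samples.items():
            writer.writerow({"barcode": barcode, "sample": sample})


def validate_sample_sheet(path):
    """Return a list of problems found in the sample sheet (empty if none)."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = {"barcode", "sample"} - set(reader.fieldnames or [])
        if missing:
            return [f"missing column(s): {', '.join(sorted(missing))}"]
        seen = set()
        # Data rows start on line 2, after the header.
        for line_number, row in enumerate(reader, start=2):
            barcode = (row["barcode"] or "").strip()
            sample = (row["sample"] or "").strip()
            if not BARCODE_PATTERN.match(barcode):
                problems.append(f"line {line_number}: unexpected barcode '{barcode}'")
            elif barcode in seen:
                problems.append(f"line {line_number}: duplicated barcode '{barcode}'")
            seen.add(barcode)
            if not sample:
                problems.append(f"line {line_number}: empty sample name")
    return problems
```

For instance, `write_sample_sheet("input/samples.csv", {"barcode01": "sample_A"})` followed by `validate_sample_sheet("input/samples.csv")` would return an empty list, while a sheet using `barcode1` (no leading zero) would be reported.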
|
||
## Parameters | ||
|
||
| Parameter | Required | Default | Description | | ||
| ---------------------------------------- | -------- | ------------------------------------ | ----------------------------------------------------------------------------------------------------------------------- | | ||
| `experiment_name` | False | - | Name of the experiment, used for final reports (title and filename). | | ||
| `data_dir` | True | - | Path to the folder containing the POD5 files. | | ||
| `sample_data` | True | `input/samples.csv` | Path to the CSV file containing the sample data (required if demultiplexing). | | ||
| `output_dir` | False | `demultiplex_results` | Path to the folder where the results will be saved. | | ||
| `fastq_output` | False | `true` | If `true`, the pipeline will generate FASTQ files (if not, it would be UBAM files). | | ||
| `qscore_filter` | False | `10` | Minimum QScore for the "pass" data, used for demultiplexing. | | ||
| `dorado_basecalling_model` | False | `[email protected]` | Model used for basecalling. | | ||
| `dorado_basecalling_extra_config` | False | - | Extra configuration for Dorado basecalling. | | ||
| `dorado_basecalling_gpus` | False | `1` | Number of GPUs to use for basecalling. | | ||
| `skip_demultiplexing` | False | `false` | If `true`, the pipeline will not perform demultiplexing. | | ||
| `dorado_demux_kit` | False | `EXP-NBD196` | Kit used for demultiplexing. | | ||
| `dorado_demux_both_ends` | False | `false` | If `true`, the pipeline will demultiplex using barcodes from both sides (5' and 3'). | | ||
| `dorado_demux_extra_config` | False | - | Extra configuration for Dorado demultiplexing. | | ||
| `dorado_demux_cpus` | False | `16` | Number of CPUs to use for demultiplexing. | | ||
| `use_dorado_container` | False | `true` | If `true`, the pipeline will use Dorado via container (~3.5GB download). If `false`, it will expect to find it locally. | | ||
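To make the Required and Default columns concrete, the sketch below overlays user-supplied values on the defaults from the table and rejects a configuration that lacks `data_dir`. It is an illustration only: Nextflow itself applies the values from `-params-file` over the pipeline defaults, `resolve_params` is a hypothetical helper, rejecting unknown keys is a choice made for this sketch, and the default basecalling model is left as `None` here rather than guessed.

```python
# Defaults copied from the parameter table; None marks a value with no default here.
DEFAULTS = {
    "experiment_name": "",
    "data_dir": None,  # required, no default
    "sample_data": "input/samples.csv",
    "output_dir": "demultiplex_results/",
    "fastq_output": True,
    "qscore_filter": 10,
    "dorado_basecalling_model": None,  # default not reproduced in this sketch
    "dorado_basecalling_extra_config": "",
    "dorado_basecalling_gpus": 1,
    "skip_demultiplexing": False,
    "dorado_demux_kit": "EXP-NBD196",
    "dorado_demux_both_ends": False,
    "dorado_demux_extra_config": "",
    "dorado_demux_cpus": 16,
    "use_dorado_container": True,
}


def resolve_params(user_params):
    """Merge user values over the defaults and check the required parameters."""
    unknown = sorted(set(user_params) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    params = {**DEFAULTS, **user_params}
    if not params["data_dir"]:
        raise ValueError("data_dir is required")
    # sample_data only matters when demultiplexing is enabled.
    if not params["skip_demultiplexing"] and not params["sample_data"]:
        raise ValueError("sample_data is required when demultiplexing")
    return params
```

Calling `resolve_params({"data_dir": "input/pod5/", "dorado_demux_cpus": 8})` keeps every other default and only overrides the CPU count.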
|
||
## Considerations | ||
- The pipeline is designed to run on a SLURM cluster, but should run on local machines as well. | ||
|
||
- The pipeline can be run on SLURM clusters with `-profile slurm`. | ||
- Basecalling and demultiplexing are performed in separate steps to allow better control of the resources used by each process, and to avoid redoing the whole basecalling if demultiplexing fails, the wrong kit was specified, etc. | ||
- The basecalling process uses GPU, so make sure to have one available. The SLURM job will be submitted with `--gres=gpu:X` option (with `X` as 1 by default). | ||
- Demultiplexing doesn't use GPU. | ||
- The basecalling process uses GPU, so make sure to have one available. If using SLURM, the job will be submitted with the `--gres=gpu:X` option. | ||
- The demultiplexing step uses only CPU, not GPU. |
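The resource split between the two steps can be sketched as follows. This is only an illustration under stated assumptions: `step_resources` is a hypothetical helper, the real pipeline sets these values in its Nextflow configuration, and the defaults are taken from the `dorado_basecalling_gpus` and `dorado_demux_cpus` parameters.

```python
def step_resources(step, params):
    """Return the resources one step would request (hypothetical helper)."""
    if step == "basecalling":
        gpus = int(params.get("dorado_basecalling_gpus", 1))
        if gpus < 1:
            raise ValueError("basecalling needs at least one GPU")
        # On SLURM, the GPU request is passed as --gres=gpu:X.
        return {"gpus": gpus, "cluster_options": f"--gres=gpu:{gpus}"}
    if step == "demultiplexing":
        cpus = int(params.get("dorado_demux_cpus", 16))
        # Demultiplexing is CPU-only, so no GPU is requested.
        return {"gpus": 0, "cpus": cpus, "cluster_options": ""}
    raise ValueError(f"unknown step: {step}")
```

Keeping the two requests separate means a demultiplexing rerun, for example after correcting the kit, does not need to hold a GPU.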
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
// containers | ||
process { | ||
withLabel: linux { container = 'ubuntu:22.04' } | ||
withLabel: fastqc { container = 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0' } | ||
withLabel: nanoplot { container = 'quay.io/biocontainers/nanoplot:1.42.0--pyhdfd78af_0' } | ||
withLabel: multiqc { container = 'quay.io/biocontainers/multiqc:1.19--pyhdfd78af_0' } | ||
withLabel: pigz { container = 'ghcr.io/dialvarezs/containers/utils:latest' } | ||
withLabel: pycoqc { container = 'quay.io/biocontainers/pycoqc:2.5.2--py_0' } | ||
withLabel: samtools { container = 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_0' } | ||
|
||
withLabel: dorado { | ||
container = params.use_dorado_container | ||
? 'ghcr.io/dialvarezs/containers/dorado:0.5.3' | ||
: null | ||
containerOptions = '--nv' | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
params { | ||
experiment_name = '' | ||
data_dir = null | ||
sample_data = 'input/samples.csv' | ||
output_dir = 'demultiplex_results/' | ||
fastq_output = true | ||
qscore_filter = 10 | ||
dorado_basecalling_model = '[email protected]' | ||
dorado_basecalling_extra_config = '' | ||
dorado_basecalling_gpus = 1 | ||
skip_demultiplexing = false | ||
dorado_demux_kit = 'EXP-NBD196' | ||
dorado_demux_both_ends = false | ||
dorado_demux_extra_config = '' | ||
dorado_demux_cpus = 16 | ||
use_dorado_container = true | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
profiles { | ||
apptainer { | ||
apptainer { | ||
enabled = true | ||
autoMounts = true | ||
} | ||
} | ||
slurm { | ||
process { | ||
executor = 'slurm' | ||
module = 'apptainer' | ||
|
||
withLabel: dorado { | ||
module = params.use_dorado_container ? null : 'dorado' | ||
} | ||
} | ||
} | ||
} |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
experiment_name: '' | ||
data_dir: input/pod5/ | ||
sample_data: input/samples.csv | ||
output_dir: demultiplex_results/ | ||
fastq_output: true | ||
qscore_filter: 10 | ||
dorado_basecalling_model: [email protected] | ||
dorado_basecalling_extra_config: '' | ||
dorado_basecalling_gpus: 1 | ||
skip_demultiplexing: false | ||
dorado_demux_kit: EXP-NBD196 | ||
dorado_demux_both_ends: false | ||
dorado_demux_extra_config: '' | ||
dorado_demux_cpus: 16 | ||
use_dorado_container: true |