update docs

nf-core · Dec 13, 2024 · 18b6784
1 parent 28064b5

Showing 6 changed files with 103 additions and 20 deletions.
diff --git a/README.md b/README.md
@@ -31,7 +31,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
 
 The pipeline allows you to choose between running either [Bismark](https://github.com/FelixKrueger/Bismark) or [bwa-meth](https://github.com/brentp/bwa-meth) / [MethylDackel](https://github.com/dpryan79/methyldackel).
 
-Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for alignment), `--aligner bismark_hisat` or `--aligner bwameth`.
+Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for alignment), `--aligner bismark_hisat` or `--aligner bwameth`. For higher performance, the pipeline can use the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html), which reimplements `bwa-meth` for the GPU, using fq2bam (BWA-MEM + GATK) as its backend. To use this option, add the `--use_gpu` flag along with `--aligner bwameth`.
 
 | Step                                         | Bismark workflow         | bwa-meth workflow     |
 | -------------------------------------------- | ------------------------ | --------------------- |
@@ -44,8 +44,8 @@ Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for
 | Extract methylation calls                    | Bismark                  | MethylDackel          |
 | Sample report                                | Bismark                  | -                     |
 | Summary Report                               | Bismark                  | -                     |
-| Alignment QC                                 | Qualimap                 | Qualimap              |
-| Sample complexity                            | Preseq                   | Preseq                |
+| Alignment QC                                 | Qualimap _(optional)_    | Qualimap _(optional)_ |
+| Sample complexity                            | Preseq _(optional)_      | Preseq _(optional)_   |
 | Project Report                               | MultiQC                  | MultiQC               |
 
 ## Usage
@@ -65,9 +65,9 @@ SRR389222_sub3,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/S
 Ecoli_10K_methylated,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R2.fastq.gz,
 ```
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
+> Each row represents a FastQ file (single-end) or a pair of FastQ files (paired-end).
 
-Now, you can run the pipeline using:
+Now, you can run the pipeline with default parameters using:
 
 ```bash
 nextflow run nf-core/methylseq --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
@@ -81,8 +81,7 @@ For more details and further functionality, please refer to the [usage documenta
 ## Pipeline output
 
 To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/methylseq/results) tab on the nf-core website pipeline page.
-For more details about the output files and reports, please refer to the
-[output documentation](https://nf-co.re/methylseq/output).
+For more details about the output files and reports, please refer to the [output documentation](https://nf-co.re/methylseq/output).
 
 ## Credits
 

diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png
diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png
diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png
diff --git a/docs/output.md b/docs/output.md
@@ -25,6 +25,64 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
+### Output Directories
+
+#### Bismark
+
+```
+bismark/
+├── bismark
+│   ├── alignments
+│   ├── deduplicated
+│   ├── methylation_calls
+│   ├── reports
+│   └── summary
+├── fastqc
+│   ├── Ecoli_10K_methylated_1_fastqc.html
+│   ├── Ecoli_10K_methylated_2_fastqc.html
+│   └── zips
+├── multiqc
+│   └── bismark
+├── pipeline_info
+│   ├── execution_report_2024-12-13_05-38-05.html
+│   ├── execution_timeline_2024-12-13_05-38-05.html
+│   ├── execution_trace_2024-12-13_05-38-05.txt
+│   ├── nf_core_pipeline_software_mqc_versions.yml
+│   ├── params_2024-12-13_05-38-14.json
+│   └── pipeline_dag_2024-12-13_05-38-05.html
+└── trimgalore
+    ├── fastqc
+    └── logs
+```
+
+#### bwa-meth
+
+```
+bwameth/
+├── bwameth
+│   ├── alignments
+│   └── deduplicated
+├── fastqc
+│   ├── Ecoli_10K_methylated_1_fastqc.html
+│   ├── Ecoli_10K_methylated_2_fastqc.html
+│   └── zips
+├── methyldackel
+│   ├── Ecoli_10K_methylated.markdup.sorted_CpG.bedGraph
+│   └── mbias
+├── multiqc
+│   └── bwameth
+├── pipeline_info
+│   ├── execution_report_2024-12-13_05-36-34.html
+│   ├── execution_timeline_2024-12-13_05-36-34.html
+│   ├── execution_trace_2024-12-13_05-36-34.txt
+│   ├── nf_core_pipeline_software_mqc_versions.yml
+│   ├── params_2024-12-13_05-36-43.json
+│   └── pipeline_dag_2024-12-13_05-36-34.html
+└── trimgalore
+    ├── fastqc
+    └── logs
+```
+
 ### FastQC
 
 <details markdown="1">
@@ -56,7 +114,7 @@ The nf-core/methylseq pipeline uses [TrimGalore!](http://www.bioinformatics.babr
 
 MultiQC reports the percentage of bases removed by Cutadapt in the _General Statistics_ table, along with a line plot showing where reads were trimmed.
 
-**Output directory: `results/trim_galore`**
+**Output directory: `results/trimgalore`**
 
 Contains FastQ files with quality and adapter trimmed reads for each sample, along with a log file describing the trimming.
 
@@ -65,7 +123,7 @@ Contains FastQ files with quality and adapter trimmed reads for each sample, alo
   - **NB:** Only saved if `--save_trimmed` has been specified.
 - `logs/sample_val_1.fq.gz_trimming_report.txt`
   - Trimming report (describes which parameters were used)
-- `FastQC/sample_val_1_fastqc.zip`
+- `fastqc/sample_val_1_fastqc.zip`
   - FastQC report for trimmed reads
 
 Single-end data will have slightly different file names and only one FastQ file per sample.
@@ -74,7 +132,7 @@ Single-end data will have slightly different file names and only one FastQ file
 
 Bismark and bwa-meth convert all cytosines contained within the sequenced reads to thymine _in silico_ and then align against a three-letter reference genome. This method avoids methylation-specific alignment bias. The alignment produces a BAM file of genomic alignments.
 
-**Bismark output directory: `results/bismark_alignments/`**
+**Bismark output directory: `results/bismark/alignments/`**
 _Note that Bismark can use either Bowtie2 (default) or HISAT2 as the alignment tool; the output file names are the same for both options._
 
 - `sample.bam`
@@ -86,7 +144,7 @@ _Note that bismark can use either use Bowtie2 (default) or HISAT2 as alignment t
   - Unmapped reads in FastQ format.
   - Only saved if `--unmapped` specified when running the pipeline.
 
-**bwa-meth output directory: `results/bwa-mem_alignments/`**
+**bwa-meth output directory: `results/bwameth/alignments/`**
 
 - `sample.bam`
   - Aligned reads in BAM format.
@@ -97,23 +155,23 @@ _Note that bismark can use either use Bowtie2 (default) or HISAT2 as alignment t
 - `sample.sorted.bam.bai`
   - Index of sorted BAM file
   - **NB:** Only saved if `--save_align_intermeds`, `--skip_deduplication` or `--rrbs` is specified when running the pipeline.
-- `logs/sample_flagstat.txt`
+- `logs/samtools_stats/sample_flagstat.txt`
   - Summary file describing the number of reads which aligned in different ways.
-- `logs/sample_stats.txt`
+- `logs/samtools_stats/sample_stats.txt`
   - Summary file giving lots of metrics about the aligned BAM file.
 
 ### Deduplication
 
 This step removes alignments with identical mapping position to avoid technical duplication in the results. Note that it is skipped if `--save_align_intermeds`, `--skip_deduplication` or `--rrbs` is specified when running the pipeline.
 
-**Bismark output directory: `results/bismark_deduplicated/`**
+**Bismark output directory: `results/bismark/deduplicated/`**
 
 - `deduplicated.bam`
   - BAM file with only unique alignments.
 - `logs/deduplication_report.txt`
   - Log file giving summary statistics about deduplication.
 
-**bwa-meth output directory: `results/bwa-mem_markDuplicates/`**
+**bwa-meth output directory: `results/bwameth/deduplicated/`**
 
 > **NB:** The bwa-meth step doesn't remove duplicate reads from the BAM file; it only marks them.
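+
+Because duplicates are only flagged (SAM flag `0x400`), they can be counted or filtered afterwards with `samtools`. A minimal sketch, assuming the marked BAM is named `sample.markdup.sorted.bam` (inferred from the MethylDackel output names in the directory tree above, so check your own `results/bwameth/deduplicated/` folder). MethylDackel itself skips duplicate-flagged reads by default (its `--keepDupes` option changes this):
+
+```bash
+# Count alignments flagged as PCR/optical duplicates (flag 0x400 = 1024)
+samtools view -c -f 1024 sample.markdup.sorted.bam
+
+# Write a copy that excludes duplicate-flagged alignments, then index it
+samtools view -b -F 1024 -o sample.nodup.bam sample.markdup.sorted.bam
+samtools index sample.nodup.bam
+```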
 
@@ -137,7 +195,7 @@ Filename abbreviations stand for the following reference alignment strands:
 - `CTOT` - complementary to original top strand
 - `CTOB` - complementary to original bottom strand
 
-**Bismark output directory: `results/bismark_methylation_calls/`**
+**Bismark output directory: `results/bismark/methylation_calls/`**
 
 > **NB:** `CTOT` and `CTOB` are not aligned unless `--non_directional` specified.
 
@@ -152,7 +210,7 @@ Filename abbreviations stand for the following reference alignment strands:
 - `logs/sample_splitting_report.txt`
   - Log file giving summary statistics about methylation extraction.
 
-**bwa-meth workflow output directory: `results/MethylDackel/`**
+**bwa-meth workflow output directory: `results/methyldackel/`**
 
 - `sample.bedGraph`
   - Methylation statuses in [bedGraph](http://genome.ucsc.edu/goldenPath/help/bedgraph.html) format.

diff --git a/docs/usage.md b/docs/usage.md
@@ -28,9 +28,11 @@ Starting with Bismark `v0.21.0`, the pipeline also supports [HISAT2](https://ccb
 
 The second workflow uses [BWA-Meth](https://github.com/brentp/bwa-meth) as the alignment tool and [MethylDackel](https://github.com/dpryan79/methyldackel) for post-processing.
 
-Aligner Options
-• Standard BWA-Meth (CPU-based): This option can be invoked via `--aligner bwameth` and uses the traditional BWA-Meth aligner and runs on CPU processors.
-• Parabricks/FQ2BAMMETH (GPU-based): For higher performance, the pipeline can leverage the Parabricks implementation of BWA-Meth (fq2bammeth), which utilizes GPU processors. To use this option, include the `--use_gpu` flag along with `--aligner bwameth`.
+bwa-meth aligner options:
+
+- Standard `bwa-meth` (CPU-based): Invoked via `--aligner bwameth`; runs the standard bwa-meth aligner on CPUs.
+
+- `Parabricks/FQ2BAMMETH` (GPU-based): For higher performance, the pipeline can use the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html), which reimplements `bwa-meth` for the GPU, using fq2bam (BWA-MEM + GATK) as its backend. To use this option, add the `--use_gpu` flag along with `--aligner bwameth`.
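+
+A sketch of a GPU run, assuming a host with an NVIDIA GPU that your executor and container runtime are already configured to expose. Parabricks is distributed as a container image, so a container profile (e.g. `docker` or `singularity`) is likely needed rather than `conda`:
+
+```bash
+nextflow run nf-core/methylseq \
+  --input samplesheet.csv \
+  --outdir <OUTDIR> \
+  --genome GRCh37 \
+  --aligner bwameth \
+  --use_gpu \
+  -profile <docker/singularity/institute>
+```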
 
 ## Samplesheet input
 
@@ -130,6 +132,28 @@ genome: 'GRCh37'
 
 You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).
 
+### Providing `ext.args` to Tools
+
+Additional arguments can be appended to a command in a module by specifying them within the module’s custom configuration. The configurations for modules and subworkflows used in the pipeline can be found in `conf/modules` or `conf/subworkflows`. A module’s publishDir path can also be customized in these configurations.
+
+For example, users working with unfinished genomes containing tens or even hundreds of thousands of scaffolds, contigs, or chromosomes often encounter errors when pre-sorting reads into individual chromosome files. These errors are typically caused by the operating system’s limit on the number of file handles that can be open simultaneously (usually 1024; on Linux, check it with `ulimit -n`, or list all limits with `ulimit -a`).
+
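+To estimate whether a genome is likely to hit this limit, its sequence count can be compared with the soft open-file limit. The `needs_scaffolds` function below is a hypothetical helper and only a rough heuristic, since the process also needs handles for other files:
+
+```bash
+# Hypothetical helper: prints "yes" when a FASTA file has at least as many
+# sequences as the current soft limit on open files (`ulimit -Sn`).
+needs_scaffolds() {
+    local fasta="$1" n_seqs limit
+    # One '>' header line per chromosome, scaffold or contig
+    n_seqs=$(grep -c '^>' "$fasta")
+    limit=$(ulimit -Sn)
+    if [ "$limit" != "unlimited" ] && [ "$n_seqs" -ge "$limit" ]; then
+        echo "yes"   # likely to exhaust file handles during pre-sorting
+    else
+        echo "no"
+    fi
+}
+
+# Example: needs_scaffolds genome.fa
+```
+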
+To bypass this limitation, the `--scaffolds` option can be added as an additional `ext.args` in `conf/modules/bismark_methylationextractor.config`. This prevents methylation calls from being pre-sorted into individual chromosome files. Instead, all input files are temporarily merged into a single file (unless there is only one file), which is then sorted by both chromosome and position using the Unix `sort` command.
+
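+Alternatively, the option can be supplied through a custom config passed with `-c`, without editing the pipeline's own files. A sketch, assuming the Bismark extractor process is selected by `BISMARK_METHYLATIONEXTRACTOR` (check `conf/modules/bismark_methylationextractor.config` for the actual `withName` selector). Note that setting `ext.args` this way replaces, rather than extends, any arguments the pipeline already sets for that process:
+
+```bash
+# Write a custom config that passes --scaffolds to the methylation extractor
+cat > custom.config <<'EOF'
+process {
+    withName: 'BISMARK_METHYLATIONEXTRACTOR' {
+        ext.args = '--scaffolds'
+    }
+}
+EOF
+
+nextflow run nf-core/methylseq \
+  --input samplesheet.csv \
+  --outdir <OUTDIR> \
+  --fasta <FASTA> \
+  -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
+  -c custom.config
+```
+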
+> For a detailed list of different options available, please refer to the official [Bismark](https://felixkrueger.github.io/Bismark/options/genome_preparation/) and [bwa-meth](https://github.com/brentp/bwa-meth) documentation.
+
+### Running the `test` profile
+
+Every nf-core pipeline comes with test data that can be run using `-profile test`. This test profile is useful for testing whether a user's environment is properly set up.
+
+```bash
+nextflow run nf-core/methylseq \
+  --outdir <OUTDIR> \
+  -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
+```
+
 ### Updating the pipeline
 
 When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
@@ -299,7 +323,9 @@ The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementatio
 If for some reason you need to use a different version of a particular tool with the pipeline, then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently, it doesn't make sense to re-release nf-core/viralrecon every time a new version of Pangolin has been released. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`.
 
 1. Check the default version used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19)
+
 2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags)
+
 3. Create the custom config accordingly:
 
 - For Docker: