Commit

update docs
sateeshperi committed Dec 13, 2024
1 parent 28064b5 commit 18b6784
Showing 6 changed files with 103 additions and 20 deletions.
13 changes: 6 additions & 7 deletions README.md
@@ -31,7 +31,7 @@ On release, automated continuous integration tests run the pipeline on a full-si

The pipeline allows you to choose between running either [Bismark](https://github.com/FelixKrueger/Bismark) or [bwa-meth](https://github.com/brentp/bwa-meth) / [MethylDackel](https://github.com/dpryan79/methyldackel).

Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for alignment), `--aligner bismark_hisat` or `--aligner bwameth`.
Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for alignment), `--aligner bismark_hisat` or `--aligner bwameth`. For higher performance, the pipeline can leverage the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html), a GPU-accelerated reimplementation of `bwa-meth` that uses fq2bam (BWA-MEM + GATK) as its backend. To use this option, include the `--use_gpu` flag along with `--aligner bwameth`.
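
The workflows differ only in the `--aligner` value, plus `--use_gpu` for the GPU variant of bwa-meth. A minimal sketch that prints (rather than runs) the corresponding command lines; the helper name `methylseq_cmd`, the `results` output directory and the `docker` profile are illustrative assumptions, not pipeline defaults.

```bash
#!/bin/sh
# Sketch: print (do not run) an nf-core/methylseq command for a chosen aligner.
# The helper name, output directory and profile are illustrative assumptions.
methylseq_cmd() {
  aligner="$1"
  use_gpu="${2:-false}"

  # Accept only the three documented --aligner values.
  case "$aligner" in
    bismark | bismark_hisat | bwameth) ;;
    *) printf 'unsupported aligner: %s\n' "$aligner" >&2; return 1 ;;
  esac

  # --use_gpu is documented only in combination with --aligner bwameth.
  gpu_flag=""
  if [ "$use_gpu" = true ]; then
    if [ "$aligner" != bwameth ]; then
      printf '%s\n' "--use_gpu is only documented together with --aligner bwameth" >&2
      return 1
    fi
    gpu_flag=" --use_gpu"
  fi

  printf 'nextflow run nf-core/methylseq --input samplesheet.csv --outdir results --genome GRCh37 --aligner %s%s -profile docker\n' "$aligner" "$gpu_flag"
}

methylseq_cmd bismark        # default Bismark (Bowtie2) workflow
methylseq_cmd bwameth true   # bwa-meth via Parabricks fq2bammeth on GPU
```
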

| Step | Bismark workflow | bwa-meth workflow |
| -------------------------------------------- | ------------------------ | --------------------- |
@@ -44,8 +44,8 @@ Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for
| Extract methylation calls | Bismark | MethylDackel |
| Sample report | Bismark | - |
| Summary Report | Bismark | - |
| Alignment QC | Qualimap | Qualimap |
| Sample complexity | Preseq | Preseq |
| Alignment QC | Qualimap _(optional)_ | Qualimap _(optional)_ |
| Sample complexity | Preseq _(optional)_ | Preseq _(optional)_ |
| Project Report | MultiQC | MultiQC |

## Usage
@@ -65,9 +65,9 @@ SRR389222_sub3,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/S
Ecoli_10K_methylated,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R2.fastq.gz,
```

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
> Each row represents a fastq file (single-end) or a pair of fastq files (paired-end).
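
A minimal sketch that writes a two-row samplesheet (one paired-end and one single-end sample) and checks that every row has the same number of comma-separated fields. The header `sample,fastq_1,fastq_2,genome` and the FASTQ paths are assumptions for illustration; the usage documentation is the authority on the required columns.

```bash
#!/bin/sh
# Sketch: write a small samplesheet and check that all rows have the same column count.
# Assumption: header columns are sample,fastq_1,fastq_2,genome; paths are hypothetical.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,genome
paired_sample,/data/paired_R1.fastq.gz,/data/paired_R2.fastq.gz,
single_sample,/data/single_R1.fastq.gz,,
EOF

# awk counts trailing empty fields too, so "a,b,," has 4 fields.
awk -F',' '
  NR == 1 { n = NF }
  NF != n { printf "line %d has %d columns, expected %d\n", NR, NF, n; bad = 1 }
  END { exit bad }
' samplesheet.csv && echo "samplesheet.csv looks consistent"
```
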
Now, you can run the pipeline using:
Now, you can run the pipeline with default parameters using:

```bash
nextflow run nf-core/methylseq --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
@@ -81,8 +81,7 @@ For more details and further functionality, please refer to the [usage documenta
## Pipeline output

To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/methylseq/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/methylseq/output).
For more details about the output files and reports, please refer to the [output documentation](https://nf-co.re/methylseq/output).

## Credits

Binary file added docs/images/mqc_fastqc_adapter.png
Binary file added docs/images/mqc_fastqc_counts.png
Binary file added docs/images/mqc_fastqc_quality.png
78 changes: 68 additions & 10 deletions docs/output.md
@@ -25,6 +25,64 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### Output Directories

#### Bismark

```
bismark/
├── bismark
│ ├── alignments
│ ├── deduplicated
│ ├── methylation_calls
│ ├── reports
│ └── summary
├── fastqc
│ ├── Ecoli_10K_methylated_1_fastqc.html
│ ├── Ecoli_10K_methylated_2_fastqc.html
│ └── zips
├── multiqc
│ └── bismark
├── pipeline_info
│ ├── execution_report_2024-12-13_05-38-05.html
│ ├── execution_timeline_2024-12-13_05-38-05.html
│ ├── execution_trace_2024-12-13_05-38-05.txt
│ ├── nf_core_pipeline_software_mqc_versions.yml
│ ├── params_2024-12-13_05-38-14.json
│ └── pipeline_dag_2024-12-13_05-38-05.html
└── trimgalore
├── fastqc
└── logs
```

#### bwa-meth

```
bwameth/
├── bwameth
│ ├── alignments
│ └── deduplicated
├── fastqc
│ ├── Ecoli_10K_methylated_1_fastqc.html
│ ├── Ecoli_10K_methylated_2_fastqc.html
│ └── zips
├── methyldackel
│ ├── Ecoli_10K_methylated.markdup.sorted_CpG.bedGraph
│ └── mbias
├── multiqc
│ └── bwameth
├── pipeline_info
│ ├── execution_report_2024-12-13_05-36-34.html
│ ├── execution_timeline_2024-12-13_05-36-34.html
│ ├── execution_trace_2024-12-13_05-36-34.txt
│ ├── nf_core_pipeline_software_mqc_versions.yml
│ ├── params_2024-12-13_05-36-43.json
│ └── pipeline_dag_2024-12-13_05-36-34.html
└── trimgalore
├── fastqc
└── logs
```
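
After a run, the top-level layout can be sanity-checked with a short script. A minimal sketch, assuming the folder names from the two example trees above; which folders actually appear may depend on the parameters used (skipped or optional steps produce no folder), and the function name `check_outdir` is hypothetical.

```bash
#!/bin/sh
# Sketch: report expected output folders that are missing after a run.
# Folder names are taken from the example trees; optional steps are not checked.
check_outdir() {
  outdir="$1"
  case "$2" in
    bismark) expected="bismark/alignments bismark/methylation_calls" ;;
    bwameth) expected="bwameth/alignments methyldackel" ;;
    *) printf 'unknown aligner: %s\n' "$2" >&2; return 2 ;;
  esac

  missing=0
  # $expected is deliberately unquoted so it splits into separate folder names.
  for d in fastqc multiqc pipeline_info trimgalore $expected; do
    if [ ! -d "$outdir/$d" ]; then
      printf 'missing: %s\n' "$outdir/$d"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_outdir results bwameth && echo "layout OK"
```
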

### FastQC

<details markdown="1">
@@ -56,7 +114,7 @@ The nf-core/methylseq pipeline uses [TrimGalore!](http://www.bioinformatics.babr

MultiQC reports the percentage of bases removed by Cutadapt in the _General Statistics_ table, along with a line plot showing where reads were trimmed.

**Output directory: `results/trim_galore`**
**Output directory: `results/trimgalore`**

Contains FastQ files with quality and adapter trimmed reads for each sample, along with a log file describing the trimming.

@@ -65,7 +123,7 @@ Contains FastQ files with quality and adapter trimmed reads for each sample, alo
- **NB:** Only saved if `--save_trimmed` has been specified.
- `logs/sample_val_1.fq.gz_trimming_report.txt`
- Trimming report (describes which parameters were used)
- `FastQC/sample_val_1_fastqc.zip`
- `fastqc/sample_val_1_fastqc.zip`
- FastQC report for trimmed reads

Single-end data will have slightly different file names and only one FastQ file per sample.
@@ -74,7 +132,7 @@ Single-end data will have slightly different file names and only one FastQ file

Bismark and bwa-meth convert all cytosines contained within the sequenced reads to thymines _in silico_ and then align against a three-letter reference genome. This method avoids methylation-specific alignment bias. The alignment produces a BAM file of genomic alignments.
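
A toy illustration of why the three-letter approach avoids methylation bias: after C-to-T conversion, a bisulfite read and its reference become identical whether or not a given cytosine was methylated. The 8 bp sequences are made up, and the sketch omits the G-to-A (opposite-strand) conversion that the real aligners also perform.

```bash
#!/bin/sh
# Toy example only: a hypothetical reference and a bisulfite read in which the
# first CpG cytosine was methylated (stayed C) and the other cytosines became T.
ref="ACGTCCGA"
bs_read="ACGTTTGA"

# Convert every C to T in both sequences, leaving a three-letter (A/G/T) alphabet.
ref_ct=$(printf '%s\n' "$ref" | tr 'C' 'T')
read_ct=$(printf '%s\n' "$bs_read" | tr 'C' 'T')

printf 'reference C->T: %s\nread C->T:      %s\n' "$ref_ct" "$read_ct"
if [ "$ref_ct" = "$read_ct" ]; then
  echo "match: the converted read aligns regardless of methylation state"
fi
```

Methylation is then called by comparing the original read with the unconverted reference at each cytosine: a retained C indicates methylation, a T indicates conversion.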

**Bismark output directory: `results/bismark_alignments/`**
**Bismark output directory: `results/bismark/alignments/`**
_Note that Bismark can use either Bowtie2 (default) or HISAT2 as the alignment tool; the output file names do not differ between the two options._

- `sample.bam`
@@ -86,7 +144,7 @@ _Note that bismark can use either use Bowtie2 (default) or HISAT2 as alignment t
- Unmapped reads in FastQ format.
- Only saved if `--unmapped` specified when running the pipeline.

**bwa-meth output directory: `results/bwa-mem_alignments/`**
**bwa-meth output directory: `results/bwameth/alignments/`**

- `sample.bam`
- Aligned reads in BAM format.
@@ -97,23 +155,23 @@ _Note that bismark can use either use Bowtie2 (default) or HISAT2 as alignment t
- `sample.sorted.bam.bai`
- Index of sorted BAM file
- **NB:** Only saved if `--save_align_intermeds`, `--skip_deduplication` or `--rrbs` is specified when running the pipeline.
- `logs/sample_flagstat.txt`
- `logs/samtools_stats/sample_flagstat.txt`
- Summary file describing the number of reads which aligned in different ways.
- `logs/sample_stats.txt`
- `logs/samtools_stats/sample_stats.txt`
- Summary file giving a broad set of metrics about the aligned BAM file.

### Deduplication

This step removes alignments with identical mapping position to avoid technical duplication in the results. Note that it is skipped if `--save_align_intermeds`, `--skip_deduplication` or `--rrbs` is specified when running the pipeline.

**Bismark output directory: `results/bismark_deduplicated/`**
**Bismark output directory: `results/bismark/deduplicated/`**

- `deduplicated.bam`
- BAM file with only unique alignments.
- `logs/deduplication_report.txt`
- Log file giving summary statistics about deduplication.

**bwa-meth output directory: `results/bwa-mem_markDuplicates/`**
**bwa-meth output directory: `results/bwameth/deduplicated/`**

> **NB:** The bwa-meth step doesn't remove duplicate reads from the BAM file; it only marks them.
@@ -137,7 +195,7 @@ Filename abbreviations stand for the following reference alignment strands:
- `CTOT` - complementary to original top strand
- `CTOB` - complementary to original bottom strand

**Bismark output directory: `results/bismark_methylation_calls/`**
**Bismark output directory: `results/bismark/methylation_calls/`**

> **NB:** `CTOT` and `CTOB` are not aligned unless `--non_directional` is specified.
@@ -152,7 +210,7 @@ Filename abbreviations stand for the following reference alignment strands:
- `logs/sample_splitting_report.txt`
- Log file giving summary statistics about methylation extraction.

**bwa-meth workflow output directory: `results/MethylDackel/`**
**bwa-meth workflow output directory: `results/methyldackel/`**

- `sample.bedGraph`
- Methylation statuses in [bedGraph](http://genome.ucsc.edu/goldenPath/help/bedgraph.html) format.
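
The bedGraph can be summarised with standard Unix tools. A minimal sketch that computes a coverage-weighted methylation level across all reported CpGs, assuming MethylDackel's default layout (one `track` header line, then tab-separated chrom, start, end, methylation %, methylated read count, unmethylated read count); the function name `weighted_methylation` is hypothetical.

```bash
#!/bin/sh
# Sketch: coverage-weighted mean methylation (%) from a MethylDackel bedGraph.
# Assumes columns 5 and 6 hold methylated and unmethylated read counts.
weighted_methylation() {
  awk -F'\t' '
    /^track/ { next }                 # skip the bedGraph header line
    { meth += $5; unmeth += $6 }      # sum read counts over all sites
    END {
      if (meth + unmeth == 0) { print "NA"; exit }
      printf "%.2f\n", 100 * meth / (meth + unmeth)
    }' "$1"
}

# Example (file name from the bwa-meth output tree):
# weighted_methylation methyldackel/Ecoli_10K_methylated.markdup.sorted_CpG.bedGraph
```

Weighting by read counts, rather than averaging column 4, keeps low-coverage sites from dominating the summary.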
32 changes: 29 additions & 3 deletions docs/usage.md
@@ -28,9 +28,11 @@ Starting with Bismark `v0.21.0`, the pipeline also supports [HISAT2](https://ccb

The second workflow uses [BWA-Meth](https://github.com/brentp/bwa-meth) as the alignment tool and [MethylDackel](https://github.com/dpryan79/methyldackel) for post-processing.

Aligner Options
• Standard BWA-Meth (CPU-based): This option can be invoked via `--aligner bwameth` and uses the traditional BWA-Meth aligner and runs on CPU processors.
• Parabricks/FQ2BAMMETH (GPU-based): For higher performance, the pipeline can leverage the Parabricks implementation of BWA-Meth (fq2bammeth), which utilizes GPU processors. To use this option, include the `--use_gpu` flag along with `--aligner bwameth`.
bwa-meth aligner options:

- Standard `bwa-meth` (CPU-based): This option is invoked via `--aligner bwameth` and runs the standard bwa-meth aligner on CPUs.

- `Parabricks/FQ2BAMMETH` (GPU-based): For higher performance, the pipeline can leverage the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html), a GPU-accelerated reimplementation of `bwa-meth` that uses fq2bam (BWA-MEM + GATK) as its backend. To use this option, include the `--use_gpu` flag along with `--aligner bwameth`.
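
When the same launch script is used on machines with and without GPUs, the flag can be chosen at run time. A minimal sketch, assuming that a working `nvidia-smi` able to list at least one GPU is a reasonable proxy for GPU availability; the helper `bwameth_flags` is hypothetical, and your container engine may need additional configuration to expose the GPU.

```bash
#!/bin/sh
# Sketch: add --use_gpu only when an NVIDIA GPU looks usable.
# Assumption: `nvidia-smi -L` succeeding means at least one GPU is visible.
bwameth_flags() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "--aligner bwameth --use_gpu"
  else
    echo "--aligner bwameth"
  fi
}

echo "nextflow run nf-core/methylseq $(bwameth_flags) --input samplesheet.csv --outdir <OUTDIR> -profile docker"
```
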

## Samplesheet input

@@ -130,6 +132,28 @@ genome: 'GRCh37'

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

### Providing `ext.args` to Tools

Additional arguments can be appended to a command in a module by specifying them within the module’s custom configuration. The configurations for modules and subworkflows used in the pipeline can be found in `conf/modules` or `conf/subworkflows`. A module’s `publishDir` path can also be customized in these configurations.

For example, users working with unfinished genomes containing tens or even hundreds of thousands of scaffolds, contigs, or chromosomes often encounter errors when pre-sorting reads into individual chromosome files. These errors are typically caused by the operating system’s limit on the number of file handles that can be open simultaneously (usually 1024; to check this limit on Linux, run `ulimit -a`).
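
To judge whether a genome is likely to hit this limit, you can compare the number of sequences in the reference FASTA with the open-file limit of your shell (`ulimit -n` prints just that value). A rough heuristic sketch, assuming roughly one file per chromosome or contig during sorting; the function name `needs_scaffolds` and the FASTA path are hypothetical.

```bash
#!/bin/sh
# Heuristic sketch: compare contig count with the per-process open-file limit.
# Assumes about one open file per contig during pre-sorting.
needs_scaffolds() {
  n_contigs=$(grep -c '^>' "$1")
  limit=$(ulimit -n)
  printf 'contigs: %s, open-file limit: %s\n' "$n_contigs" "$limit"
  # "unlimited" means there is no numeric ceiling to compare against.
  [ "$limit" != unlimited ] && [ "$n_contigs" -ge "$limit" ]
}

# Example: needs_scaffolds genome.fa && echo "consider adding --scaffolds to ext.args"
```
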

To bypass this limitation, the `--scaffolds` option can be added to `ext.args` in `conf/modules/bismark_methylationextractor.config`. This prevents methylation calls from being pre-sorted into individual chromosome files. Instead, all input files are temporarily merged into a single file (unless there is only one file), which is then sorted by both chromosome and position using the Unix `sort` command.
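
As an alternative to editing the pipeline's own config file, the same setting can be supplied through a custom config passed with `-c`. A minimal sketch, assuming the process selector `BISMARK_METHYLATIONEXTRACTOR`; check `conf/modules/` for the exact selector. Note that setting `ext.args` this way overrides whatever value the pipeline assigns for that process, so any default arguments defined in `conf/modules/bismark_methylationextractor.config` would need to be copied into the new value.

```bash
#!/bin/sh
# Sketch: write a custom config that passes --scaffolds to the methylation extractor.
# Assumption: the selector below matches the pipeline's process name.
cat > custom.config <<'EOF'
process {
    withName: 'BISMARK_METHYLATIONEXTRACTOR' {
        // Replaces (does not extend) any ext.args set by the pipeline for this process
        ext.args = '--scaffolds'
    }
}
EOF

# Then run, for example:
# nextflow run nf-core/methylseq -c custom.config --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile docker
```
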

> For a detailed list of different options available, please refer to the official [Bismark](https://felixkrueger.github.io/Bismark/options/genome_preparation/) and [bwa-meth](https://github.com/brentp/bwa-meth) documentation.

### Running the `test` profile

Every nf-core pipeline comes with test data that can be run using `-profile test`. This test profile is useful for testing whether a user's environment is properly set up.

```bash
nextflow run nf-core/methylseq \
    --outdir <OUTDIR> \
    -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
```

### Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available, even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, regularly update the cached version:
@@ -299,7 +323,9 @@ The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementatio
If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently, it doesn't make sense to re-release nf-core/viralrecon every time a new version of Pangolin is released. Instead, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`.

1. Check the default version used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19)

2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags)

3. Create the custom config accordingly:

- For Docker: