Add FASTQ linting during preprocessing #1461

Merged
merged 24 commits into from
Dec 3, 2024
5 changes: 5 additions & 0 deletions .github/workflows/ci.yml
@@ -68,6 +68,11 @@ jobs:
- name: Check out pipeline code
  uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

- uses: actions/setup-java@8df1039502a15bceb9433410b1a100fbe190c53b # v4
  with:
    distribution: "temurin"
    java-version: "17"

- name: Set up Nextflow
  uses: nf-core/setup-nextflow@v2
  with:
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,18 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 3.18.0dev - xxxx-xx-xx

### Credits

Special thanks to the following for their contributions to the release:

- [Caitlin Winkler](https://github.com/oligomyeggo)

### Enhancements & fixes

- [PR #1461](https://github.com/nf-core/rnaseq/pull/1461) - Add FASTQ linting during preprocessing

## [[3.17.0](https://github.com/nf-core/rnaseq/releases/tag/3.17.0)] - 2024-10-23

### Credits
15 changes: 15 additions & 0 deletions docs/output.md
@@ -19,6 +19,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Pipeline overview](#pipeline-overview)
- [Preprocessing](#preprocessing)
- [cat](#cat)
- [fq lint](#fq-lint)
- [FastQC](#fastqc)
- [UMI-tools extract](#umi-tools-extract)
- [TrimGalore](#trimgalore)
@@ -73,6 +74,20 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

If multiple libraries/runs have been provided for the same sample in the input samplesheet (e.g. to increase sequencing depth) then these will be merged at the very beginning of the pipeline in order to have consistent sample naming throughout the pipeline. Please refer to the [usage documentation](https://nf-co.re/rnaseq/usage#samplesheet-input) to see how to specify these samples in the input samplesheet.

### fq lint

<details markdown="1">
<summary>Output files</summary>

- `fq_lint/*`
  - `*.fq_lint.txt`: Linting report per library from `fq lint`.

> **NB:** You will see subdirectories here based on the stage of preprocessing for the files that have been linted, for example `raw`, `trimmed`.

</details>

[fq lint](https://github.com/stjude-rust-labs/fq) runs several checks on input FASTQ files. It exits with a non-zero code when it finds issues, which terminates workflow execution. When linting succeeds, it produces the logs you will find here.
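
To make the pass/fail behaviour concrete, here is a minimal Python sketch of the kind of per-record check a FASTQ linter performs. It is not how `fq lint` is implemented and does not reproduce its validator set. The function names `lint_fastq_lines` and `exit_code` are hypothetical, and the three checks shown (complete four-line records, `@` and `+` marker lines, matching sequence and quality lengths) are general properties of the FASTQ format rather than anything taken from `fq`.

```python
from typing import Iterable, List


def lint_fastq_lines(lines: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the input passed.

    Illustrative only: this is NOT fq's implementation or validator set.
    """
    rows = [line.rstrip("\n") for line in lines]
    problems = []

    # A FASTQ record is exactly four lines: name, sequence, plus, quality.
    leftover = len(rows) % 4
    if leftover:
        problems.append("incomplete record: line count is not a multiple of 4")

    # Check every complete record; a trailing partial record is skipped here.
    for start in range(0, len(rows) - leftover, 4):
        name, seq, plus, qual = rows[start:start + 4]
        number = start // 4 + 1
        if not name.startswith("@"):
            problems.append(f"record {number}: name line does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {number}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {number}: sequence and quality lengths differ")
    return problems


def exit_code(problems: List[str]) -> int:
    """Map lint results to a process-style exit code (0 = pass, 1 = fail)."""
    return 1 if problems else 0
```

An empty problem list maps to exit code 0, the successful case that leaves a log behind. Any problem maps to a non-zero code, which is the condition that stops the workflow.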

### FastQC

<details markdown="1">
6 changes: 6 additions & 0 deletions docs/usage.md
@@ -27,6 +27,12 @@ CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
```

### Linting

By default, the pipeline will run [fq lint](https://github.com/stjude-rust-labs/fq) on all input FASTQ files, both at the start of preprocessing and after each preprocessing step that manipulates FASTQ files. If errors are found, an error will be reported and the workflow will stop.

The `extra_fqlint_args` parameter can be used to disable [any validator](https://github.com/stjude-rust-labs/fq?tab=readme-ov-file#validators) from `fq`. For example, we have found that checks on the names of paired reads are prone to failure, so that check is disabled by default (`extra_fqlint_args` is set to `--disable-validator P001`).
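
As a rough illustration of how the two linting parameters fit together, the Python sketch below assembles an `extra_fqlint_args` value from a list of validator codes and serialises it as JSON of the kind Nextflow's `-params-file` option accepts. The helper names `build_extra_fqlint_args` and `linting_params` are hypothetical, and the sketch assumes `--disable-validator` may be repeated once per code; check `fq lint --help` before relying on that.

```python
import json
from typing import Dict, Sequence, Union


def build_extra_fqlint_args(disabled_validators: Sequence[str]) -> str:
    """Join validator codes into one argument string for `fq lint`.

    Assumption: the flag can be given once per validator code.
    """
    return " ".join(f"--disable-validator {code}" for code in disabled_validators)


def linting_params(
    disabled_validators: Sequence[str] = ("P001",),
    skip_linting: bool = False,
) -> Dict[str, Union[str, bool]]:
    """Build the two linting parameters; the defaults mirror nextflow.config."""
    return {
        "skip_linting": skip_linting,
        "extra_fqlint_args": build_extra_fqlint_args(disabled_validators),
    }


# Print the parameters as JSON so they can be saved to a params file.
print(json.dumps(linting_params(), indent=2))
```

Saving the printed JSON to a file (for example `params.json`, a hypothetical name) and passing it with `-params-file params.json` can avoid command-line quoting problems with a value that itself begins with `--`.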

### Strandedness Prediction

If you set the strandedness value to `auto`, the pipeline will sub-sample the input FastQ files to 1 million reads, use Salmon Quant to automatically infer the strandedness, and then propagate this information through the rest of the pipeline. This behavior is controlled by the `--stranded_threshold` and `--unstranded_threshold` parameters, which are set to 0.8 and 0.1 by default, respectively. This means:
7 changes: 6 additions & 1 deletion modules.json
@@ -57,6 +57,11 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["fastq_fastqc_umitools_fastp", "fastq_fastqc_umitools_trimgalore"]
},
"fq/lint": {
"branch": "master",
"git_sha": "2c0260ed80daeca9c6dfa477a4daf04ff336dc37",
"installed_by": ["fastq_qc_trim_filter_setstrandedness", "modules"]
},
"fq/subsample": {
"branch": "master",
"git_sha": "a1abf90966a2a4016d3c3e41e228bfcbd4811ccc",
@@ -341,7 +346,7 @@
},
"fastq_qc_trim_filter_setstrandedness": {
"branch": "master",
- "git_sha": "9082d6440bdffbb4f5d9bd9d753361933b3febcb",
+ "git_sha": "2c0260ed80daeca9c6dfa477a4daf04ff336dc37",
"installed_by": ["subworkflows"]
},
"fastq_subsample_fq_salmon": {
5 changes: 5 additions & 0 deletions modules/nf-core/fq/lint/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

44 changes: 44 additions & 0 deletions modules/nf-core/fq/lint/main.nf

43 changes: 43 additions & 0 deletions modules/nf-core/fq/lint/meta.yml

58 changes: 58 additions & 0 deletions modules/nf-core/fq/lint/tests/main.nf.test

25 changes: 25 additions & 0 deletions modules/nf-core/fq/lint/tests/main.nf.test.snap

6 changes: 5 additions & 1 deletion nextflow.config
@@ -38,6 +38,10 @@ params {
umi_discard_read = null
save_umi_intermeds = false

// Linting
skip_linting = false
extra_fqlint_args = '--disable-validator P001'

// Trimming
trimmer = 'trimgalore'
min_trimmed_reads = 10000
@@ -328,7 +332,7 @@ manifest {
description = """RNA sequencing analysis pipeline for gene/isoform quantification and extensive quality control."""
mainScript = 'main.nf'
nextflowVersion = '!>=24.04.2'
- version = '3.17.0'
+ version = '3.18.0dev'
doi = 'https://doi.org/10.5281/zenodo.1400710'
}

19 changes: 15 additions & 4 deletions nextflow_schema.json
@@ -411,7 +411,7 @@
},
"min_mapped_reads": {
"type": "number",
- "default": 5.0,
+ "default": 5,
"fa_icon": "fas fa-percentage",
"description": "Minimum percentage of uniquely mapped reads below which samples are removed from further processing.",
"help_text": "Some downstream steps in the pipeline will fail if this threshold is too low."
@@ -456,14 +456,14 @@
"stranded_threshold": {
"type": "number",
"minimum": 0.5,
- "maximum": 1.0,
+ "maximum": 1,
"default": 0.8,
"description": "The fraction of stranded reads that must be assigned to a strandedness for confident assignment. Must be at least 0.5."
},
"unstranded_threshold": {
"type": "number",
- "minimum": 0.0,
- "maximum": 1.0,
+ "minimum": 0,
+ "maximum": 1,
"default": 0.1,
"description": "The difference in fraction of stranded reads assigned to 'forward' and 'reverse' below which a sample is classified as 'unstranded'. By default the forward and reverse fractions must differ by less than 0.1 for the sample to be called as unstranded."
}
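
The two threshold descriptions above can be read as a small decision rule. The Python sketch below is one interpretation of those descriptions, not the pipeline's actual code: the function name `classify_strandedness`, the `undetermined` label for samples that satisfy neither rule, and the treatment of values exactly on a boundary are all assumptions. The range checks mirror the `minimum` and `maximum` values in the schema.

```python
def classify_strandedness(
    forward: float,
    reverse: float,
    stranded_threshold: float = 0.8,
    unstranded_threshold: float = 0.1,
) -> str:
    """Classify a sample from counts of forward- and reverse-stranded reads.

    Hypothetical helper; boundary handling and labels are assumptions.
    """
    # Range checks taken from the schema's minimum/maximum values.
    if not 0.5 <= stranded_threshold <= 1:
        raise ValueError("stranded_threshold must be between 0.5 and 1")
    if not 0 <= unstranded_threshold <= 1:
        raise ValueError("unstranded_threshold must be between 0 and 1")

    total = forward + reverse
    if total <= 0:
        return "undetermined"  # no stranded reads to judge from

    forward_fraction = forward / total
    reverse_fraction = reverse / total

    # Confident assignment: one orientation reaches the stranded threshold.
    if forward_fraction >= stranded_threshold:
        return "forward"
    if reverse_fraction >= stranded_threshold:
        return "reverse"

    # Nearly balanced fractions are called unstranded.
    if abs(forward_fraction - reverse_fraction) < unstranded_threshold:
        return "unstranded"
    return "undetermined"
```

With the default thresholds, a 90/10 split is called stranded in the majority direction, a 52/48 split is called unstranded, and a 70/30 split falls between the two rules.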
@@ -539,6 +539,12 @@
"description": "Additional quality control options.",
"default": "",
"properties": {
"extra_fqlint_args": {
"type": "string",
"default": "--disable-validator P001",
"description": "Extra arguments to pass to the fq lint command.",
"fa_icon": "far fa-check-square"
},
"deseq2_vst": {
"type": "boolean",
"description": "Use vst transformation instead of rlog with DESeq2.",
@@ -602,6 +608,11 @@
"fa_icon": "fas fa-compress-alt",
"description": "Skip the UMI extraction from the read in case the UMIs have been moved to the headers in advance of the pipeline run."
},
"skip_linting": {
"type": "boolean",
"fa_icon": "fas fa-fast-forward",
"description": "Skip linting checks during FASTQ preprocessing and filtering."
},
"skip_trimming": {
"type": "boolean",
"description": "Skip the adapter trimming step.",