GENE-SWitCH project RNA-Seq analysis pipeline

Installation

To use this pipeline, simply clone or download this repository, and install the dependencies:

Nextflow >= 20.01.0
Docker >= 19.03.2 or Singularity >= 3.4

Usage

Run this Nextflow pipeline with:

./run rnaseq.nf [arguments]

The ./run launcher script replaces the nextflow run command and provides these benefits:

Options can receive multiple space-separated parameters.
Long options are preceded by double dashes, following GNU conventions.
Temporary files and logs are written to the output directory, keeping the execution directory clean.
Temporary files are deleted after the pipeline has successfully completed.
The pipeline can be resumed from any directory with the --resume option.

Arguments

Option	Parameter(s)	Description	Requirement
`--profile`	`<profile1>` `<profile2>` `...`	Profile(s) to use when running the pipeline. Specify the profiles that fit your infrastructure among `slurm`, `singularity`, `docker`.	Required
`--output`	`<directory>`	Output directory where all temporary files, logs, and results are written.	Required
`--reads`	`<reads.fq>` `<*.bam>` `...`	Input `fastq` file(s) and/or `bam` file(s). For single-end reads, name your files `prefix.{fq,fastq}[.gz]`. For paired-end reads, name your files `prefix_R{1,2}.{fq,fastq}[.gz]`. For mapped reads, name your files `prefix.bam`.	Required
`--annotation`	`<annotation.gff>`	Input reference annotation file.	Required
`--genome`	`<genome.fa>`	Input genome sequence file.	Required if `fastq` files are provided and `--index` is absent.
`--index`	`<directory>`	Input genome index directory. Overrides `--genome`.	Required if `fastq` files are provided and `--genome` is absent.
`--metadata`	`<metadata.tsv>`	Input tabulated metadata file.	Required if `--merge` is provided.
`--merge`	`<factor1>` `<factor2>` `...`	Factor(s) by which to merge reads files. See the merge factors section for details.	Optional
`--direction`	`<rf\|fr>`	Direction of reads. Either `rf` or `fr`.	Optional
`--max-cpus`	`<16>`	Maximum number of CPU cores that can be used for each process. This is a limit, not the actual number of requested CPU cores.	Optional
`--max-memory`	`<64GB>`	Maximum memory that can be used for each process. This is a limit, not the actual amount of allotted memory.	Optional
`--max-time`	`<12h>`	Maximum time that can be spent on each process. This is a limit and has no effect on the duration of each process.	Optional
`--resume`		Resume the pipeline after interruption. Previously completed processes will be skipped.	Optional
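
The naming rules for `--reads` and the conditional requirements in the table can be sketched in Python as below. This is only an illustration under stated assumptions: `classify_reads` and `missing_options` are hypothetical helpers written for this README, and the regular expressions are one reading of the documented patterns, not the pipeline's actual code.

```python
import os
import re

# Hypothetical patterns mirroring the documented file names (not taken from the pipeline source).
PAIRED = re.compile(r"^(?P<prefix>.+)_R[12]\.(?:fq|fastq)(?:\.gz)?$")
SINGLE = re.compile(r"^(?P<prefix>.+)\.(?:fq|fastq)(?:\.gz)?$")
MAPPED = re.compile(r"^(?P<prefix>.+)\.bam$")


def classify_reads(path):
    """Return (kind, prefix) for a reads file, where kind is 'paired', 'single' or 'mapped'."""
    name = os.path.basename(path)
    # Paired-end is tested first, because 'x_R1.fq' would also match the single-end pattern.
    for kind, pattern in (("paired", PAIRED), ("single", SINGLE), ("mapped", MAPPED)):
        match = pattern.match(name)
        if match:
            return kind, match.group("prefix")
    raise ValueError(f"unrecognized reads file name: {name}")


def missing_options(reads, genome=None, index=None, metadata=None, merge=()):
    """List the options that the Requirement column would flag as missing."""
    missing = []
    has_fastq = any(classify_reads(p)[0] in ("single", "paired") for p in reads)
    # fastq input needs a genome to index, or an existing index.
    if has_fastq and genome is None and index is None:
        missing.append("--genome or --index")
    # Merging by factor needs the metadata table that defines the factors.
    if merge and metadata is None:
        missing.append("--metadata")
    return missing
```

For instance, `missing_options(["A.fq", "D.bam"])` reports that a genome or index is needed, while `missing_options(["D.bam"])` reports nothing because mapped reads do not need to be aligned again.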

Merge factors

Use the --merge and --metadata options together to merge reads files after trimming and mapping. This results in genes and transcripts being counted by factor rather than by input file.

The metadata file consists of tab-separated values describing your input files. The first column must contain input file prefixes without extensions. There is no restriction on column names or number of columns.

Examples

Given the following tabulated metadata file:

input    diet      tissue
A        corn      liver
B        corn      liver
C        wheat     liver
D        wheat     muscle

With the following arguments:

--reads A.fq B.fq C.fq D.bam --metadata metadata.tsv --merge diet

A and B mapped reads will be merged, resulting in gene and transcript counts for the corn diet.
C and D mapped reads will be merged, resulting in gene and transcript counts for the wheat diet.

With the following arguments:

--reads A.fq B.fq C.fq D.bam --metadata metadata.tsv --merge diet tissue

A and B mapped reads will be merged, resulting in gene and transcript counts for the corn diet and liver tissue pair.
C mapped reads will be left alone, resulting in gene and transcript counts for the wheat diet and liver tissue pair.
D mapped reads will be left alone, resulting in gene and transcript counts for the wheat diet and muscle tissue pair.
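
The grouping behind these two examples can be reproduced with a short Python sketch. It assumes the metadata layout described above (tab-separated, file prefixes in the first column). `merge_groups` is a hypothetical helper used for illustration only and is not part of the pipeline.

```python
import csv
import io
from collections import defaultdict

# The example metadata from this section, written with real tab separators.
METADATA = (
    "input\tdiet\ttissue\n"
    "A\tcorn\tliver\n"
    "B\tcorn\tliver\n"
    "C\twheat\tliver\n"
    "D\twheat\tmuscle\n"
)


def merge_groups(metadata_text, factors):
    """Map each combination of factor values to the input prefixes that share it."""
    reader = csv.reader(io.StringIO(metadata_text), delimiter="\t")
    header = next(reader)
    # Locate the requested factor columns by name; the first column holds the prefixes.
    columns = [header.index(factor) for factor in factors]
    groups = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        key = tuple(row[c] for c in columns)
        groups[key].append(row[0])
    return dict(groups)


print(merge_groups(METADATA, ["diet"]))
print(merge_groups(METADATA, ["diet", "tissue"]))
```

With `--merge diet` the sketch yields two groups (A and B, C and D). With `--merge diet tissue` it yields three groups, because C and D no longer share the same combination of values.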

Workflow

The pipeline executes the following processes:

Check read quality with FastQC.
Outputs quality reports to output/quality/raw.
Trim adapters from reads with Trim Galore.
Outputs quality reports to output/quality/trimmed.
Index genome sequence with STAR.
Outputs indexed genome to output/index.
Map reads to indexed genome with STAR.
Outputs mapped reads to output/maps.
Merge mapped reads by factors with Samtools.
See the merge factors section for details.
Assemble transcripts and combine them into a new assembly annotation with StringTie.
Outputs the new assembly annotation to output/annotation.
Count genes and transcripts with StringTie, and format them into tabulated files.
Outputs TPM counts and average per-base read coverage to output/counts.
Counts are given for the reference and assembly annotations separately.

About this project

The GENE-SWitCH project has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No 817998.

This repository reflects only the listed contributors' views. Neither the European Commission nor its Agency REA is responsible for any use that may be made of the information it contains.
