To use this pipeline, simply clone or download this repository, and install the dependencies:
- Nextflow >= 20.01.0
- Docker >= 19.03.2 or Singularity >= 3.4
Execute this Nextflow pipeline with:
./run rnaseq.nf [arguments]
The `./run` launcher script replaces the `nextflow run` command and grants these benefits:
- Options can receive multiple space-separated parameters.
- Long options are preceded by double dashes, following GNU conventions.
- Temporary files and logs are written to the output directory, keeping the execution directory clean.
- Temporary files are deleted after the pipeline has successfully completed.
- The pipeline can be resumed from any directory with the `--resume` option.
Option | Parameter(s) | Description | Requirement |
---|---|---|---|
`--profile` | `<profile1> <profile2> ...` | Profile(s) to use when running the pipeline. Specify the profiles that fit your infrastructure among `slurm`, `singularity`, `docker`. | Required |
`--output` | `<directory>` | Output directory where all temporary files, logs, and results are written. | Required |
`--reads` | `<reads.fq> <*.bam> ...` | Input fastq file(s) and/or bam file(s). For single-end reads, name your files `prefix.{fq,fastq}[.gz]`. For paired-end reads, name your files `prefix_R{1,2}.{fq,fastq}[.gz]`. For mapped reads, name your files `prefix.bam`. | Required |
`--annotation` | `<annotation.gff>` | Input reference annotation file. | Required |
`--genome` | `<genome.fa>` | Input genome sequence file. | Required if fastq files are provided and `--index` is absent. |
`--index` | `<directory>` | Input genome index directory. Overrides `--genome`. | Required if fastq files are provided and `--genome` is absent. |
`--metadata` | `<metadata.tsv>` | Input tabulated metadata file. | Required if `--merge` is provided. |
`--merge` | `<factor1> <factor2> ...` | Factor(s) to merge reads files. See the merge factors section for details. | Optional |
`--direction` | `<rf\|fr>` | Direction of reads. Either `rf` or `fr`. | Optional |
`--max-cpus` | `<16>` | Maximum number of CPU cores that can be used for each process. This is a limit, not the actual number of requested CPU cores. | Optional |
`--max-memory` | `<64GB>` | Maximum memory that can be used for each process. This is a limit, not the actual amount of allotted memory. | Optional |
`--max-time` | `<12h>` | Maximum time that can be spent on each process. This is a limit and has no effect on the duration of each process. | Optional |
`--resume` | | Resume the pipeline after interruption. Previously completed processes will be skipped. | Optional |

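
The naming conventions and requirement rules in the table above can be checked before launching a run. The following Python sketch is only an illustration and is not part of the pipeline: the function names `classify_reads` and `check_requirements` are hypothetical, the regular expressions are one possible reading of the documented patterns, and file names are assumed to be given without directory components.

```python
import re

# Patterns follow the naming conventions listed for --reads.
# These regular expressions are an assumption, not the pipeline's own code.
PAIRED = re.compile(r"^(?P<prefix>.+)_R[12]\.(fq|fastq)(\.gz)?$")
SINGLE = re.compile(r"^(?P<prefix>.+)\.(fq|fastq)(\.gz)?$")
MAPPED = re.compile(r"^(?P<prefix>.+)\.bam$")


def classify_reads(filename):
    """Return (prefix, kind), where kind is 'paired', 'mapped' or 'single'."""
    # Paired is tested first because "x_R1.fq" would also match SINGLE.
    for kind, pattern in (("paired", PAIRED), ("mapped", MAPPED), ("single", SINGLE)):
        match = pattern.match(filename)
        if match:
            return match.group("prefix"), kind
    raise ValueError(f"unrecognized reads file name: {filename}")


def check_requirements(reads, genome=None, index=None, metadata=None, merge=None):
    """Return a list of problems with the given combination of options."""
    problems = []
    has_fastq = any(classify_reads(name)[1] != "mapped" for name in reads)
    # --genome or --index is required as soon as one fastq file is provided.
    if has_fastq and genome is None and index is None:
        problems.append("fastq input needs --genome or --index")
    # --metadata is required whenever --merge is provided.
    if merge and metadata is None:
        problems.append("--merge needs --metadata")
    return problems
```

For instance, `check_requirements(["A.fq", "D.bam"])` reports the missing `--genome` or `--index`, while a list containing only bam files passes without either option.
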
Use the `--merge` and `--metadata` options together to merge reads files after trimming and mapping. This results in genes and transcripts being counted by factor rather than by input file.
The metadata file consists of tab-separated values describing your input files. The first column must contain input file prefixes without extensions. There is no restriction on column names or number of columns.
Given the following tabulated metadata file:
input diet tissue
A corn liver
B corn liver
C wheat liver
D wheat muscle
With the following arguments:
--reads A.fq B.fq C.fq D.bam --metadata metadata.tsv --merge diet
- A and B mapped reads will be merged, resulting in gene and transcript counts for the corn diet.
- C and D mapped reads will be merged, resulting in gene and transcript counts for the wheat diet.
With the following arguments:
--reads A.fq B.fq C.fq D.bam --metadata metadata.tsv --merge diet tissue
- A and B mapped reads will be merged, resulting in gene and transcript counts for the corn diet and liver tissue pair.
- C mapped reads will be left alone, resulting in gene and transcript counts for the wheat diet and liver tissue pair.
- D mapped reads will be left alone, resulting in gene and transcript counts for the wheat diet and muscle tissue pair.
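
The grouping described in the two examples above can be reproduced with a few lines of Python. This is a minimal sketch of the grouping logic only, assuming the metadata is read as plain tab-separated text; `group_by_factors` is a hypothetical helper and does not reflect how the pipeline implements merging internally.

```python
import csv
import io
from collections import defaultdict

# The example metadata file from above, with tab-separated columns.
METADATA = (
    "input\tdiet\ttissue\n"
    "A\tcorn\tliver\n"
    "B\tcorn\tliver\n"
    "C\twheat\tliver\n"
    "D\twheat\tmuscle\n"
)


def group_by_factors(metadata_text, factors):
    """Map each combination of factor values to the input prefixes sharing it."""
    rows = csv.reader(io.StringIO(metadata_text), delimiter="\t")
    header = next(rows)
    # Locate the column of each requested factor by its name in the header.
    columns = [header.index(factor) for factor in factors]
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in columns)
        groups[key].append(row[0])  # the first column holds the file prefix
    return dict(groups)


print(group_by_factors(METADATA, ["diet"]))
print(group_by_factors(METADATA, ["diet", "tissue"]))
```

With `["diet"]`, A and B fall into one group and C and D into another. With `["diet", "tissue"]`, A and B stay together while C and D each form their own group, matching the two cases above.
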
The pipeline executes the following processes:
- Control reads quality with FastQC.
  Outputs quality reports to `output/quality/raw`.
- Trim adapters from reads with Trim Galore.
  Outputs quality reports to `output/quality/trimmed`.
- Index genome sequence with STAR.
  Outputs indexed genome to `output/index`.
- Map reads to indexed genome with STAR.
  Outputs mapped reads to `output/maps`.
- Merge mapped reads by factors with Samtools.
  See the merge factors section for details.
- Assemble transcripts and combine them into a new assembly annotation with StringTie.
  Outputs the new assembly annotation to `output/annotation`.
- Count genes and transcripts with StringTie, and format them into tabulated files.
  Outputs TPM counts and average per-base read coverage to `output/counts`.
  Counts are given for the reference and assembly annotations separately.

The GENE-SWitCH project has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No 817998.
This repository reflects only the listed contributors' views. Neither the European Commission nor its Agency REA are responsible for any use that may be made of the information it contains.