GENE-SWitCH project RNA-Seq analysis pipeline

Installation

To use this pipeline, simply clone or download this repository, and install the dependencies:

Usage

Execute this nextflow pipeline with:

./run rnaseq.nf [arguments]

The ./run launcher script replaces the nextflow run command and provides these benefits:

  • Options can receive multiple space-separated parameters.
  • Long options are preceded by double dashes, following GNU conventions.
  • Temporary files and logs are written to the output directory, keeping the execution directory clean.
  • Temporary files are deleted after the pipeline has successfully completed.
  • The pipeline can be resumed from any directory with the --resume option.

Arguments

| Option | Parameter(s) | Description | Requirement |
|---|---|---|---|
| `--profile` | `<profile1> <profile2> ...` | Profile(s) to use when running the pipeline. Specify the profiles that fit your infrastructure among slurm, singularity, docker. | Required |
| `--output` | `<directory>` | Output directory where all temporary files, logs, and results are written. | Required |
| `--reads` | `<reads.fq> <*.bam> ...` | Input fastq file(s) and/or bam file(s). For single-end reads, name your files `prefix.{fq,fastq}[.gz]`. For paired-end reads, name your files `prefix_R{1,2}.{fq,fastq}[.gz]`. For mapped reads, name your files `prefix.bam`. | Required |
| `--annotation` | `<annotation.gff>` | Input reference annotation file. | Required |
| `--genome` | `<genome.fa>` | Input genome sequence file. | Required if fastq files are provided and --index is absent. |
| `--index` | `<directory>` | Input genome index directory. Overrides --genome. | Required if fastq files are provided and --genome is absent. |
| `--metadata` | `<metadata.tsv>` | Input tabulated metadata file. | Required if --merge is provided. |
| `--merge` | `<factor1> <factor2> ...` | Factor(s) to merge reads files. See the merge factors section for details. | Optional |
| `--direction` | `<rf\|fr>` | Direction of reads. Either rf or fr. | Optional |
| `--max-cpus` | `<16>` | Maximum number of CPU cores that can be used for each process. This is a limit, not the actual number of requested CPU cores. | Optional |
| `--max-memory` | `<64GB>` | Maximum memory that can be used for each process. This is a limit, not the actual amount of allotted memory. | Optional |
| `--max-time` | `<12h>` | Maximum time that can be spent on each process. This is a limit and has no effect on the duration of each process. | Optional |
| `--resume` | | Resume the pipeline after interruption. Previously completed processes will be skipped. | Optional |
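
To make the naming and requirement rules above concrete, here is a minimal Python sketch. It is not part of the pipeline: `classify_reads` and `missing_options` are hypothetical helpers, and the regular expression is only one reading of the naming conventions in the table. The pipeline's own parsing may differ.

```python
import re

# Assumption: this pattern mirrors the documented naming rules
# prefix.{fq,fastq}[.gz] and prefix_R{1,2}.{fq,fastq}[.gz].
FASTQ = re.compile(r"^(?P<prefix>.+?)(?P<mate>_R[12])?\.(?:fq|fastq)(?:\.gz)?$")


def classify_reads(filename):
    """Return (prefix, kind), where kind is 'single', 'paired' or 'mapped'."""
    if filename.endswith(".bam"):
        return filename[: -len(".bam")], "mapped"
    match = FASTQ.match(filename)
    if match is None:
        raise ValueError(f"unrecognised reads file name: {filename}")
    kind = "paired" if match["mate"] else "single"
    return match["prefix"], kind


def missing_options(reads, genome=None, index=None, metadata=None, merge=()):
    """List the conditionally required options that are absent."""
    missing = []
    # --genome or --index is only needed when at least one fastq file is given.
    has_fastq = any(classify_reads(name)[1] != "mapped" for name in reads)
    if has_fastq and genome is None and index is None:
        missing.append("--genome or --index")
    # --metadata is only needed when --merge is used.
    if merge and metadata is None:
        missing.append("--metadata")
    return missing


print(classify_reads("liver_R2.fastq.gz"))   # ('liver', 'paired')
print(missing_options(["A.fq", "D.bam"]))    # ['--genome or --index']
```

Under these assumptions, a run that only provides bam files needs neither --genome nor --index, since no mapping step is required.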

Merge factors

Use the --merge and --metadata options together to merge reads files after trimming and mapping. This results in genes and transcripts being counted by factor rather than by input file.

The metadata file consists of tab-separated values describing your input files. The first column must contain input file prefixes without extensions. There is no restriction on column names or number of columns.

Examples

Given the following tabulated metadata file:

input    diet      tissue
A        corn      liver
B        corn      liver
C        wheat     liver
D        wheat     muscle

With the following arguments:

--reads A.fq B.fq C.fq D.bam --metadata metadata.tsv --merge diet
  • A and B mapped reads will be merged, resulting in gene and transcript counts for the corn diet.
  • C and D mapped reads will be merged, resulting in gene and transcript counts for the wheat diet.

With the following arguments:

--reads A.fq B.fq C.fq D.bam --metadata metadata.tsv --merge diet tissue
  • A and B mapped reads will be merged, resulting in gene and transcript counts for the corn diet and liver tissue pair.
  • C mapped reads will be left alone, resulting in gene and transcript counts for the wheat diet and liver tissue pair.
  • D mapped reads will be left alone, resulting in gene and transcript counts for the wheat diet and muscle tissue pair.
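
The grouping described in these examples can be sketched in a few lines of Python. This is an illustration of the grouping logic only, not the pipeline's implementation: `group_by_factors` is a hypothetical helper, and the metadata is inlined as a string rather than read from metadata.tsv.

```python
import csv
import io
from collections import defaultdict

# The example metadata from above, as tab-separated text.
METADATA = (
    "input\tdiet\ttissue\n"
    "A\tcorn\tliver\n"
    "B\tcorn\tliver\n"
    "C\twheat\tliver\n"
    "D\twheat\tmuscle\n"
)


def group_by_factors(metadata_text, factors):
    """Map each combination of factor values to the input prefixes sharing it."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    # The first column holds the input file prefixes, whatever its name is.
    prefix_column = reader.fieldnames[0]
    groups = defaultdict(list)
    for row in reader:
        key = tuple(row[factor] for factor in factors)
        groups[key].append(row[prefix_column])
    return dict(groups)


print(group_by_factors(METADATA, ["diet"]))
# {('corn',): ['A', 'B'], ('wheat',): ['C', 'D']}
print(group_by_factors(METADATA, ["diet", "tissue"]))
# {('corn', 'liver'): ['A', 'B'], ('wheat', 'liver'): ['C'], ('wheat', 'muscle'): ['D']}
```

Each key corresponds to one set of merged mapped reads, and therefore to one set of gene and transcript counts.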

Workflow

The pipeline executes the following processes:

  1. Check read quality with FastQC.
    Outputs quality reports to output/quality/raw.
  2. Trim adapters from reads with Trim Galore.
    Outputs quality reports to output/quality/trimmed.
  3. Index genome sequence with STAR.
    Outputs indexed genome to output/index.
  4. Map reads to indexed genome with STAR.
    Outputs mapped reads to output/maps.
  5. Merge mapped reads by factors with Samtools.
    See the merge factors section for details.
  6. Assemble transcripts and combine them into a new assembly annotation with StringTie.
    Outputs the new assembly annotation to output/annotation.
  7. Count genes and transcripts with StringTie, and format them into tabulated files.
    Outputs TPM counts and average per-base read coverage to output/counts.
    Counts are given for the reference and assembly annotations separately.

About this project

The GENE-SWitCH project has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No 817998.

This repository reflects only the listed contributors' views. Neither the European Commission nor its Agency REA are responsible for any use that may be made of the information it contains.