Skip to content

A basic and quick workflow for differential expression analysis

License

Notifications You must be signed in to change notification settings

stracquadaniolab/quick-rnaseq-nf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

bd75a62 · Jun 17, 2022

History

42 Commits
Jun 14, 2022
Oct 13, 2021
Jun 14, 2022
Jun 14, 2022
Jun 14, 2022
Sep 30, 2021
Oct 1, 2021
Jun 15, 2022
Jun 14, 2022
Jun 14, 2022
Jun 17, 2022

Repository files navigation

quick-rnaseq-nf

DOI

A basic and quick workflow for differential expression analysis.

Overview

Configuration

  • codename: mnemonic codename for the run (default: 'quick-rnaseq')
  • outdir: directory where to store the results (default: './results')
  • experiment.samplesheet: CSV file describing samples and conditions (required).
  • experiment.contrasts: contrasts for differential expression analysis in the format [['case1', 'control'],['case2','control']], which performs the analysis of case1 vs control samples, and case2 vs control samples. (required)
  • transcriptome.reference: transcriptome reference file. Gencode recommended (required. GZ format)
  • transcriptome.decoys: reference genome file. Gencode recommended (required. GZ format)
  • fastp.args: options for reads trimming using fastp. (default: '')
  • salmon.index.args: options for Salmon index, e.g. '--gencode' for Gencode transcritomes.
  • salmon.quant.libtype: library type for quantification (default: 'A', Salmon infers lib type)
  • salmon.quant.args: Salmon options. (default: '--validateMappings --gcBias')
  • summarize_to_gene.counts_from_abundance: infer counts from abundances using tximeta (default: 'no')
  • summarize_to_gene.organism_db: Bioconductor organism package for annotation. Currently supporting org.Hs.eg.db for Human and org.Mm.eg.db for mouse and (default: 'org.Hs.eg.db')
  • qc.pca.transform: counts transformation for PCA analysis (see DESeq2. default: 'rlog')
  • qc.ma.lfc_threshold: log fold-change threshold for MA plot (see DESeq2. default: 0)
  • qc.sample.transform: counts transformation for PCA analysis (see DESeq2. default: 'rlog')
  • dge.lfc_threshold: log fold-change threshold for differential expression analysis (see DESeq2. default: 0)
  • dge.fdr: false discovery rate threshold to be used with implicit filtering (see DESeq2. default: 0.05)
  • gene_ontology.organism_db: Bioconductor organism package for annotation. Currently supporting org.Hs.eg.db for Human and org.Mm.eg.db for mouse and (default: 'org.Hs.eg.db')
  • gene_ontology.gene_id: type of gene id used for the analysis (default: 'ensembl')
  • gene_ontology.remove_gencode_version: remove Gencode version from gene id (default: 'yes')
  • gene_ontology.fdr: false discovery rate threshold (default: 0.05)

Running the workflow

Install or update the workflow

nextflow pull stracquadaniolab/quick-rnaseq-nf

Run the analysis with test data and Docker

nextflow run stracquadaniolab/quick-rnaseq-nf -profile test,docker

Run the analysis with test data and Singularity

nextflow run stracquadaniolab/quick-rnaseq-nf -profile test,singularity

Run the analysis with test data, Singularity and Slurm

nextflow run stracquadaniolab/quick-rnaseq-nf -profile test,singularity,slurm

Run the analysis on human data with Singularity and Slurm

Prepare a samplesheet.csv file as follows:

sample,read1,read2,condition
6C_REP1,data/RF01_6C1_R1_001.fastq.gz,data/RF01_6C1_R2_001.fastq.gz,control
6C_REP2,data/RF01_6C2_R1_001.fastq.gz,data/RF01_6C2_R2_001.fastq.gz,case

Please note that the header is required and it is case sensitive.

Prepare a nextflow.config file as follows:

params {
  // experiment information
  experiment.samplesheet = "./samplesheet.csv"
  experiment.contrasts = [['case1', 'control'],['case2','control']]

  // transcriptome information 
  transcriptome.reference = "gencode.v40.transcripts.fa.gz"
  transcriptome.decoys = "GRCh38.primary_assembly.genome.fa.gz"
}

Now you can run quick-rnaseq as follows:

nextflow run stracquadaniolab/quick-rnaseq-nf -profile singularity,slurm

Results

  • results/analysis/dge-<contrasts>.csv: file with the differential expression analysis results for a given contrast.
  • results/analysis/go-<contrasts>.csv: file with the GO analysis results for a given contrast.
  • results/dataset/summarized-experiment.rds: DESeqDataset object with all experimental information (e.g. gene counts)
  • results/qc: quality control report
  • results/quantification/<sample-name>: Salmon quantification folders for each sample.

Authors