Marine Omics Variant Pipeline

Designed to process data from raw reads through to vcf.

graph TD;
	fastq-->fastqc;
	fastqc-->multiqc;
	fastq-->fastp;
	fastp-->multiqc;
	fastp-->bwa_mem;
	genome-->bwa_index;
	bwa_index-->bwa_mem;
	bwa_mem-->gatk_mark_duplicates;
	gatk_mark_duplicates-->samtools_stats;
	samtools_stats-->multiqc;
	gatk_mark_duplicates-->freebayes;
	gatk_mark_duplicates-->bcftools_mpileup;

Quick Start

  1. Install nextflow. If working on JCU infrastructure please see these detailed instructions
  2. Run a test to make sure everything is installed properly. The command below should work on a linux machine with singularity installed (eg JCU HPC).
nextflow run marine-omics/movp -latest -profile singularity,test -r main

If you are working from a mac or windows machine you will need to use docker.

nextflow run marine-omics/movp -latest -profile docker,test -r main
  3. Create the sample csv file (example below)
sample,fastq_1,fastq_2
1,sample1_r1.fastq.gz,sample1_r2.fastq.gz
2,sample2_r1.fastq.gz,sample2_r2.fastq.gz

Paths should either be given as absolute paths or relative to the launch directory (where you invoked the nextflow command)

  4. Choose a profile for your execution environment. This depends on where you are running your code. movp comes with preconfigured profiles that should work on JCU infrastructure and pawsey/setonix. These are:
    • JCU HPC (ie zodiac) : Use -profile zodiac
    • genomics12 (HPC nodes without pbs): Use -profile genomics
    • setonix: Use -profile setonix and set your slurm account with --slurm_account pawseyXXXX

If you need to customise further you can create your own custom.config file and invoke with option -c custom.config. See nextflow.config for ideas on what parameters can be set.

  5. Run the workflow with your genome and samples file
nextflow run marine-omics/movp -profile singularity,zodiac -r main --genome <genomefile> --samples <samples.csv> --outdir myoutputs
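
Before launching, it can help to confirm that every FASTQ path in the samples file resolves from the launch directory. The following is a minimal sketch, assuming the three-column layout shown in the example csv above. `check_samples` is a hypothetical helper name and is not part of movp.

```shell
# check_samples: hypothetical helper, not part of movp.
# Assumes a header line followed by rows of: sample,fastq_1,fastq_2
# Relative paths are resolved against the current directory, so run this
# from the directory where you will invoke nextflow.
check_samples() {
    # Strip any Windows carriage returns before parsing.
    tr -d '\r' < "$1" | {
        missing=0
        read -r _header || true   # skip the header line
        # The "|| [ -n ... ]" part handles a last line with no trailing newline.
        while IFS=, read -r sample fq1 fq2 || [ -n "$sample" ]; do
            [ -z "$sample" ] && continue   # ignore blank lines
            for fq in "$fq1" "$fq2"; do
                if [ ! -f "$fq" ]; then
                    echo "sample ${sample}: missing file ${fq}" >&2
                    missing=1
                fi
            done
        done
        # The exit status of the function is the result of this test.
        [ "$missing" -eq 0 ]
    }
}

# Example usage:
#   check_samples samples.csv && echo "all fastq files found"
```

The function returns a non-zero status if any listed file is missing, so it can be chained in front of the `nextflow run` command with `&&`.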

Installing Nextflow on a system with an old java version.

Our JCU HPC systems are still running java 8 but nextflow requires 11 or newer. One way around this is to use sdkman to install and manage a different java version. This is now the preferred way to install java for nextflow (see instructions here).
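
To find out whether the java on your PATH is new enough, something like the sketch below may be useful. It assumes that `java -version` prints the version in double quotes (for example `"1.8.0_292"` for java 8 or `"11.0.2"` for java 11). `java_major` and `check_java` are hypothetical helper names.

```shell
# java_major: hypothetical helper that extracts the major version number
# from strings such as 1.8.0_292 (java 8), 11.0.2 or 17.
java_major() {
    case "$1" in
        1.*) v="${1#1.}"; echo "${v%%.*}" ;;   # old scheme: 1.8.x means java 8
        *)   echo "${1%%[.+-]*}" ;;            # new scheme: 11.0.2 means java 11
    esac
}

# check_java: hypothetical helper that reports whether the java on PATH
# meets the "11 or newer" requirement.
check_java() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 1
    fi
    # java -version writes to stderr, so redirect it before parsing.
    version=$(java -version 2>&1 | grep -i 'version "' | head -n 1 | cut -d'"' -f2)
    major=$(java_major "$version")
    if [ "$major" -ge 11 ]; then
        echo "java ${version} is new enough for nextflow"
    else
        echo "java ${version} is too old, nextflow needs 11 or newer" >&2
        return 1
    fi
}

# Example usage:
#   check_java
```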

Troubleshooting

Docker image

When running for the first time nextflow will need to download the docker image from dockerhub and convert it to a singularity image. This can be slow, and nextflow doesn't make it easy to monitor progress. If this step is failing you can try downloading the image separately yourself.

First make sure you set your NXF_SINGULARITY_CACHEDIR variable to a path where you can permanently store the singularity images required by movp. For example, to put it in .nxf/singularity_cache in your home directory you would run:

mkdir -p ~/.nxf/singularity_cache
export NXF_SINGULARITY_CACHEDIR=${HOME}/.nxf/singularity_cache

This will create the directory and set the value of NXF_SINGULARITY_CACHEDIR for your current login session. To make this setting permanent you should add the export command shown above to your .bash_profile.
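
One way to add the export line to your .bash_profile without duplicating it on repeated runs is sketched below. `add_line_once` is a hypothetical helper name. The single quotes in the usage example keep `${HOME}` unexpanded so that it is evaluated at each login.

```shell
# add_line_once: hypothetical helper that appends a line to a file
# only if exactly that line is not already present.
add_line_once() {
    line="$1"
    file="$2"
    # grep -x matches whole lines only, -F treats the text literally.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage:
#   add_line_once 'export NXF_SINGULARITY_CACHEDIR=${HOME}/.nxf/singularity_cache' ~/.bash_profile
```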

Next pull the image from dockerhub. This command will download the image, convert to singularity format and place it in your previously defined NXF_SINGULARITY_CACHEDIR. Note that this command is specific for container version 0.4.

singularity pull --name ${NXF_SINGULARITY_CACHEDIR}/iracooke-movp-0.4.img docker://iracooke/movp:0.4
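
If the workflow uses a different container version, the image file name changes with it. The sketch below assumes the naming pattern seen in the command above generalises, that is, `/` and `:` in the container name become `-` and `.img` is appended. `image_name` is a hypothetical helper, and the `tag` value in the usage example is only a placeholder. Check which container version your copy of movp actually expects.

```shell
# image_name: hypothetical helper that derives the cached image file name
# from a container name, following the pattern in the example above:
#   iracooke/movp:0.4 -> iracooke-movp-0.4.img
image_name() {
    printf '%s\n' "$1" | sed -e 's|[/:]|-|g' -e 's|$|.img|'
}

# Example usage (tag is a placeholder, set it to the version you need):
#   tag=0.4
#   singularity pull --name "${NXF_SINGULARITY_CACHEDIR}/$(image_name "iracooke/movp:${tag}")" "docker://iracooke/movp:${tag}"
```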

Customising resource usage

The default resource limits for individual processes will often need tweaking for individual projects. This can be done fairly easily by creating a custom config file.

For example, if you want to increase memory and cpu requests for the bwa_mem_gatk and gatk_mark_duplicates steps you would create a custom config as follows:

process {
	withName: 'bwa_mem_gatk'{
		cpus=12
		memory=10.GB
	}
	withName: 'gatk_mark_duplicates'{
		cpus=12
		memory=30.GB
	}
}

Save this into a file called local.config and then tell nextflow to use it with the -c option as follows:

nextflow run marine-omics/movp -latest -profile singularity,zodiac -r main --genome <genomefile> --samples <samples.csv> --outdir myoutputs -c local.config

When running on the JCU HPC, jobs will be submitted to the queuing system, which is PBS Pro. Options available to set are described here.

Running in the background

If your workflow will take a long time you may want to run it in the background. This will ensure that the workflow continues even if you log out. To do this, simply add the -bg option. Once the workflow is running in the background you can check progress using