gatk3-data-processing

GATK3 and this workflow are no longer supported; this repo is retained for legacy purposes only.

Workflows for processing high-throughput sequencing data for variant discovery with GATK3 and related tools

The processing-for-variant-discovery-gatk3 WDL pipeline implements data pre-processing according to the GATK Best Practices (June 2016).

Paired-end sequencing data in unmapped BAM (uBAM) format
One or more read groups, one per uBAM file, all belonging to a single sample (SM)
Input uBAM files must additionally comply with the following requirements:
- filenames all have the same suffix (we use ".unmapped.bam")
- files must pass validation by ValidateSamFile
- reads are provided in queryname-sorted order
- all reads must have an RG tag
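
As a rough illustration of the filename requirement above, the following Python sketch checks that a list of uBAM paths shares one suffix and strips it to recover a per-read-group base name. The function name `check_common_suffix` and the example paths are hypothetical and not part of the workflow; the check covers filenames only and does not replace ValidateSamFile or inspect sort order or RG tags.

```python
from pathlib import PurePath


def check_common_suffix(paths, suffix=".unmapped.bam"):
    """Return each file's base name with `suffix` stripped.

    Raises ValueError if any filename does not end with `suffix`.
    Hypothetical helper for illustration; not part of the WDL pipeline.
    """
    # Collect every path whose filename lacks the expected suffix.
    bad = [p for p in paths if not PurePath(p).name.endswith(suffix)]
    if bad:
        raise ValueError(f"files lacking suffix {suffix!r}: {bad}")
    # Strip the shared suffix to get one base name per read group.
    return [PurePath(p).name[: -len(suffix)] for p in paths]


if __name__ == "__main__":
    # Hypothetical input list: one uBAM per read group, single sample.
    ubams = [
        "data/sampleA.rg1.unmapped.bam",
        "data/sampleA.rg2.unmapped.bam",
    ]
    print(check_common_suffix(ubams))
```

Running a check like this before submitting a job can surface a misnamed file early, rather than partway through a workflow run.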

Cromwell version support

Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
