This repository has been archived by the owner on May 13, 2020. It is now read-only.

Commit

Corrected formating error, removed 'GATK-style' interval list wording
bshifaw committed Mar 16, 2020
1 parent c79165e commit 9244b37
Showing 1 changed file with 8 additions and 21 deletions.
README.md
@@ -1,11 +1,9 @@
# gatk4-germline-cnvs
The workflows in the repository are used to detect germline copy number variants (gCNVs) on exome sequence data.
The workflows in the repository are used to detect germline copy number variants using GATKs GermlineCNVCaller on exome sequence data.

- Cohort WDL: Calling a cohort of samples and building a model for denoising further case samples: ``cnv_germline_cohort_workflow.wdl``
- Case WDL: Calling case samples using a previously built model for denoising: ``cnv_germline_case_workflow.wdl``
- Scattered case WDL (recommended): Functionally equivalent to case WDL, written for reducing cloud compute cost (see below) and wall-clock time ``cnv_germline_case_scattered_workflow.wdl``


- Case WDL (recommended): Calling case samples using a previously built model for denoising: ``cnv_germline_case_workflow.wdl``
- Scattered case WDL : Functionally equivalent to case WDL, written for reducing cloud compute cost (see below) and wall-clock time ``cnv_germline_case_scattered_workflow.wdl``

#### Setting up parameter json file for a run

@@ -19,8 +17,8 @@ The reference used must be the same between PoN and case samples.

- ``CNVGermlineCohortWorkflow.cohort_entity_id`` -- Name of the cohort. Will be used as a prefix for output filenames.
- ``CNVGermlineCohortWorkflow.contig_ploidy_priors`` -- TSV file containing prior probabilities for the ploidy of each contig, with column headers: CONTIG_NAME, PLOIDY_PRIOR_0, PLOIDY_PRIOR_1, ...
- ``CNVGermlineCohortWorkflow.gatk_docker`` -- GATK Docker image (e.g., ``broadinstitute/gatk:latest``).
- ``CNVGermlineCohortWorkflow.intervals`` -- Picard or GATK-style interval list. For WGS, this should typically only include the chromosomes of interest.
- ``CNVGermlineCohortWorkflow.gatk_docker`` -- GATK Docker image (e.g., ``broadinstitute/gatk:4.1.5.0``).
- ``CNVGermlineCohortWorkflow.intervals`` -- Picard interval list.

samuelklee commented on Mar 16, 2020:

Sorry if there was confusion, but it's OK to leave the original documentation here---all interval formats supported by the GATK engine are supported by the workflow.

- ``CNVGermlineCohortWorkflow.normal_bais`` -- List of BAI files. This list must correspond to `normal_bams`. For example, `["Sample1.bai", "Sample2.bai"]`.
- ``CNVGermlineCohortWorkflow.normal_bams`` -- List of BAM files. This list must correspond to `normal_bais`. For example, `["Sample1.bam", "Sample2.bam"]`.
- ``CNVGermlineCohortWorkflow.num_intervals_per_scatter`` -- Number of intervals (i.e., targets or bins) in each scatter for GermlineCNVCaller. If total number of intervals is not divisible by the value provided, the last scatter will contain the remainder.
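The cohort parameters above become keys in the workflow's inputs JSON. The following is a minimal Python sketch that assembles only the keys shown in this hunk and checks that `normal_bams` and `normal_bais` correspond position by position. The helper names (`build_cohort_inputs`, `_sample_stem`), the `gs://my-bucket/...` paths, and the scatter value are hypothetical. The same-file-stem check is an assumed naming convention. Other required inputs, such as the reference files, would still need to be added.

```python
import json


def _sample_stem(path):
    """Strip a trailing .bai and/or .bam so 'x.bam', 'x.bai' and 'x.bam.bai' all map to 'x'."""
    for ext in (".bai", ".bam"):
        if path.endswith(ext):
            path = path[: -len(ext)]
    return path


def build_cohort_inputs(cohort_entity_id, contig_ploidy_priors, intervals,
                        normal_bams, normal_bais, num_intervals_per_scatter,
                        gatk_docker="broadinstitute/gatk:4.1.5.0"):
    """Return a dict of CNVGermlineCohortWorkflow inputs (only the keys listed above)."""
    if len(normal_bams) != len(normal_bais):
        raise ValueError("normal_bams and normal_bais must have the same length")
    # Assumed naming convention: each index file shares its BAM's file stem.
    for bam, bai in zip(normal_bams, normal_bais):
        if _sample_stem(bam) != _sample_stem(bai):
            raise ValueError(f"{bai} does not appear to index {bam}")
    prefix = "CNVGermlineCohortWorkflow."
    return {
        prefix + "cohort_entity_id": cohort_entity_id,
        prefix + "contig_ploidy_priors": contig_ploidy_priors,
        prefix + "gatk_docker": gatk_docker,
        prefix + "intervals": intervals,
        prefix + "normal_bams": list(normal_bams),
        prefix + "normal_bais": list(normal_bais),
        prefix + "num_intervals_per_scatter": num_intervals_per_scatter,
    }


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    inputs = build_cohort_inputs(
        cohort_entity_id="my_cohort",
        contig_ploidy_priors="gs://my-bucket/contig_ploidy_priors.tsv",
        intervals="gs://my-bucket/targets.interval_list",
        normal_bams=["gs://my-bucket/Sample1.bam", "gs://my-bucket/Sample2.bam"],
        normal_bais=["gs://my-bucket/Sample1.bai", "gs://my-bucket/Sample2.bai"],
        num_intervals_per_scatter=5000,  # placeholder value, not a recommendation
    )
    print(json.dumps(inputs, indent=2))
```

Writing the printed JSON to a file gives a starting parameter file. Failing early on a BAM/BAI mismatch is cheaper than discovering it inside a cloud run.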
@@ -46,7 +44,7 @@ The reference, number of intervals per scatter, and bins (if specified) must be
- ``CNVGermlineCaseWorkflow.contig_ploidy_model_tar`` -- Path to tar of the contig-ploidy model directory generated by the DetermineGermlineContigPloidyCohortMode task.
- ``CNVGermlineCaseWorkflow.gatk_docker`` -- GATK Docker image (e.g., ``broadinstitute/gatk:latest``).
- ``CNVGermlineCaseWorkflow.gcnv_model_tars`` -- Array of paths to tars of the contig-ploidy model directories generated by the GermlineCNVCallerCohortMode tasks.
- ``CNVGermlineCaseWorkflow.intervals`` -- Picard or GATK-style interval list. For WGS, this should typically only include the chromosomes of interest.
- ``CNVGermlineCaseWorkflow.intervals`` -- Picard interval list.
- ``CNVGermlineCaseWorkflow.num_intervals_per_scatter`` -- Number of intervals (i.e., targets or bins) in each scatter for GermlineCNVCaller. If total number of intervals is not divisible by the value provided, the last scatter will contain the remainder.
- ``CNVGermlineCaseWorkflow.ref_fasta_dict`` -- Path to reference dict file.
- ``CNVGermlineCaseWorkflow.ref_fasta_fai`` -- Path to reference fasta fai file.
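Both the cohort and case workflows describe `num_intervals_per_scatter` the same way. Intervals are grouped into scatters of that size, and the remainder goes into the last one. Below is a minimal Python sketch of that arithmetic. `split_into_scatters` is a hypothetical helper that only illustrates the described behavior. It is not the workflow's own sharding code, and it assumes the intervals keep their original order.

```python
def split_into_scatters(intervals, num_intervals_per_scatter):
    """Group intervals into consecutive scatters of num_intervals_per_scatter.

    When len(intervals) is not divisible by num_intervals_per_scatter,
    the last scatter holds the remainder.
    """
    if num_intervals_per_scatter <= 0:
        raise ValueError("num_intervals_per_scatter must be positive")
    return [
        intervals[i:i + num_intervals_per_scatter]
        for i in range(0, len(intervals), num_intervals_per_scatter)
    ]


# Ten hypothetical intervals with 4 per scatter give scatter sizes 4, 4 and 2.
scatters = split_into_scatters([f"interval_{i}" for i in range(10)], 4)
print([len(s) for s in scatters])  # [4, 4, 2]
```

With this split, the number of GermlineCNVCaller scatters is the total number of intervals divided by `num_intervals_per_scatter`, rounded up.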
@@ -61,14 +59,6 @@ Further explanation of these task-level parameters may be found by invoking the

Same required parameters as in the germline case workflow. However, in order to reduce wall-clock time and compute cost, it is recommended to optimize for the following parameters:

mportant Notes :
- Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
- For help running workflows on the Google Cloud Platform or locally please
view the following tutorial [(How to) Execute Workflows from the gatk-workflows Git Organization](https://gatk.broadinstitute.org/hc/en-us/articles/360035530952).
- Please visit the [User Guide](https://gatk.broadinstitute.org/hc/en-us/categories/360002310591) site for further documentation on our workflows and tools.
- Relevant reference and resources bundles can be accessed in [Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360036212652).

### Contact Us :
- The following material is provided by the Data Science Platforum group at the Broad Institute. Pleas- ``CNVGermlineCaseScatteredWorkflow.num_samples_per_scatter_block`` -- (recommended WES value=25) number of samples to process in a single block; blocks of this size will be sent to the germline case workflow and processed in a batch;
- ``CNVGermlineCaseScatteredWorkflow.preemptible_attempts`` -- (recommended value=5) this reduces cost by using preemptible instances
- ``CNVGermlineCaseScatteredWorkflow.mem_gb_for_determine_germline_contig_ploidy`` -- amount of memory allotted for ploidy determination tasks (the lower the cheaper)
@@ -78,10 +68,7 @@ view the following tutorial [(How to) Execute Workflows from the gatk-workflows
- ``CNVGermlineCaseScatteredWorkflow.cpu_for_germline_cnv_caller`` -- number of CPU cores allotted for gCNV caller tasks (the lower the cheaper)
- ``CNVGermlineCaseScatteredWorkflow.disk_for_germline_cnv_caller`` -- amount of storage allotted for gCNV caller tasks (the lower the cheaper)

Note that lowering disk and memory too much will eventually lead to the workflow failing. Lowering thee direct any questions or concerns to one of our forum sites : [GATK](https://gatk.broadinstitute.org/hc/en-us/community/topics) or [Terra](https://support.terra.bio/hc/en-us/community/topics/360000500432). number of CPU cores could increase the wall-clock times.

### Output :
(Example: - A BAM file and its index)
Note that lowering disk and memory too much will eventually lead to the workflow failing. Lowering the number of CPU cores could increase the wall-clock times.

### Software version notes :
- GATK 4.1.5.0
@@ -91,7 +78,7 @@ Note that lowering disk and memory too much will eventually lead to the workflow

### Important Notes :
- Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
- For help running workflows on the Google Cloud Platform or locally please
- For help running workflows on the Google Cloud Platform or locally, please
view the following tutorial [(How to) Execute Workflows from the gatk-workflows Git Organization](https://gatk.broadinstitute.org/hc/en-us/articles/360035530952).
- Please visit the [User Guide](https://gatk.broadinstitute.org/hc/en-us/categories/360002310591) site for further documentation on our workflows and tools.
- Relevant reference and resources bundles can be accessed in [Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360036212652).
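The scattered case workflow takes the same required parameters as the case workflow, plus the `CNVGermlineCaseScatteredWorkflow` tuning parameters listed in the diff. The Python sketch below copies an existing case-workflow inputs dict under the `CNVGermlineCaseScatteredWorkflow.` prefix. It then adds the two values the README recommends: `num_samples_per_scatter_block=25` for WES and `preemptible_attempts=5`. It assumes, without confirmation from this diff, that the required parameters keep their names under the new prefix. `to_scattered_inputs` is a hypothetical helper. Memory, CPU, and disk are left for you to tune.

```python
import json

CASE_PREFIX = "CNVGermlineCaseWorkflow."
SCATTERED_PREFIX = "CNVGermlineCaseScatteredWorkflow."

# Values recommended in the README (25 is the WES value); memory, CPU and
# disk settings are intentionally left unset here.
RECOMMENDED_TUNING = {
    "num_samples_per_scatter_block": 25,
    "preemptible_attempts": 5,
}


def to_scattered_inputs(case_inputs, tuning=None):
    """Re-key case-workflow inputs for the scattered workflow and add tuning values.

    Assumes the scattered workflow accepts the case workflow's required
    parameters under its own prefix (an assumption, not confirmed here).
    Entries in `tuning` override or extend RECOMMENDED_TUNING.
    """
    scattered = {}
    for key, value in case_inputs.items():
        if not key.startswith(CASE_PREFIX):
            raise ValueError(f"unexpected key {key!r}")
        scattered[SCATTERED_PREFIX + key[len(CASE_PREFIX):]] = value
    for name, value in {**RECOMMENDED_TUNING, **(tuning or {})}.items():
        scattered[SCATTERED_PREFIX + name] = value
    return scattered


if __name__ == "__main__":
    # Hypothetical case-workflow inputs, for illustration only.
    case_inputs = {
        "CNVGermlineCaseWorkflow.gatk_docker": "broadinstitute/gatk:4.1.5.0",
        "CNVGermlineCaseWorkflow.intervals": "gs://my-bucket/targets.interval_list",
    }
    print(json.dumps(to_scattered_inputs(case_inputs), indent=2))
```

Passing, for example, `tuning={"cpu_for_germline_cnv_caller": 1}` adds or overrides a setting. As the README notes, lowering disk and memory too far will eventually make the workflow fail, and fewer CPU cores can increase wall-clock time.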