Dev #2

bshifaw · 2020-03-12T19:32:12Z

Uploaded GATK 4.1.5.0 version of the gcnv pipeline.
Edited the import links to point to git urls.
Added input and references in input json files.

fix gatk version

samuelklee

Thanks again for putting this together, @bshifaw! Commented on a few minor typos and some more serious issues (which we've touched upon elsewhere, i.e., 1) how data-specific parameter recommendations should be handled, and 2) the need for a common set of sensible plumbing test resources). Not sure who should make the final call on such issues, but I think Comms needs to provide some guidance. No need to address them for v1 of this workflow, but perhaps we should think about addressing them across all of our workflows sooner rather than later.

Also, do we want to link to the old tutorial? What is the progress on the new one?

README.md

samuelklee · 2020-03-16T14:06:43Z

cnv_germline_case_scattered_workflow.inputs.json

+  "CNVGermlineCaseScatteredWorkflow.gcnv_max_copy_number": 3,
+  "CNVGermlineCaseScatteredWorkflow.gcnv_max_calling_iters": 1,
+  "CNVGermlineCaseScatteredWorkflow.maximum_number_events_per_sample": 120,
+  "CNVGermlineCaseScatteredWorkflow.ref_fasta_dict": "gs://gatk-best-practices/cnv_germline_pipeline/Homo_sapiens_assembly19.truncated.dict",


Do we still want to use hg19 test data?

Good point. I say onwards and upwards -- to hg38!

Happy to use another set but I'll need help finding resource files (e.g. contig_ploidy_model_tar)

Once we decide on the appropriate sample BAMs, the resource models used in the case workflow will be generated by running the cohort workflow.

If it's helpful, here is a list of downsampled 1000 Genomes Project BAM sample files: https://console.cloud.google.com/storage/browser/gatk-test-data/1kgp/downsampled_bam_hg38/?project=broad-dsde-outreach&organizationId=548622027621

cnv_germline_case_scattered_workflow.wdl

README.md

ldgauthier

I mostly just assumed the WDLs were the same as the repo, which are already reviewed and tested.

I think there was a copy paste hiccup in the readme responsible for a bunch of the typos.

Some of these may actually be changes to the file in the GATK repo, since a lot of the text is copied. If that's the case, can you make a GATK PR or open an issue?

README.md

ldgauthier · 2020-03-16T15:00:56Z

README.md

+In additional, there are optional workflow-level and task-level parameters that may be set by advanced users; for example:
+
+- ``CNVGermlineCohortWorkflow.do_explicit_gc_correction`` -- (optional) If true, perform explicit GC-bias correction when creating PoN and in subsequent denoising of case samples.  If false, rely on PCA-based denoising to correct for GC bias.
+- ``CNVGermlineCohortWorkflow.PreprocessIntervals.bin_length`` -- Size of bins (in bp) for coverage collection.  *This must be the same value used for all case samples.*


Is this just for WGS?

According to the tutorial, it should be set to 1000 for WGS and 0 for exomes. I noticed that the task that uses the variable (PreprocessIntervals) defaults it to 1000. Since this workflow is directed towards exomes, should the input JSON set this variable to 0? @samuelklee

You can technically use it for WES as well (to break targets up into smaller bins), we just haven't evaluated it. Let's leave the documentation unchanged, but set the default to 0 in the JSON---thanks for catching that, @bshifaw!
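Since the README states that `bin_length` must be the same for the cohort and for all case samples, a small check on the input JSONs can catch a mismatch before a run. The following is a minimal sketch, not part of this PR: it assumes the parameter appears under a key ending in `.bin_length`, it takes the task default of 1000 from the comment above, and the case-workflow key used in the example is hypothetical.

```python
import json

# Default reported in this thread for the PreprocessIntervals task (assumption
# taken from the discussion, not read from the WDL itself).
TASK_DEFAULT_BIN_LENGTH = 1000


def effective_bin_length(inputs: dict) -> int:
    """Return the bin_length implied by one inputs JSON.

    Considers every key ending in '.bin_length' and falls back to the
    task default when no such key is present.
    """
    values = {v for k, v in inputs.items() if k.endswith(".bin_length")}
    if not values:
        return TASK_DEFAULT_BIN_LENGTH
    if len(values) > 1:
        raise ValueError(f"conflicting bin_length values: {sorted(values)}")
    return values.pop()


def bin_lengths_agree(cohort_inputs: dict, case_inputs: dict) -> bool:
    """True when cohort and case inputs resolve to the same bin_length."""
    return effective_bin_length(cohort_inputs) == effective_bin_length(case_inputs)


if __name__ == "__main__":
    cohort = json.loads(
        '{"CNVGermlineCohortWorkflow.PreprocessIntervals.bin_length": 0}'
    )
    # Hypothetical case-workflow key, used only for illustration.
    case = json.loads(
        '{"CNVGermlineCaseWorkflow.PreprocessIntervals.bin_length": 0}'
    )
    print(bin_lengths_agree(cohort, case))
    print(bin_lengths_agree(cohort, {}))
```

Under these assumptions, a case JSON that omits the key would resolve to 1000 and disagree with a cohort JSON that sets it to 0, which is the kind of mismatch the README warning is about.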

cnv_common_tasks.wdl

ldgauthier · 2020-03-16T15:33:37Z

cnv_germline_case_scattered_workflow.inputs.json

+  "CNVGermlineCaseScatteredWorkflow.gcnv_max_copy_number": 3,
+  "CNVGermlineCaseScatteredWorkflow.gcnv_max_calling_iters": 1,
+  "CNVGermlineCaseScatteredWorkflow.maximum_number_events_per_sample": 120,
+  "CNVGermlineCaseScatteredWorkflow.ref_fasta_dict": "gs://gatk-best-practices/cnv_germline_pipeline/Homo_sapiens_assembly19.truncated.dict",


Good point. I say onwards and upwards -- to hg38!

cnv_germline_case_scattered_workflow.wdl

asmirnov239

Looks good, thank you for putting it together

asmirnov239 · 2020-03-18T05:30:30Z

README.md

+### LICENSING :
+Copyright Broad Institute, 2020 | BSD-3
+This script is released under the WDL open source code license (BSD-3) (full license text at https://github.com/openwdl/wdl/blob/master/LICENSE). Note however that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.


Does the license information not go in the beginning of each individual WDL?

Yes, it should be, but I'm not sure why it's missing for this repo. I'll add it now. As for why this section is here, it's a note to inform the user that there might be instances where a workflow uses non-GATK tools, and it's up to them to determine whether they can use them for their use case.

cnv_germline_case_scattered_workflow.inputs.json

cnv_germline_case_workflow.inputs.json

cnv_germline_case_scattered_workflow.wdl

bshifaw

> Thanks again for putting this together, @bshifaw! Commented on a few minor typos and some more serious issues (which we've touched upon elsewhere, i.e., 1) how data-specific parameter recommendations should be handled, and 2) the need for a common set of sensible plumbing test resources). Not sure who should make the final call on such issues, but I think Comms needs to provide some guidance. No need to address them for v1 of this workflow, but perhaps we should think about addressing them across all of our workflows sooner rather than later.
>
> Also, do we want to link to the old tutorial? What is the progress on the new one?

@samuelklee

Added a link to the tutorial near the header.
I got through a few of the comments; the ones that remain are related to using a new input set. There are a few downsampled hg38 exome BAM files in the following bucket. I tried using them with the workflow but ran into some errors, most likely because the parameters were not set correctly or the correct resource file wasn't available. I can work on this during the next version of the workflow.

samuelklee · 2020-03-19T13:10:47Z

Thanks for these changes, @bshifaw! We can stick with hg19 for now. Note that there are many other parameters that are still set to "plumbing" values, other than the one that you changed. However, I am not sure if setting these to default/realistic values (with the assumption that most users will not necessarily know to change them for their actual runs) will work with this plumbing data. If so, I am OK with removing all non-optional parameters from the JSON.

I'm inclined to move towards non-plumbing and more realistic data for showcasing Featured Workspaces, which will hopefully provide a more useful starting point and cost estimate for users. This is ultimately up to Comms, but it should also be considered as part of a larger conversation about testing, as we've discussed elsewhere.

I'll approve, but Mark W. should probably have the final say on this PR before it goes in.

bshifaw · 2020-03-20T13:19:26Z

> Note that there are many other parameters that are still set to "plumbing" values, other than the one that you changed. However, I am not sure if setting these to default/realistic values (with the assumption that most users will not necessarily know to change them for their actual runs) will work with this plumbing data. If so, I am OK with removing all non-optional parameters from the JSON.
>
> I'm inclined to move towards non-plumbing and more realistic data for showcasing Featured Workspaces, which will hopefully provide a more useful starting point and cost estimate for users. This is ultimately up to Comms, but it should also be considered as part of a larger conversation about testing, as we've discussed elsewhere.
>
> I'll approve, but Mark W. should probably have the final say on this PR before it goes in.

bshifaw added 7 commits May 26, 2019 15:48

added wdl and json

01d6e27

added git raw url imports

47b3d64

Update to gatk4.1.5.0

a2bd777

Update README.md

779a214

fix gatk version

updated contact us message

84d4ed8

Added plumbing input sample and resources

09c27f2

Updated import url to git links

c79165e

samuelklee suggested changes Mar 16, 2020

ldgauthier reviewed Mar 16, 2020

ldgauthier mentioned this pull request Mar 16, 2020

Update CNV WDLs to WDL 1.0. broadinstitute/gatk#6502

Closed

bshifaw added 10 commits March 16, 2020 16:29

Corrected formating error, removed 'GATK-style' interval list wording

9244b37

Testing Dockstore subworkflow relative paths

fda80ad

minor edits

0f5332c

removed url imports

56113ea

minor edits to ReadMe

11f1870

Added 'GATK-style' as possible interval format, minor spelling correction

7b4ae3a

Set bin length to 0 in json

97c96b3

Added example for contig_ploidy_priors discription

6542dae

specified human panel size for max num events variable

82da3a0

Added link to tutorial

fb56dd4

asmirnov239 approved these changes Mar 18, 2020

bshifaw added 2 commits March 18, 2020 21:02

Removed case scattered workflow

197d34d

removed gcnv_disable_annealing from json to set variable to default

7db1cb2

bshifaw commented Mar 18, 2020

Added license file

8a1167c

samuelklee approved these changes Mar 19, 2020

bshifaw closed this Mar 20, 2020

bshifaw reopened this Mar 20, 2020
