Organizational and cleanup tweaks #82
Conversation
Bleep bloop, I am a robot. Alas, some of the Nextflow configuration tests failed!
test/configtest-F16.json
@ ["params","bundle_contest_hapmap_3p3_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz.tbi"
@ ["params","bundle_known_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
@ ["params","bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi"
@ ["params","bundle_v0_dbsnp138_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi"
@ ["params","reference_fasta_dict"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.dict"
@ ["params","reference_fasta_fai"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta.fai"
test/configtest-F32.json
@ ["params","bundle_contest_hapmap_3p3_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz.tbi"
@ ["params","bundle_known_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
@ ["params","bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi"
@ ["params","bundle_v0_dbsnp138_vcf_gz_tbi"]
+ "/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi"
@ ["params","reference_fasta_dict"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.dict"
@ ["params","reference_fasta_fai"]
+ "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta.fai"
If the above changes are surprising, stop and determine what happened. If the above changes are expected, there are two ways to fix this:
/fix-tests
Bleep bloop, I am a robot. I have updated all of the failing tests for you with 25a0b8e. You must review my work before merging this pull request!
Amazing work! Looks good to me. @yashpatel6 for the final approval.
Looks good!
Description
This PR performs a handful of re-organizational and cleanup tasks for the pipeline.
Sidecar file searching during config
I've added the following "new" required parameters to default.config and schema.yaml. They all have default values defined by other parameters, e.g. reference_fasta_fai = "${-> params.reference_fasta}.fai".
reference_fasta_fai
reference_fasta_dict
bundle_known_indels_vcf_gz_tbi
bundle_contest_hapmap_3p3_vcf_gz_tbi
bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi
bundle_v0_dbsnp138_vcf_gz_tbi
I say "new" parameters because the values are already in use across the pipeline - this change is to eagerly perform the discovery and validation of those files during the configuration phase. That's more DRY and ensures that missing files cause fast failures.
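As a rough illustration of the pattern (a sketch, not the exact contents of default.config), the sidecar defaults could look like the fragment below. The lazy `${-> ...}` form is the one quoted above. The base parameter names `reference_fasta` and `bundle_known_indels_vcf_gz` and their example values are assumptions, inferred from the sidecar names and the paths in the test output.

```groovy
params {
    // Assumed example values; in practice these come from the user's config.
    reference_fasta = "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta"
    bundle_known_indels_vcf_gz = "/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz"

    // Lazy GStrings: the closure runs when the string is rendered, so each
    // default follows whatever value the base parameter finally has.
    reference_fasta_fai = "${-> params.reference_fasta}.fai"
    bundle_known_indels_vcf_gz_tbi = "${-> params.bundle_known_indels_vcf_gz}.tbi"
}
```

With defaults like these, a user who overrides only reference_fasta gets a matching reference_fasta_fai without setting it, and can still set the sidecar explicitly if the index lives elsewhere.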
Optional parameters listed in README
I've updated the README to include all of the parameters from default.config in a second table.
PipeVal in the critical path
I've tweaked the early workflow processes to take the PipeVal validated output files rather than the raw BAM/BAI files (required uclahs-cds/pipeline-Nextflow-module#44). That trades a little efficiency for simplicity - the downstream processing can't begin in parallel with validation, but in exchange we don't need to worry about cancelling processes or clawing back invalid outputs due to a late validation result.
Input name harmonization
Many processes have proxy inputs for parameters - that is, those processes are always called with the same parameter as the same positional input. In those cases I renamed the input to match the parameter - for example, run_SplitIntervals_GATK's reference, reference_index, and reference_dict inputs are now reference_fasta, reference_fasta_index, and reference_fasta_dict.
Harmonizing the input name with the parameter name makes it easier to trace the logic and search the codebase.
Testing Results
/hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/nwiltsie-refactor/log-nftest-20240627T155900Z.log
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.