Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename input file parameters #488

Merged
merged 12 commits into from
Nov 18, 2024

Conversation

fellen31
Copy link
Collaborator

@fellen31 fellen31 commented Nov 6, 2024

Closes #324.

Tries to make the file inputs to the pipeline more descriptive and uniform, e.g. prefacing the parameter with the tool name it belongs to.

  • Rename skip_short_variant_calling to skip_snv_calling
  • Rename skip_assembly_wf to skip_genome_assembly
  • Rename skip_mapping_wf to skip_alignment
  • Rename skip_methylation_wf to skip_methylation_analysis
  • Rename skip_phasing_wf to skip_phasing
  • Rename variant_caller to snv_caller
  • Rename parallel_snv to snv_calling_processes
  • Rename cadd_prescored to cadd_prescored_indels
  • Rename snp_db to echtvar_snv_databases
  • Rename variant_catalog to stranger_repeat_catalog
  • Rename bed to target_bed
  • Rename hificnv_xy to hificnv_expected_xy_cn
  • Rename hificnv_xx to hificnv_expected_xx_cn
  • Rename hificnv_exclude to hificnv_excluded_regions
  • Rename reduced_penetrance to genmod_reduced_penetrance
  • Rename score_config_snv to genmod_score_config_snvs
  • Rename score_config_sv to genmod_score_config_svs
  • Rename parallel_alignments to alignement_processes
  • Rename svdb_dbs to svdb_sv_databases

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@fellen31 fellen31 force-pushed the rename-parameters branch 2 times, most recently from d559c7b to 75ecec7 Compare November 6, 2024 09:45
@fellen31 fellen31 marked this pull request as ready for review November 6, 2024 09:59
@fellen31 fellen31 requested a review from a team as a code owner November 6, 2024 09:59
Copy link
Collaborator

@Schmytzi Schmytzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!
However, I found some spots where you missed renaming bed to target_bed and some other minor things and inconsistencies.
I've got some more thoughts:

  1. If I understood correctly, target_bed is only used for SNV calling. In that case, it should probably be called snv_calling_target_bed (or ..._regions, see one of the line comments)
  2. Maybe it makes more sense to change skip_methylation_analysis to skip_methylaytion_pileup as that is all that subworkflow does. Or do we expect it to gain more functionality later on.

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated
| `bed` | `target_bed` |
| `hificnv_xy` | `hificnv_expected_xy_cn` |
| `hificnv_xx` | `hificnv_expected_xx_cn` |
| `hificnv_exclude` | `hificnv_excluded_regions` |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be hificnv_exluded_bed instead? The other bed param is called target_bed and not target_regions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, another parameter is called par_regions. So maybe target_bed should be target_regions instead

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third thought: "PAR regions" is redundant, as the "R" in "PAR" already means "region". So maybe it's better to use _bed for all of them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both deepvariant and dipcall refer to them as PAR regions, so I'd prefer keeping it that way. Even though technically it's redundant (Swedish CD-skiva is another example).

But I think we could have target_regions instead of target_bed.

CHANGELOG.md Outdated Show resolved Hide resolved
| `score_config_snv` | `genmod_score_config_snvs` |
| `score_config_sv` | `genmod_score_config_svs` |
| `parallel_alignments` | `alignement_processes` |
| `svdb_dbs` | svdb_sv_databases`` |

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe variant_consequences_snv should be variant_consequences_snvs. The corresponding param is variant_consequences_svs and the similar param for genmod is genmod_score_config_snvs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound good!

CHANGELOG.md Outdated
@@ -119,6 +119,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| `--validationSkipDuplicateCheck` | |
| `--validationS3PathCheck` | |
| `--monochromeLogs` | `--monochrome_logs` |
| `skip_short_variant_calling` | `skip_snv_calling` |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All parameters below are missing the --

| `variant_catalog` | A variant catalog json-file for stranger | `string` | | | |
| `echtvar_snv_databases` | A csv file with echtvar databases to annotate SNVs with | `string` | | | |
| `svdb_sv_databases` | Databases used for structural variant annotation in vcf format. <details><summary>Help</summary><small>Path to comma-separated file containing information about the databases used for structural variant annotation.</small></details>| `string` | | | |
| `stranger_repeat_catalog` | A variant catalog json-file for stranger | `string` | | | |
| `variant_consequences_snv` | File containing list of SO terms listed in the order of severity from most severe to lease severe for annotating genomic SNVs. For more information check https://ensembl.org/info/genome/variation/prediction/predicted_data.html | `string` | | | |
| `variant_consequences_svs` | File containing list of SO terms listed in the order of severity from most severe to lease severe for annotating genomic SVs. For more information check https://ensembl.org/info/genome/variation/prediction/predicted_data.html | `string` | | | |
| `vep_cache` | A path to the VEP cache location | `string` | | | |
| `bed` | A BED file with regions of interest, used to limit short variant calling. | `string` | | | |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `bed` | A BED file with regions of interest, used to limit short variant calling. | `string` | | | |
| `target_bed` | A BED file with regions of interest, used to limit short variant calling. | `string` | | | |

docs/usage.md Outdated
| Parameter | Description |
| --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `genmod_score_config_svs` |  Used by GENMOD when ranking variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/rank_model_snv.ini). |
| `genmod_reduced_penetrance` | A list of loci that show [reduced penetrance](https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/) in people. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/reduced_penetrance.tsv) |

`--skip_rank_variants`.

## Other highlighted parameters

- Limit SNV calling to regions in BED file (`--bed`).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Limit SNV calling to regions in BED file (`--bed`).
- Limit SNV calling to regions in BED file (`--target_bed`).

@Schmytzi
Copy link
Collaborator

Schmytzi commented Nov 7, 2024

One more thing: The workflow parameter for skipping short-variant calling is skip_short_variant_calling but in all other params they are referred to as SNVs, even if they also relate to indels e.g., skip_snv_annotation.

fellen31 and others added 2 commits November 7, 2024 16:19
Co-authored-by: Daniel Schmitz <[email protected]>
Co-authored-by: Daniel Schmitz <[email protected]>
@fellen31
Copy link
Collaborator Author

One more thing: The workflow parameter for skipping short-variant calling is skip_short_variant_calling but in all other params they are referred to as SNVs, even if they also relate to indels e.g., skip_snv_annotation.

Yeah. Here I wanted to rename skip_short_variant_calling to skip_snv_calling, to align with that they are called snvs in other parameters. I don't think it's ideal to call them snvs even though they contain indels as well.., but I don't know if it's more complicated to have "small variants", "snvs and indels", "snvs and small indels" in the parameters/file names? Any suggestion?

@fellen31
Copy link
Collaborator Author

Great work! However, I found some spots where you missed renaming bed to target_bed and some other minor things and inconsistencies. I've got some more thoughts:

  1. If I understood correctly, target_bed is only used for SNV calling. In that case, it should probably be called snv_calling_target_bed (or ..._regions, see one of the line comments)

Right now target_bed is used to limit SNV calling and methylation pileups, making the SNV-calling and methylation pileups quicker for targeted analysis. It's also passed to mosdepth to calculate depth per region. Ideally this bed would limit SV calling to the target regions, but now it's only filtering the results.

In Stockholm we will need to be able to output both variants from the whole genome, and filtered variants at the same time. I believe this would not be the usual use case for others, so perhaps we could have for example both a target_bed parameter and an snv_calling_target_bed. The snv_calling_target_bed is then by default set to target_bed, so a normal user get's expected results, but can also be overridden. Need to think about it some more.

  1. Maybe it makes more sense to change skip_methylation_analysis to skip_methylaytion_pileup as that is all that subworkflow does. Or do we expect it to gain more functionality later on.

Yes, although it would be more correct currently, I hope we can at least add some visualisation track outputs to the workflow fairly soon. Even so, skip_methylation_analysis feels a bit broad maybe, because there's not much of an analysis going on.

@Schmytzi
Copy link
Collaborator

Regarding your first point: I think calling everything "SNV" is preferable. Even if it isn't technically correct I think it's clear what it means and I like conciseness.
Ad 2: That sounds like a good alternative.
Ad 3: Maybe skip_methylation_summary?

@fellen31 fellen31 force-pushed the rename-parameters branch 6 times, most recently from f6644aa to ea1de63 Compare November 14, 2024 09:54
@fellen31
Copy link
Collaborator Author

Regarding your first point: I think calling everything "SNV" is preferable. Even if it isn't technically correct I think it's clear what it means and I like conciseness.

Let's go with snv then.

Ad 2: That sounds like a good alternative.

I made a separate issue for finer-grained control over regions (#503). Renaming --bed to --target_regions in this PR.

Ad 3: Maybe skip_methylation_summary?

Let's go with your suggestion of --skip_methylation_pileups, because that is what is does. If it changes, then we can change the parameter if needed.

Copy link
Collaborator

@Schmytzi Schmytzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind fixing the alignment where I marked it? 😬

nextflow.config Outdated Show resolved Hide resolved
nextflow.config Outdated Show resolved Hide resolved
nextflow.config Outdated Show resolved Hide resolved
fellen31 and others added 3 commits November 15, 2024 08:44
Co-authored-by: Daniel Schmitz <[email protected]>
Co-authored-by: Daniel Schmitz <[email protected]>
Co-authored-by: Daniel Schmitz <[email protected]>
@fellen31
Copy link
Collaborator Author

Would you mind fixing the alignment where I marked it? 😬

Thanks!

@fellen31
Copy link
Collaborator Author

Fixed!

Copy link
Collaborator

@jemten jemten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Anything else you like to add @Schmytzi?

Copy link
Collaborator

@Schmytzi Schmytzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@fellen31 fellen31 merged commit 4817e1f into genomic-medicine-sweden:dev Nov 18, 2024
23 checks passed
@fellen31 fellen31 deleted the rename-parameters branch November 18, 2024 10:18
@Schmytzi Schmytzi mentioned this pull request Dec 10, 2024
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[User Story] Consistent naming of pipeline parameters
3 participants