Empty sample/genotype-data in single samples of a pedigree cause filters to crash #2201

Open
Nicolai-vKuegelgen opened this issue Jan 21, 2025 · 1 comment
Labels
bug Something isn't working

Comments

Nicolai-vKuegelgen commented Jan 21, 2025

Describe the bug
In a case/pedigree with multiple samples, some variants may not have usable data from all samples (due to missing coverage, or on the Y chromosome). In VCF files this can be encoded either as a single "." for that sample in the sample/genotype block or with individual missing values for each defined FORMAT field. In the TSV file used for data import to Varfish, the sample-specific data (genotype column) can, in principle, also be empty for single samples (i.e. """sample_2""": {} ). However, in cases like this, variant filtration will fail for the whole set of variants (SNVs or SVs).
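To make the two VCF encodings concrete, here is a minimal sketch of a check that treats both of them as "no usable data". The function name `sample_is_missing` is hypothetical and not part of Varfish or mehari; it only illustrates the two cases described above.

```python
def sample_is_missing(sample_field: str) -> bool:
    """Return True if a VCF sample column carries no usable data.

    Covers both encodings: a single "." for the whole sample, and a
    colon-separated list in which every FORMAT value is a missing marker.
    """
    # Markers for "no value"; "./." and ".|." are no-call genotypes.
    missing_tokens = {".", "./.", ".|."}
    return all(part in missing_tokens for part in sample_field.split(":"))


print(sample_is_missing("."))            # whole sample block is "."
print(sample_is_missing("./.:.:."))      # every FORMAT field is missing
print(sample_is_missing("0/1:12,3:15"))  # sample with real data
```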

To Reproduce
Steps to reproduce the behavior:

  1. Generate a tsv file with empty genotype data for a single variant in a single sample (e.g. using mehari on a vcf with a "." in the sample block).
  2. Import this case to Varfish
  3. Attempt variant filtration
  4. See error

Expected behavior
Given that some samples may not have any usable information for some variants, ideally the variant filtration should be able to deal with missing data.
Alternatively, import of variants with missing data for even a single sample should be rejected, so that filtration will not fail due to this.
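As an illustration of the first option (a filter that tolerates missing data), here is a minimal sketch. It is not the actual Varfish filter code: the function `passes_min_depth`, the `pass_if_missing` switch, and the field names `gt` and `dp` are assumptions made for this example.

```python
from typing import Any, Mapping


def passes_min_depth(
    genotype: Mapping[str, Mapping[str, Any]],
    sample: str,
    min_dp: int,
    pass_if_missing: bool = True,
) -> bool:
    """Depth filter that does not crash when a sample has no data."""
    # An absent sample and an empty entry ({}) are handled the same way.
    call = genotype.get(sample) or {}
    dp = call.get("dp")
    if dp is None:
        # No usable depth: fall back to a configurable decision
        # instead of raising an exception.
        return pass_if_missing
    return dp >= min_dp


# Hypothetical genotype column content with an empty entry for sample_2.
genotype = {"sample_1": {"gt": "0/1", "dp": 30}, "sample_2": {}}
print(passes_min_depth(genotype, "sample_1", 10))
print(passes_min_depth(genotype, "sample_2", 10))
print(passes_min_depth(genotype, "sample_2", 10, pass_if_missing=False))
```

Whether a sample without data should pass or fail a filter is a policy decision; the sketch only shows that the decision can be made explicit rather than left to an exception.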

Additional context
This could be fixed by never writing empty sample/genotype-data into the tsv files used for varfish import, see mehari issue 672

Contributor

stolpeo commented Jan 23, 2025

For the Y chromosome, there is no standardized output for the genotype:

  • Dragen outputs GT as ./. and other data as .
  • GATK mostly outputs GT as ./. (except where it reports noise), and other data as 0
  • Varfish annotator converts the . of other data to 0
  • mehari converts the . of other data to -1
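Because the placeholder for the other fields differs between tools (".", 0, or -1), one possible way to unify them is to key on the genotype instead of the placeholder value, since a placeholder 0 cannot be told apart from a real depth of 0. The sketch below assumes hypothetical field names (`gt`, `dp`, `ad`, `gq`) and a hypothetical helper `normalize_call`. It would not catch the GATK cases where GT is not ./., so it is a partial approach at best.

```python
# Genotype strings treated as "no call" in this sketch.
NO_CALL_GTS = {".", "./.", ".|."}


def normalize_call(call: dict) -> dict:
    """Map an empty or no-call sample entry to explicit None values."""
    if not call or call.get("gt") in NO_CALL_GTS:
        # The other fields may hold ".", 0 or -1 depending on the tool,
        # so they are discarded rather than interpreted.
        return {"gt": None, "dp": None, "ad": None, "gq": None}
    # Called genotypes are passed through unchanged (as a copy).
    return dict(call)


print(normalize_call({"gt": "./.", "dp": 0}))   # GATK-style placeholder
print(normalize_call({"gt": "./.", "dp": -1}))  # mehari-style placeholder
print(normalize_call({}))                       # empty sample entry
print(normalize_call({"gt": "0/1", "dp": 0}))   # real call is kept
```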
