Empty sample/genotype-data in single samples of a pedigree cause filters to crash #2201
Comments
For the Y chromosome, there is no standardized output for the genotype:
RCA
When an SV query starts, the variants from the database are written to a temporary VCF file, which is then submitted to the varfish-server-worker. For some variants, however, the genotype dictionary for an individual can be empty. If that happens for a CNV entry, the conversion fails. By default, expected fields that are not present in the genotype are filled with a default value:
varfish-server/backend/svs/models/jobs.py Lines 415 to 417 in 3533238
varfish-server/backend/svs/models/jobs.py Lines 331 to 333 in 3533238
Solution
Allow the "." placeholder in the cn coercion:
    coerce_db_to_vcf = {
        "cn": lambda x: x if x == "." else int(x),
    }
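To illustrate why the change helps, here is a minimal, self-contained sketch that contrasts a strict coercion with the tolerant one proposed above. It is not the code from jobs.py: the helper `to_vcf_values`, the field list, and the use of "." as the fallback for absent fields are assumptions made for this example.

```python
MISSING = "."  # assumption: placeholder used when a field is absent

# Strict table: every "cn" value must be convertible to int.
strict = {"cn": int}
# Tolerant table: the placeholder passes through unchanged.
tolerant = {"cn": lambda x: x if x == MISSING else int(x)}


def to_vcf_values(genotype, fields, coercions):
    """Hypothetical helper: look up each field, fall back to MISSING, coerce."""
    values = []
    for field in fields:
        raw = genotype.get(field, MISSING)
        convert = coercions.get(field, lambda x: x)
        values.append(convert(raw))
    return values


fields = ["gt", "cn"]
print(to_vcf_values({"gt": "0/1", "cn": "3"}, fields, tolerant))  # ['0/1', 3]
print(to_vcf_values({}, fields, tolerant))  # ['.', '.']
try:
    to_vcf_values({}, fields, strict)
except ValueError as exc:
    print("strict coercion fails on an empty genotype:", exc)
```

With the tolerant table, an empty genotype dictionary yields only placeholders instead of raising.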
Describe the bug
In a case/pedigree with multiple samples, some variants may not have usable data for all samples (either due to missing coverage or because the variant lies on the Y chromosome). In VCF files this can be encoded either as a single "." for that sample in the sample/genotype block or as individual missing values for each defined FORMAT field. In the TSV file used for data import into VarFish, the sample-specific data (genotype column) can, in principle, also be empty for single samples (i.e. """sample_2""": {} ). However, in cases like these, variant filtration fails for the whole set of variants (SNVs or SVs), not only for the affected sample.
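The two VCF encodings of missing sample data mentioned above can be normalized to the same representation. The sketch below assumes a hypothetical helper `parse_sample` and uses `None` for missing values; it only handles the literal "." marker and does not interpret no-call genotypes such as "./.".

```python
def parse_sample(format_keys, sample_column):
    """Hypothetical helper: map FORMAT keys to values, None where missing."""
    keys = format_keys.split(":")
    if sample_column == ".":
        # Whole sample block is missing.
        return {key: None for key in keys}
    values = sample_column.split(":")
    # Trailing fields may be dropped in VCF; pad them with the missing marker.
    values += ["."] * (len(keys) - len(values))
    return {key: (None if value == "." else value) for key, value in zip(keys, values)}


print(parse_sample("GT:CN", "."))      # {'GT': None, 'CN': None}
print(parse_sample("GT:CN", ".:."))    # {'GT': None, 'CN': None}
print(parse_sample("GT:CN", "0/1:3"))  # {'GT': '0/1', 'CN': '3'}
```

Both missing-data encodings end up as the same dictionary, so downstream code only has to handle one case.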
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Given that some samples may not have any usable information for some variants, ideally the variant filtration should be able to deal with missing data.
Alternatively, the import of variants with missing data for even a single sample should be rejected, so that filtration cannot fail because of it.
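The rejection alternative could look roughly like the following check at import time. This is only a sketch: `find_empty_genotypes` and `check_record` are hypothetical names, and the per-sample genotype data is assumed to be already parsed into a dictionary keyed by sample name.

```python
def find_empty_genotypes(genotype_by_sample):
    """Return the names of samples whose genotype data is empty or absent."""
    return sorted(name for name, call in genotype_by_sample.items() if not call)


def check_record(genotype_by_sample):
    """Raise ValueError if any sample has no genotype data."""
    empty = find_empty_genotypes(genotype_by_sample)
    if empty:
        raise ValueError("empty genotype data for sample(s): " + ", ".join(empty))


record = {"sample_1": {"gt": "0/1"}, "sample_2": {}}
print(find_empty_genotypes(record))  # ['sample_2']
try:
    check_record(record)
except ValueError as exc:
    print("rejected:", exc)
```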
Additional context
This could be fixed by never writing empty sample/genotype data into the TSV files used for VarFish import; see mehari issue 672.