Empty sample/genotype-data in single samples of a pedigree cause filters to crash #2201

Nicolai-vKuegelgen · 2025-01-21T15:15:22Z

Describe the bug
In a case/pedigree with multiple samples, some variants may not have usable data from all samples (either due to missing coverage or because the variant lies on the Y chromosome). In VCF files this can be encoded either as a single "." for that sample in the sample/genotype block or with individual missing values for each defined FORMAT field. In the tsv file used for data import to Varfish, the sample-specific data (genotype column) can, in principle, also be empty for single samples (i.e. """sample_2""": {} ). However, in cases like this the variant filtration will fail for the whole set of variants (SNVs or SVs).

To Reproduce
Steps to reproduce the behavior:

1. Generate a tsv file with empty genotype data for a single variant in a single sample (e.g. by running mehari on a VCF with a "." in the sample block).
2. Import this case into Varfish.
3. Attempt variant filtration.
4. See error.

Expected behavior
Given that some samples may not have any usable information for some variants, the variant filtration should ideally be able to deal with missing data.
Alternatively, the import of variants with missing data for even a single sample should be rejected, so that filtration cannot fail because of it.
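As a rough illustration of the second option, the check below flags samples whose genotype entry is empty. It is only a sketch: the function name `samples_without_genotype` is hypothetical, and it assumes the genotype column has already been unescaped into a plain JSON object keyed by sample name.

```python
import json


def samples_without_genotype(genotype_json: str) -> list[str]:
    """Return the names of samples whose genotype entry is empty.

    Hypothetical helper; assumes the genotype column is a JSON object
    that maps each sample name to a dict of per-sample values.
    """
    genotype = json.loads(genotype_json)
    return [sample for sample, data in genotype.items() if not data]


# One sample carries data, the other is empty as described in this issue.
column = '{"sample_1": {"gt": "0/1", "ad": 12}, "sample_2": {}}'
print(samples_without_genotype(column))  # ['sample_2']
```

An importer could use such a check either to reject the record or to log it before filtration is attempted.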

Additional context
This could be fixed by never writing empty sample/genotype-data into the tsv files used for varfish import, see mehari issue 672

stolpeo · 2025-01-23T10:58:22Z

For the Y chromosome, there is no standardized output for the genotype:

Dragen outputs GT as ./. and other data as .
GATK outputs GT mostly as ./. (except when noise is reported), and other data as 0
Varfish annotator converts the . of other data to 0
mehari converts the . of other data to -1

stolpeo · 2025-02-11T10:18:44Z

RCA

When starting an SV query, the variants from the database are written out to a temporary VCF file that is then submitted to the varfish-server-worker. However, for some variants the genotype dictionary for an individual can be empty. If that happens for a CNV entry, the conversion fails. By default, expected fields that are missing from the genotype fall back to ".", but the cn entry is cast with int(), which fails for that fallback value.

varfish-server/backend/svs/models/jobs.py

Lines 415 to 417 in 3533238

        coerce_db_to_vcf.get(key, lambda x: x)(
            record.genotype[sample].get(key, ".")
        )

varfish-server/backend/svs/models/jobs.py

Lines 331 to 333 in 3533238

        coerce_db_to_vcf = {
            "cn": lambda x: int(x),
        }

stolpeo · 2025-02-11T10:19:35Z

Solution

Allow . as cn value.

        coerce_db_to_vcf = {
            "cn": lambda x: x if x == "." else int(x),
        }
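To illustrate the effect of this change, here is a small self-contained sketch. The helper `sample_column` and the example data are hypothetical; joining the values with ":" follows the layout of VCF sample columns and is an assumption about how the result is used, not a copy of the varfish-server code.

```python
# Coercion table with the proposed fix: "." is passed through untouched.
coerce_db_to_vcf = {
    "cn": lambda x: x if x == "." else int(x),
}


def sample_column(genotype, sample, keys):
    """Build a ':'-separated sample column, using "." for missing values."""
    values = []
    for key in keys:
        coerce = coerce_db_to_vcf.get(key, lambda x: x)
        values.append(str(coerce(genotype[sample].get(key, "."))))
    return ":".join(values)


genotype = {"sample_1": {"gt": "0/1", "cn": "3"}, "sample_2": {}}

print(sample_column(genotype, "sample_1", ("gt", "cn")))  # 0/1:3
print(sample_column(genotype, "sample_2", ("gt", "cn")))  # .:.
```

With the guard in place, an empty genotype dictionary yields missing-value placeholders instead of raising.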


Nicolai-vKuegelgen added the bug Something isn't working label Jan 21, 2025

github-project-automation bot added this to Release Planning Jan 21, 2025

Nicolai-vKuegelgen mentioned this issue Jan 21, 2025

tsv output can contain empty sample/genotype-data which causes bugs in varfish varfish-org/mehari#672

Open

stolpeo added a commit that referenced this issue Feb 11, 2025

fix: empty genotype data in sample cause sv query to crash (#2201)

051bef2

stolpeo linked a pull request Feb 11, 2025 that will close this issue

fix: empty genotype data in sample cause sv query to crash (#2201) #2235

Merged

stolpeo added a commit that referenced this issue Feb 11, 2025

fix: empty genotype data in sample cause sv query to crash (#2201) (#…

56f1b4b

…2235)

stolpeo closed this as completed in #2235 Feb 11, 2025

github-project-automation bot moved this to Done in Release Planning Feb 11, 2025

varfish-bot mentioned this issue Feb 11, 2025

chore(main): release 0.1.0 #1348

Open
