Skip to content

Ambiguous path production #15

@bricoletc

Description

@bricoletc

I produced nested PRGs (--min_match_length=7, --max_nesting=5) from multiple sequence alignments of plasmodium genes and found that make_prg can produce ambiguous PRGs.

Here's an example I drew out by hand to understand

o

_o and _c mean site opening and closing nodes, with a numbered ID before. 7T means 7 consecutive Ts.

The path CA9T can be produced going through either sites 33/34/35 or 33/36. This can be horrible for genotyping as if the dataset has that sequence then you have to either say you can't genotype because of ambiguity, or randomly choose one of the two possibilities. The latter could cause comparison issues across datasets. For now in gramtools I've decided to bail out in these cases and null genotype, with a FILTER flag to AMBIG to signal this occurred.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions