Description
I produced nested PRGs (--min_match_length=7, --max_nesting=5) from multiple sequence alignments of Plasmodium genes and found that make_prg can produce ambiguous PRGs, i.e. PRGs in which the same sequence can be spelled by more than one path.
Here's an example I drew out by hand to understand the problem.
_o and _c mark site opening and closing nodes, each prefixed by the site's numeric ID. 7T means 7 consecutive Ts.
The sequence CA9T can be produced by going through either sites 33/34/35 or sites 33/36. This is a serious problem for genotyping: if a dataset contains that sequence, you have to either say you can't genotype because of the ambiguity, or randomly choose one of the two possibilities. The latter could cause comparison issues across datasets. For now, in gramtools I've decided to bail out in these cases and output a null genotype, with the FILTER field set to AMBIG to signal that this occurred.
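
To make the ambiguity concrete, here is a minimal sketch of how one might detect it by brute force. It is not how make_prg or gramtools represent PRGs: the `Site` class, `enumerate_paths`, `find_ambiguous` and the toy graph are all hypothetical, and the toy graph only mimics the situation described above (site 33 containing sites 34/35 on one allele and site 36 on the other), not the actual drawing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical variant site: each allele is a list of str or nested Site."""
    site_id: int
    alleles: list


def enumerate_paths(prg):
    """Return (sequence, path) for every route through prg.

    prg is a list of str | Site; a path is a tuple of (site_id, allele_index).
    """
    results = [("", ())]
    for element in prg:
        if isinstance(element, str):
            # Invariant sequence: append to every partial result.
            results = [(seq + element, path) for seq, path in results]
            continue
        extended = []
        for idx, allele in enumerate(element.alleles):
            for sub_seq, sub_path in enumerate_paths(allele):
                step = ((element.site_id, idx),) + sub_path
                for seq, path in results:
                    extended.append((seq + sub_seq, path + step))
        results = extended
    return results


def find_ambiguous(prg):
    """Map each sequence spelled by more than one path to those paths."""
    by_sequence = defaultdict(set)
    for seq, path in enumerate_paths(prg):
        by_sequence[seq].add(path)
    return {seq: sorted(paths) for seq, paths in by_sequence.items() if len(paths) > 1}


# Toy nested PRG (an assumption, not the real graph from the drawing).
toy_prg = [
    Site(33, [
        ["CA", Site(34, [["TT"], ["G"]]), Site(35, [["T" * 7], ["A"]])],
        ["C", Site(36, [["A" + "T" * 9], ["G"]])],
    ])
]

ambiguous = find_ambiguous(toy_prg)
for seq, paths in ambiguous.items():
    print(seq, paths)
```

In this toy graph, CA9T is reached both via sites 33/34/35 and via sites 33/36, so it is reported with two paths. Exhaustive enumeration is exponential in the number of sites, so this is only practical as a check on small PRGs or on individual nested sites.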
