-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better formatted error message for ancestral alleles #887
Conversation
5e282e4
to
7c47a9a
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #887 +/- ##
=======================================
Coverage 87.04% 87.04%
=======================================
Files 5 5
Lines 1767 1767
Branches 310 310
=======================================
Hits 1538 1538
Misses 140 140
Partials 89 89
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
af170e1
to
fd50a6f
Compare
tsinfer/formats.py
Outdated
"The following alleles were not found in the variant_allele array " | ||
"and will be treated as unknown:\n" | ||
f"{unknown_alleles}" | ||
"Ancestral alleles not found in the variant_allele " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What'll happen here if there's thousands of them? The warning may take very long to generate, and may not be helpful. I think just keep it to the overall summary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The summary is the same order of length, right? As in it was, {'aaa': 3, 'bbb':4}
, and would now be 'aaa': 3 (x%)\n'bbb': 4 (y%)
.
I'm suggesting this because I would find it useful to know (a) the proportion of bad ancestral alleles and (b) if I accidentally wasn't e.g. converting them to uppercase etc in the right way. I think reporting this would help avoid lots of accidentally bad analysis, which is why we introduced it (see #870 and #683 (comment). But happy to be guided by user experience.
I've been testing on the default 1000 genomes VCFs, and seeing what would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't understand the code.
Can you do the computations outside of the string then? This is unnecessarily terse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to move the calculations above the string construction. Thanks for the feedback @jeromekelleher
fd50a6f
to
30cb691
Compare
30cb691
to
bb9570b
Compare
This is pretty useful information, so quoting percentages is helpful to debug issues, I think.