-
Notifications
You must be signed in to change notification settings - Fork 15
Deal with multiple contigs and sequence lengths #964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #964 +/- ##
==========================================
+ Coverage 93.31% 93.38% +0.07%
==========================================
Files 18 18
Lines 6460 6475 +15
Branches 1096 1101 +5
==========================================
+ Hits 6028 6047 +19
+ Misses 294 291 -3
+ Partials 138 137 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
ba629de
to
dbdae8b
Compare
Also fixes #763 I think |
aa733d5
to
8742ee8
Compare
Thanks @hyanwong - I've been some thinking on API design here and am thinking a simpler solution is to error out if there is more than one contig in the used variants. |
Sorry, brief reply while away, but this came from user feedback in tsinfer: people want to be able to take a vcf with multiple chromosomes in it (which is the common input format), select a chromosome, and run inference without too much extra thought (e.g. they are probably not the original generators of the vcf, and may not want to have to parse the vcf themselves). I assume this would include getting the seq len from the vcf if that is specified? But whatever makes it least error-prone for a naive user to run tsinfer on an externally provided vcf file, I guess? |
I think checking contig length if present and erroring out early if it is problematic seems fine? |
I think this scheme enables that - you |
Having the option to override the |
How does this look @hyanwong |
LGTM. Fine to merge, I reckon. |
(p.s. #765 suggests that the error message which currently reads
Should have a code snippet to show users how to do this? |
3d38a27
to
e52da3d
Compare
Good idea, added |
c2e0583
to
ff00f6b
Compare
@jeromekelleher This is now ready for your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, spotted one minor issue
): | ||
self._sequence_length = sequence_length | ||
self._contig_index = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to define self._contig_id
here also, or will get attribute error when not defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, and I've added checking this to the tests.
ff00f6b
to
d5dd3f4
Compare
Introduces a
contig_id
parameter to variant_data, which adds non matching contigs to the site mask, as described in #949. Fixes #949