Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clearer error message when sequences submitted without segment identifiers for segmented virus #3683

Open
theosanderson opened this issue Feb 13, 2025 · 3 comments · May be fixed by #3769
Open
Labels
backend related to the loculus backend component high_priority Work on this as soon as possible (potentially post-MVP)

Comments

@theosanderson
Copy link
Member

theosanderson commented Feb 13, 2025

Apparently one gets an error message like this:

Image

One should get a message like. Error: some sequence headers do not contain segment identifiers. Sequence headers should end with a segment identifier, e.g. "mysample_S", "mysample_L", etc., whereas you entered:

https://loculus.slack.com/archives/C05G172HL6L/p1739441563468129

@theosanderson theosanderson added the high_priority Work on this as soon as possible (potentially post-MVP) label Feb 14, 2025
@chaoran-chen chaoran-chen added the backend related to the loculus backend component label Feb 14, 2025
@fhennig
Copy link
Contributor

fhennig commented Feb 25, 2025

Just spotted this as 'Ready to work' but I'm not super sure how I'd do this, @anna-parker maybe you can write down some pointers or even take it over onto your list instead? 😅

For someone more familiar with the pipeline codebase it's probably a 2 minute job (to quote @corneliusroemer 😁 )

EDIT: Nevermind, it's in the backend of course (thanks Anna for pointing it out to me!)

@fhennig fhennig linked a pull request Feb 26, 2025 that will close this issue
2 tasks
@fhennig
Copy link
Contributor

fhennig commented Feb 26, 2025

Ok, so, I have some updates here.

Clearer error message when sequences submitted without segment identifiers for segmented virus

Actually, the submission IDs in the screenshot do contain a segment identifier, because whatever comes after the last underscore becomes the segment identifier. so test_NIH19 becomes isolate test with segment NIH19.

I thought "well, let's make sure that the segment is actually one of the configured segments, and I did that in #3769 and that works! But now Chaoran told me that it is actually by design that we do not enforce the segment names, because the pipeline is supposed to handle this. Analogous to how we allow submitting arbitrary metadata fields and let the pipeline handle it.

So, actually, the error message is correct and "Error: some sequence headers do not contain segment identifiers" wouldn't be right, all the submission IDs do in fact contain a segment name.

I don't see an obvious way to fix this, so I thought I'd ask for input again!

@theosanderson @chaoran-chen @anna-parker

@theosanderson
Copy link
Member Author

theosanderson commented Feb 26, 2025

Hmm thanks!

I don't have a problem in principle with the idea of the pipeline handling it - and in general I like the idea of giving the pipeline lots of flexibility. It's just that here the backend does do logic that assumes that these are formatted as id_segment things - and here there is a problem (for the backend) because they are not formatted like that, and it would be nice for us to give a more useful error.

The question would be what the use-case that is prompting us to give the pipeline flexibility to do this.

One idea would be that we would want to support bacterial contigs (#3539) and they might be: sample1_1, sample1_2 to any arbitrary number. But atm we don't support those, and once we do we would have a specific organism configuration for them which would allow us to turn off this check. I don't think I atm see a compelling reason not to move this logic into the backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend related to the loculus backend component high_priority Work on this as soon as possible (potentially post-MVP)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants