Include submissionId in displayName #3652

Open
theosanderson opened this issue Feb 10, 2025 · 7 comments
Labels
discussion Open questions

Comments

@theosanderson
Member

The definitive sample ID (à la Hu-1 for SARS-CoV-2) for the recent Ebola genome is CL292200, which was used as the submissionId.

[Screenshot]

IMO we should include it in the displayName, which we currently don't:

[Screenshot]

I expect this to apply in many future instances.

@theosanderson added the "discussion" (Open questions) label Feb 10, 2025
@theosanderson changed the title from "Include submissionId in display name" to "Include submissionId in displayName" Feb 10, 2025
@chaoran-chen
Member

chaoran-chen commented Feb 10, 2025

But in the case of Pathoplexus, for CCHF (example), the submission IDs of INSDC-ingested sequences are really long. For Influenza on GenSpectrum (example), it's even worse.

@theosanderson
Member Author

👍 one could treat insdc_ingest_user specially here and exclude the submission ID (it's not a real submission ID from a user).
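A minimal Python sketch of that special case, purely for illustration. Everything except the submitter name insdc_ingest_user is hypothetical: the `build_display_name` helper, the assumption that the display name is a list of "/"-joined parts, and the choice to prepend the submission ID are not existing Loculus code.

```python
# Hypothetical sketch; not existing Loculus code.
INSDC_INGEST_USER = "insdc_ingest_user"  # submitter for INSDC-ingested sequences
DELIM = "/"  # assumed display-name delimiter


def build_display_name(base_parts: list[str], submission_id: str, submitter: str) -> str:
    """Join display-name parts, prepending the submission ID only for real
    user submissions (INSDC-ingested IDs are not chosen by a user)."""
    parts = list(base_parts)
    if submission_id and submitter != INSDC_INGEST_USER:
        parts.insert(0, submission_id)
    return DELIM.join(parts)


if __name__ == "__main__":
    # Illustrative values only.
    print(build_display_name(["CountryA", "LOC_000001.1"], "CL292200", "some_user"))
    print(build_display_name(["CountryA", "LOC_000001.1"], "very_long_ingest_id", INSDC_INGEST_USER))
```

With these placeholder values, the first call yields "CL292200/CountryA/LOC_000001.1" and the second "CountryA/LOC_000001.1".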

@theosanderson
Member Author

(For GenSpectrum of course you could make any choice you want about what to include, which is already configurable)

@chaoran-chen
Member

That could be one option. It will probably work well in many cases, but not in all. For example, Pathoplexus has https://pathoplexus.org/seq/PP_0013N8P.1 with the submission ID "Monkeypox/PT0836/2025". If we do this, we may want to adapt our Pathoplexus submission documentation to make users aware of how the submission ID is going to be used.

@theosanderson
Member Author

👍 There could be a general case for replacing "/" characters as part of this (converting them to "-" or "_") so that "/" keeps working as the delimiter. And yes, definitely not against documenting whatever we decide on!
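A minimal Python sketch of that replacement, assuming "/" is the display-name delimiter. The `sanitize_submission_id` name and the default replacement character are hypothetical choices, not existing Loculus code.

```python
DELIMITER = "/"  # assumed display-name delimiter


def sanitize_submission_id(submission_id: str, replacement: str = "-") -> str:
    """Replace every delimiter character in a submission ID so that the
    delimiter keeps unambiguously separating display-name fields."""
    if DELIMITER in replacement:
        raise ValueError("replacement must not contain the delimiter")
    return submission_id.replace(DELIMITER, replacement)


if __name__ == "__main__":
    print(sanitize_submission_id("Monkeypox/PT0836/2025"))       # Monkeypox-PT0836-2025
    print(sanitize_submission_id("Monkeypox/PT0836/2025", "_"))  # Monkeypox_PT0836_2025
```

Rejecting a replacement that itself contains "/" keeps the helper from silently reintroducing the ambiguity it is meant to remove.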

@fhennig
Contributor

fhennig commented Feb 24, 2025

Maybe the display name format could be configurable on a per-organism basis?

e.g.

displayNameFormat: "${sampleId}/${accession}/${version}/${collectionCountry}"

The display name is mostly useful for website navigation etc., and I think that, depending on how one uses Loculus, different things might be useful to have in the display name (one might even opt to use only the sample ID and not care about Loculus accessions or anything else!).
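The proposed `${...}` placeholder syntax happens to match Python's standard-library `string.Template`, so here is a minimal sketch of how such a per-organism format could be rendered. The field names come from the example format string; the `render_display_name` helper and the metadata values are hypothetical, and this is not a description of how Loculus currently builds display names.

```python
from string import Template

# Format string taken from the proposal; field names are illustrative.
DISPLAY_NAME_FORMAT = "${sampleId}/${accession}/${version}/${collectionCountry}"


def render_display_name(fmt: str, metadata: dict[str, str]) -> str:
    """Fill a per-organism display-name template from a sequence's metadata.

    Raises KeyError if the template references a field missing from metadata.
    """
    return Template(fmt).substitute(metadata)


if __name__ == "__main__":
    # Hypothetical metadata values, for illustration only.
    example = {
        "sampleId": "SAMPLE-1",
        "accession": "LOC_000001",
        "version": "1",
        "collectionCountry": "CountryA",
    }
    print(render_display_name(DISPLAY_NAME_FORMAT, example))  # SAMPLE-1/LOC_000001/1/CountryA
```

Using `substitute` makes a missing field fail loudly; `Template.safe_substitute` would instead leave the unfilled `${...}` placeholder in the output, which may or may not be preferable for a display name.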

What exactly is done for PPX can then be a separate discussion.

@theosanderson
Member Author

theosanderson commented Feb 24, 2025

> Maybe the display name format could be configurable on a per-organism basis?

It already is: it's generated by the preprocessing pipeline, subject to configuration in the YAML.

I still think this is a useful place to have the discussion since, for example, it bears on the question of whether the insdc_ingest_user should get specific behaviour.
