Annotate sequences for upload to ENA and for download by user #3739

rneher · 2025-02-24T13:26:05Z

We translate genes and report amino acid mutations, but we don't relay the feature coordinates to ENA (NCBI wouldn't allow that). Hence the sequences we submit look "draft-y" on GenBank. Since we have this information, we should make it available to INSDC. We could also allow users to download annotated genomes, e.g. as a GFF3 or GenBank file.
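For illustration, an annotated download in GFF3 could look roughly like the output of the sketch below. It assumes forward-strand, single-interval CDS features with 1-based inclusive coordinates; the function name `gff3_lines`, the `example` source column, and the `cds-` ID prefix are hypothetical and not taken from the codebase.

```python
def gff3_lines(seqid: str, features: dict[str, tuple[int, int]]) -> str:
    """Render features as minimal GFF3 text (9 tab-separated columns per feature).

    Assumptions: every feature is a forward-strand CDS in phase 0, and gene
    names contain no characters that GFF3 requires to be escaped.
    """
    lines = ["##gff-version 3"]
    for name, (start, end) in features.items():
        attributes = f"ID=cds-{name};Name={name}"
        columns = [
            seqid,       # seqid: the sequence the feature sits on
            "example",   # source: hypothetical placeholder
            "CDS",       # type
            str(start),  # start, 1-based inclusive
            str(end),    # end, 1-based inclusive
            ".",         # score: not applicable
            "+",         # strand: assumed forward
            "0",         # phase: assumed 0
            attributes,
        ]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


print(gff3_lines("seq1", {"E": (7, 12)}))
```

Spliced genes, reverse-strand features, and partial CDS would need more than this.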

theosanderson · 2025-02-26T13:58:10Z

(all agreed with the general idea of this issue)

"Since we have this information"

I'm not sure to what extent we currently have this information stored in a format useful for INSDC, or indeed for displaying it ourselves (e.g. with Gensplore, which is available as a React component). Below I start thinking this through:

As I understand it, what we store is:

Unaligned genome
Nucleotide sequences hard-aligned to reference sequence
Amino acids hard-aligned to reference sequence
Lists of insertions and deletions, both for amino acids and nucleotides
We also have, in the Nextclade dataset, the coordinates for feature locations in the reference genome

What we need for INSDC is:

an unaligned genome (which we have)
a list of coordinates for the various features in that genome. We don't have this to hand, but I guess we could compute it by applying the list of nucleotide insertions and deletions to the feature coordinates in the reference genome?
likely also unaligned amino acid translations, calculated from these coordinates.
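A minimal sketch of that coordinate lift-over, under stated assumptions: coordinates are 1-based and inclusive, deletions are given as reference ranges, and each insertion is given as the reference position it follows plus the inserted bases. The function name `lift_feature` and these input shapes are hypothetical and not how the data is necessarily stored. It also assumes the alignment covers the whole unaligned genome, so any leading bases appear as an insertion after position 0.

```python
from typing import Optional

Interval = tuple[int, int]  # 1-based, inclusive (start, end)


def lift_feature(
    feature: Interval,
    deletions: list[Interval],
    insertions: list[tuple[int, str]],
) -> Optional[Interval]:
    """Map a feature from reference coordinates to unaligned-genome coordinates.

    deletions:  reference ranges missing from the query.
    insertions: (ref_pos, bases), where bases sit directly after ref_pos.
    Returns None if every reference position of the feature is deleted.
    """
    start, end = feature
    deleted: set[int] = set()
    for d_start, d_end in deletions:
        deleted.update(range(d_start, d_end + 1))

    # Reference positions of the feature that still exist in the query.
    kept = [p for p in range(start, end + 1) if p not in deleted]
    if not kept:
        return None

    def to_query(ref_pos: int) -> int:
        # Deleted bases upstream shift the position left,
        # inserted bases upstream shift it right.
        n_deleted = sum(1 for d in deleted if d < ref_pos)
        n_inserted = sum(len(bases) for pos, bases in insertions if pos < ref_pos)
        return ref_pos - n_deleted + n_inserted

    return to_query(kept[0]), to_query(kept[-1])


# Feature at reference 5..13, a 3-base deletion inside it, 2 bases inserted upstream.
print(lift_feature((5, 13), deletions=[(7, 9)], insertions=[(2, "AC")]))  # (7, 12)
```

When a deletion overlaps a feature boundary, the boundary is clamped to the nearest surviving base inside the feature. Whether such a feature should instead be reported as partial is a separate decision, and indels that are not a multiple of three would break the reading frame of the translation.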

anna-parker added the "deposition" label (related to ENA/INSDC deposition) on Feb 24, 2025
