Commit

Merge pull request #143 from nextstrain/data/update-rsv
ivan-aksamentov authored Jan 29, 2024
2 parents f37f2da + 8e95f55 commit a822167
Showing 25 changed files with 8,130 additions and 5,055 deletions.
5 changes: 5 additions & 0 deletions data/nextstrain/rsv/a/EPI_ISL_412866/CHANGELOG.md
@@ -1,3 +1,8 @@
## Unreleased

- fix definitions of G_clades (legacy) for RSV-A and RSV-B


## 2024-01-16T20:31:02Z

**first release of v3 dataset.**
1 change: 1 addition & 0 deletions data/nextstrain/rsv/a/EPI_ISL_412866/README.md
@@ -18,3 +18,4 @@ The reference tree covers the diversity of RSV-A since the first sequenced sampl
## Nomenclature
The dataset follows the consortium nomenclature established in 2023 that uses a combination of letters and numbers to designate lineages in a hierarchical fashion.
Definitions of individual lineages are available on GitHub in the repository [rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A).
Legacy clade definitions for the nomenclature defined by Goya et al. (`G_clade`) are included for orientation. These clade definitions are incomplete and will not be updated. We encourage users to use the new consortium nomenclature.
12 changes: 6 additions & 6 deletions data/nextstrain/rsv/a/EPI_ISL_412866/pathogen.json
@@ -21,11 +21,6 @@
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"shortcuts": [
"rsv_a",
"nextstrain/rsv/a",
"nextstrain/rsv/a/hRSV-A-England-397-2017"
],
"qc": {
"privateMutations": {
"enabled": true,
@@ -84,9 +79,14 @@
"Nextstrain team <https://nextstrain.org>"
]
},
"shortcuts": [
"rsv_a",
"nextstrain/rsv/a",
"nextstrain/rsv/a/hRSV-A-England-397-2017"
],
"attributes": {
"name": "RSV-A",
"reference accession": "EPI_ISL_412866",
"reference name": "hRSV/A/England/397/2017"
}
}
}
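The hunk above only moves the `"shortcuts"` array from before `"qc"` to after `"maintenance"`; none of the values change. A minimal sketch of how one might confirm that such a reordering leaves the parsed content unchanged is shown here. The two JSON strings are abbreviated, hypothetical stand-ins for the old and new `pathogen.json`, and `same_content` is an illustrative helper, not part of any Nextclade tooling.

```python
import json

# Abbreviated, hypothetical fragments: same keys and values, different key order.
before = (
    '{"files": {"treeJson": "tree.json"},'
    ' "shortcuts": ["rsv_a", "nextstrain/rsv/a"],'
    ' "qc": {"frameShifts": {"enabled": true}}}'
)
after = (
    '{"files": {"treeJson": "tree.json"},'
    ' "qc": {"frameShifts": {"enabled": true}},'
    ' "shortcuts": ["rsv_a", "nextstrain/rsv/a"]}'
)


def same_content(a: str, b: str) -> bool:
    """Return True when two JSON documents parse to equal values.

    Object key order is ignored by dict comparison; array order still matters.
    """
    return json.loads(a) == json.loads(b)


print(same_content(before, after))  # True
```

Comparing parsed values rather than raw text keeps a pure reordering from showing up as a content change.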
3,451 changes: 1,734 additions & 1,717 deletions data/nextstrain/rsv/a/EPI_ISL_412866/sequences.fasta

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/nextstrain/rsv/a/EPI_ISL_412866/tree.json

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions data/nextstrain/rsv/b/EPI_ISL_1653999/CHANGELOG.md
@@ -1,3 +1,7 @@
## Unreleased

- fix definitions of G_clades (legacy) for RSV-A and RSV-B

## 2024-01-16T20:31:02Z

**first release of v3 dataset.**
1 change: 1 addition & 0 deletions data/nextstrain/rsv/b/EPI_ISL_1653999/README.md
@@ -17,3 +17,4 @@ The reference tree covers the diversity of RSV-B since the first sequenced sampl
## Nomenclature
The dataset follows the consortium nomenclature established in 2023 that uses a combination of letters and numbers to designate lineages in a hierarchical fashion.
Definitions of individual lineages are available on GitHub in the repository [rsv-lineages/lineage-designation-B](https://github.com/rsv-lineages/lineage-designation-B).
Legacy clade definitions for the nomenclature defined by Goya et al. (`G_clade`) are included for orientation. These clade definitions are incomplete and will not be updated. We encourage users to use the new consortium nomenclature.
12 changes: 6 additions & 6 deletions data/nextstrain/rsv/b/EPI_ISL_1653999/pathogen.json
@@ -21,11 +21,6 @@
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"shortcuts": [
"rsv_b",
"nextstrain/rsv/b",
"nextstrain/rsv/b/hRSV-B-Australia-VIC-RCH056-2019"
],
"qc": {
"privateMutations": {
"enabled": true,
@@ -84,9 +79,14 @@
"Nextstrain team <https://nextstrain.org>"
]
},
"shortcuts": [
"rsv_b",
"nextstrain/rsv/b",
"nextstrain/rsv/b/hRSV-B-Australia-VIC-RCH056-2019"
],
"attributes": {
"name": "RSV-B",
"reference accession": "EPI_ISL_1653999",
"reference name": "hRSV/B/Australia/VIC-RCH056/2019"
}
}
}
5,387 changes: 2,067 additions & 3,320 deletions data/nextstrain/rsv/b/EPI_ISL_1653999/sequences.fasta

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/nextstrain/rsv/b/EPI_ISL_1653999/tree.json

Large diffs are not rendered by default.

20 changes: 16 additions & 4 deletions data_output/index.json
@@ -1112,6 +1112,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
@@ -1122,8 +1129,7 @@
}
],
"version": {
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -1182,6 +1188,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
@@ -1192,8 +1205,7 @@
}
],
"version": {
"updatedAt": "2024-01-16T20:31:02Z",
"tag": "2024-01-16--20-31-02Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
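In the `index.json` hunks above, the new `"unreleased"` entry carries a `tag` and `compatibility` but no `updatedAt`, and it replaces the dated entry as the current `"version"`. A consumer that wants the most recent tagged release could filter on that difference. The following is a rough sketch under that assumption; the `versions` list is an abbreviated copy of the entries shown in the diff, and `latest_released` is a hypothetical helper rather than anything Nextclade provides.

```python
from typing import Optional

# Abbreviated copy of one "versions" array from the hunk above.
versions = [
    {
        "tag": "unreleased",
        "compatibility": {"cli": "3.0.0-alpha.0", "web": "3.0.0-alpha.0"},
    },
    {
        "updatedAt": "2024-01-16T20:31:02Z",
        "tag": "2024-01-16--20-31-02Z",
    },
]


def latest_released(entries: list) -> Optional[dict]:
    """Return the newest entry that has an updatedAt timestamp and is not 'unreleased'.

    Timestamps in the same UTC ISO 8601 layout sort correctly as plain strings.
    """
    released = [
        e for e in entries
        if e.get("tag") != "unreleased" and "updatedAt" in e
    ]
    return max(released, key=lambda e: e["updatedAt"], default=None)


print(latest_released(versions)["tag"])  # 2024-01-16--20-31-02Z
```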
@@ -0,0 +1,10 @@
## Unreleased

- fix definitions of G_clades (legacy) for RSV-A and RSV-B


## 2024-01-16T20:31:02Z

**first release of v3 dataset.**

Updated consortium nomenclature.
21 changes: 21 additions & 0 deletions data_output/nextstrain/rsv/a/EPI_ISL_412866/unreleased/README.md
@@ -0,0 +1,21 @@
# RSV-A dataset with reference genome A/England/397/2017

| Key | Value |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------|
| authors | [Richard Neher](https://neherlab.org), Laura Urbanska, [Nextstrain](https://nextstrain.org) |
| data source            | GenBank + authorized other sequences                                                                                |
| workflow | [github.com/nextstrain/rsv/nextclade](https://github.com/nextstrain/rsv/nextclade) |
| nextclade dataset path | nextstrain/rsv/a/EPI_ISL_412866 |
| reference | EPI_ISL_412866 |
| clade definitions | [github.com/rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A) |

## Scope of this dataset
This dataset for RSV-A uses the reference sequence A/England/397/2017, which is available in GISAID under accession number EPI_ISL_412866. An almost identical sequence (slightly longer, 12 mutations, no gaps or indels) is available in NCBI as [LR699737](https://www.ncbi.nlm.nih.gov/nuccore/LR699737).
This sequence has the duplication in the G-protein shared by all currently circulating variants.
The reference tree covers the diversity of RSV-A since the first sequenced samples.


## Nomenclature
The dataset follows the consortium nomenclature established in 2023 that uses a combination of letters and numbers to designate lineages in a hierarchical fashion.
Definitions of individual lineages are available on GitHub in the repository [rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A).
Legacy clade definitions for the nomenclature defined by Goya et al. (`G_clade`) are included for orientation. These clade definitions are incomplete and will not be updated. We encourage users to use the new consortium nomenclature.
@@ -0,0 +1,16 @@
##gff-version 3
##sequence-region EPI_ISL_412866 1 15225
EPI_ISL_412866 annotation remark 1 15225 . . . molecule_type=cRNA;organism=Human orthopneumovirus;taxonomy=Viruses,Riboviria,Orthornavirae,Negarnaviricota,Haploviricotina,Monjiviricetes,Mononegavirales,Pneumoviridae,Orthopneumovirus,Orthopneumovirus hominis
EPI_ISL_412866 feature source 1 15225 . . . mol_type=viral cRNA;organism=Human orthopneumovirus
EPI_ISL_412866 feature 5'UTR 1 15 . . . citation=%5B1%5D;function=Leader region 5%27UTR
EPI_ISL_412866 feature CDS 70 489 . . 0 codon_start=1;db_xref=GeneID:37607636;gene=NS1;gene_name=NS1;Name=NS1;product=nonstructural protein 1;protein_id=YP_009518850.1;translation=MGSNSLSMIKVRLQNLFDNDEVALLKITCYTDKLIQLTNALAKAVIHTIKLNGIVFVHVITSSDICPNNNIVVKSNFTTMPVLQNGGYIWEMMELTHCSQPNGLIDDNCEIKFSKKLSDSTMTNYMNQLSELLGFDLNP%2A
EPI_ISL_412866 feature CDS 599 973 . . 0 codon_start=1;db_xref=GeneID:37607637;gene=NS2;gene_name=NS2;Name=NS2;product=nonstructural protein 2;protein_id=YP_009518851.1;translation=MDTTHNDTTPQRLMITDMRPLSLETIITSLTRDIITHKFIYLINHECIVRKLDERQATFTFLVNYEMKLLHKVGSTKYKKYTEYNTKYGTFPMPIFINHDGFLECIGIKPTKHTPIIYKYDLNP%2A
EPI_ISL_412866 feature CDS 1111 2286 . . 0 codon_start=1;db_xref=GeneID:37607638;gene=N;gene_name=N;Name=N;product=nucleoprotein;protein_id=YP_009518852.1;translation=MALSKVKLNDTLNKDQLLSSSKYTIQRSTGDSIDTPNYDVQKHINKLCGMLLITEDANHKFTGLIGMLYAMSRLGREDTIKILKDAGYHVKANGVDVTTHRQDINGKEMKFEVLTLASLTTEIQINIEIESRKSYKKMLKEMGEVAPEYRHDSPDCGMIILCIAALVITKLAAGDRSGLTAVIRRANNVLKNEMKRYKGLLPKDIANSFYEVFEKYPHFIDVFVHFGIAQSSTRGGSRVEGIFAGLFMNAYGAGQVMLRWGVLAKSVKNIMLGHASVQAEMEQVVEVYEYAQKLGGEAGFYHILNNPKASLLSLTQFPHFSSVVLGNAAGLGIMGEYRGTPRNQDLYDAAKVYAEQLKENGVINYSVLDLTAEELEAIKHQLNPKDNDVEL%2A
EPI_ISL_412866 feature CDS 2318 3043 . . 0 codon_start=1;db_xref=GeneID:37607639;gene=P;gene_name=P;Name=P;product=phosphoprotein;protein_id=YP_009518853.1;translation=MEKFAPEFHGEDANNRATKFLESIKGKFTSPKDPKKKDSIISVNSIDIEVTKESLITSNSTIINPINETDDTVGNKPNYQRKPLVSFKEDPTPSDNPFSKLYKETIETFDNNEEESSYSYEEINDQTNDNITARLDRIDEKLSEILGMLHTLVVASAGPTSARDGIRDAMVGLREEMIEKIRTEALMTNDRLEAMARLRNEESEKMAKDTSDEVSLNPTSEKLNNLLEGNDSDNDLSLEDF%2A
EPI_ISL_412866 feature CDS 3226 3996 . . 0 codon_start=1;db_xref=GeneID:37607640;gene=M;gene_name=M;Name=M;product=matrix protein;protein_id=YP_009518854.1;translation=METYVNKLHEGSTYTAAVQYNVLEKDDDPASLTIWVPMFQSSMPADLLIKELANVNILVKQISTPKGPSLRVMINSRSAVLAQMPSKFTICANVSLDERSKLAYDVTTPCEIKACSLTCLKSKNMLTTVKDLTMKTLNPTHDIIALCEFENIVTSKKVIIPTYLRSISVRNKDLNTLENITTTEFKNAITNAKIIPYSGLLLVITVTDNKGAFKYIKPQSQFIVDLGAYLEKESIYYVTTNWKHTATRFAIKPMED%2A
EPI_ISL_412866 feature CDS 4266 4460 . . 0 codon_start=1;db_xref=GeneID:37607641;gene=SH;gene_name=SH;Name=SH;product=small hydrophobic protein;protein_id=YP_009518855.1;translation=MENTSITIEFSSKFWPYFTLIHMITTIISLIIIISIMIAILNKLCEYNVFHNKTFELPRARVNT%2A
EPI_ISL_412866 feature CDS 4652 5617 . . 0 codon_start=1;db_xref=GeneID:37607642;gene=G;gene_name=G;Name=G;product=attachment glycoprotein;protein_id=YP_009518856.1;translation=MSKTKDQRTAKTLERTWDTLNHLLFISSCLYKLNLKSIAQITLSILAMIISTSLIIAAIIFIASANHKVTPTTAIIQDATNQIKNTTPTHLTQNPQLGISLSNLSGTTSQSTTILASTTPSAESTPQSTTVKIINTTTTQILPSKPTTKQRQNKPQNKPNNDFHFEVFNFVPCSICSNNPTCWAICKRIPNKKPGKKTTTKPTKKPTLKTTKKDPKPQTTKPKGVLTTKPTGKPTINTTKTNSRTTLLTSNTKGNPEHTSQKETIHSTTSEGYPSPSQVYTTSDQEETLHSTTSEGYPSPSQVYTTSEYLSQSLSSSNTTK%2A
EPI_ISL_412866 feature CDS 5697 7421 . . 0 codon_start=1;db_xref=GeneID:37607643;gene=F;gene_name=F;Name=F;product=fusion glycoprotein;protein_id=YP_009518857.1;translation=MELPILKTNAITTILAAVTLCFASSQNITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKENKCNGTDAKVKLIKQELDKYKNAVTELQLLMQSTPAANSRARRELPRFMNYTLNNTKNTNVTLSKKRKRRFLGFLLGVGSAIASGIAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSISNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSSNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNIDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDEFDASISQVNEKINQSLAFIRKSDELLHNVNAGKSTTNIMITTIIIVIIVILLALIAVGLLLYCKARSTPVTLSKDQLSGINNIAFSN%2A
EPI_ISL_412866 feature CDS 7640 8224 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-1;Name=M2-1;note=ORF 1%2C matrix protein 2;product=M2-1 protein;protein_id=YP_009518858.1;translation=MSRRNPCKFEIRGHCLNGKRCHFSHNYFEWPPHALLVRQNFMLNRILKSMDKSIDTLSEISGAAELDRTEEYALGVVGVLESYIGSINNITKQSACVAMSKLLTELNSDDIKKLRDNEEPNSPKVRVYNTVISYIESNRKNNKQTIHLLKRLPADVLKKTIKNTLDIHKSITINNSKESTVSDTNDHAKNNDTT%2A
EPI_ISL_412866 feature CDS 8193 8465 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-2;Name=M2-2;note=ORF 2%2C RNA processivity factor;product=M2-2 protein;protein_id=YP_009518859.1;translation=TTMPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNFLQHLGITDDIYTIYILVS%2A
EPI_ISL_412866 feature CDS 8532 15029 . . 0 codon_start=1;db_xref=GeneID:37607645;gene=L;gene_name=L;Name=L;note=RNA dependant RNA polymerase%3B RdRp;product=polymerase protein;protein_id=YP_009518860.1;translation=MDPIISGNSANVYLTDSYLKGVISFSECNALGSYIFNGPYLKNDYTNLISRQNPLIEHINLKKLNITQSLISKYHKGEIKIEEPTYFQSLLMTYKSMTSSEQTTTTNLLKKIIRRAIEISDVKVYAILNKLGLKEKDKIKSNNGQDEDNSVITTIIKDDILLAVKDNQSHPKADKNQSTKQKDTIKTTLLKKLMCSMQHPPSWLIHWFNLYTKLNSILTQYRSSEVKNHGFILIDNHTLSGFQFILNQYGCIVYHRELKRITVTTYNQFLTWKDISLSRLNVCLITWISNCLNTLNKSLGLRCGFNNVILTQLFLYGDCILKLFHNEGFYIIKEVEGFIMSLILNITEEDQFRKRFYNSMLNNITDAANKAQKNLLSRVCHTLLDKTISDNIINGRWIILLSKFLKLIKLAGDNNLNNLSELYFLFRIFGHPMVDERQAMDAVKVNCNETKFYLLSSLSMLRGAFIYRIIKGFVNNYNRWPTLRNAIVLPLRWLTYYKLNTYPSLLELTERDLIVLSGLRFYREFRLPKKVDLEMIINDKAISPPKNLIWTSFPRNYMPSHIQNYIEHEKLKFSDSDKSRRVLEYYLRDNKFNECDLYNCVVNQSYLNNPNHVVSLTGKERELSVGRMFAMQPGMFRQVQILAEKMIAENILQFFPESLTRYGDLELQKILELKAGISNKSNRYNDNYNNYISKCSIITDLSKFNQAFRYETSCICSDVLDELHGVQSLFSWLHLTIPHVTIICTYRHAPPYIKDHIVDLNNVDEQSGLYRYHMGGIEGWCQKLWTIEAISLLDLISLKGKFSITALINGDNQSIDISKPVRLMEGQTHAQADYLLALNSLKLLYKEYAGIGHKLKGTETYISRDMQFMSKTIQHNGVYYPASIKKVLRVGPWINTILDDFKVSLESIGSLTQELEYRGESLLCSLIFRNVWLYNQIALQLKNHALCNNKLYLDILKVLKHLKTFFNLDNIDTALTLYMNLPMLFGGGDPNLLYRSFYRRTPDFLTEAIVHSVFILSYYTNHDLKDKLQDLSDDRLNKFLTCIITFDKNPNAEFVTLMRDPQALGSERQAKITSEINRLAVTEVLSTAPNKIFSKSAQHYTTTEIDLNDIMQNIEPTYPHGLRVVYESLPFYKAEKIVNLISGTKSITNILEKTSAIDLTDIDRATEMMRKNITLLIRILPLDCNRDKREILSMENLSITELSKYVRERSWSLSNIVGVTSPSIMYTMDIKYTTSTIASGIIIEKYNVNSLTRGERGPTKPWVGSSTQEKKTMPVYNRQVLTKKQRDQIDLLAKLDWVYASIDNKDEFMEELSIGTLGLTYEKAKKLFPQYLSVNYLHRLTVSSRPCEFPASIPAYRTTNYHFDTSPINRILTEKYGDEDIDIVFQNCISFGLSLMSVVEQFTNVCPNRIILIPKLNEIHLMKPPIFTGDVDIHKLKLVIQKQHMFLPDKISLTQYVELFLSNKTLKSGSNVNSNLILAHKISDYFHNTYILSTNLAGHWILIIQLMKDSKGIFEKDWGEGYITDHMFINLKVFFNAYKTYLLCFHKGYGRAKLECDMNTSDLLCVLELIDSSYWKSMSKVFLEQKVIKYILSQDASLHRVKGCHSFKLWFLKRLNVAEFTVCPWVVNIDYHPTHMKAILTYIDLVRMGLINIDRIYIKNKHKFNDEFYTSNLFYINYNFSDNTHLLTKHIRIANSELESNYNKLYHPTPETLENILTNPVKNNEKKTLSGYCIGKNVDSIMLPSLSNKKLIKSSTMIRTNYSRQDLYNLFPTVVIDKIIDHSGN
TAKSNQLYTTTSHQISLVHNSTSLYCMLPWHHINRFNFVFSSTGCKISIEYILKDLKIKDPNCIAFIGEGAGNLLLRTVVELHPDIRYIYRSLKDCNDHSLPIEFLRLYNGHINIDYGENLTIPATDATNNIHWSYLHIKFAEPISLFVCDAELPVTVNWSKIIIEWSKHVRKCKYCSSVNKCTLIVKYHAQDDIDFKLDNITILKTYVCLGSKLKGSEVYLVLTIGPANVFPVFNVVQNAKLILSRTKNFIMPKKADKESIDANIKSLIPFLCYPITKKGINTALSKLKSVVSGDILSYSIAGRNEVFSNKLINHKHMNILKWFNHVLNFRSTELNYNHLYMVESTYPHLSELLNSLTTNELKKLIKITGSLLYNFYNE%2A
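The annotation above uses GFF3's nine-column layout, with percent-encoded characters such as `%2C` and `%2A` inside the attribute column. As a rough illustration of how one such feature line can be read, here is a small sketch. It assumes the columns are tab-separated, as the GFF3 format specifies (the tabs are not visible in this rendering), and the `sample` line is a shortened, hypothetical version of the M2-1 entry with the translation omitted. `parse_gff3_line` is an illustrative helper, not a library function.

```python
from urllib.parse import unquote


def parse_gff3_line(line: str) -> dict:
    """Split one GFF3 feature line into its nine columns and decode the attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    attributes = {}
    for pair in attrs.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Decode only after splitting, so an encoded ';' or '=' cannot break the parse.
        attributes[unquote(key)] = unquote(value)

    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),  # GFF3 coordinates are 1-based and inclusive
        "end": int(end),
        "phase": phase,
        "attributes": attributes,
    }


# Shortened, hypothetical version of the M2-1 line (translation attribute dropped).
sample = (
    "EPI_ISL_412866\tfeature\tCDS\t7640\t8224\t.\t.\t0\t"
    "gene=M2;Name=M2-1;note=ORF 1%2C matrix protein 2"
)

record = parse_gff3_line(sample)
length = record["end"] - record["start"] + 1
print(record["attributes"]["note"])  # ORF 1, matrix protein 2
print(length, length % 3)            # 585 0
```

The length check is a cheap sanity test: a CDS whose inclusive length is not a multiple of three usually signals a coordinate error.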
@@ -0,0 +1,92 @@
{
"schemaVersion": "3.0.0",
"alignmentParams": {
"excessBandwidth": 9,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "left",
"minSeedCover": 0.1
},
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
},
"defaultCds": "F",
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"qc": {
"privateMutations": {
"enabled": true,
"typical": 50,
"cutoff": 150,
"weightLabeledSubstitutions": 2,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"missingData": {
"enabled": false,
"missingDataThreshold": 2000,
"scoreBias": 500
},
"snpClusters": {
"enabled": false,
"windowSize": 100,
"clusterCutOff": 10,
"scoreWeight": 50
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 8
},
"frameShifts": {
"enabled": true
},
"stopCodons": {
"enabled": true,
"ignoredStopCodons": []
}
},
"cdsOrderPreference": [
"F",
"G",
"L"
],
"maintenance": {
"website": [
"https://nextstrain.org",
"https://clades.nextstrain.org"
],
"documentation": [
"https://github.com/nextstrain/rsv"
],
"source code": [
"https://github.com/nextstrain/rsv"
],
"issues": [
"https://github.com/nextstrain/rsv/issues"
],
"organizations": [
"Nextstrain"
],
"authors": [
"Nextstrain team <https://nextstrain.org>"
]
},
"shortcuts": [
"rsv_a",
"nextstrain/rsv/a",
"nextstrain/rsv/a/hRSV-A-England-397-2017"
],
"attributes": {
"name": "RSV-A",
"reference accession": "EPI_ISL_412866",
"reference name": "hRSV/A/England/397/2017"
}
}
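The `"files"` object in the `pathogen.json` above maps roles such as `reference` and `treeJson` to file names that are expected to sit next to it in the dataset directory. A simple consistency check for a local copy might look like the sketch below. The helper names `missing_files` and `demo` are hypothetical, and the demo builds a throwaway directory instead of touching a real dataset.

```python
import json
import tempfile
from pathlib import Path


def missing_files(dataset_dir: Path) -> list:
    """List file names referenced in pathogen.json's "files" that do not exist on disk."""
    pathogen = json.loads((dataset_dir / "pathogen.json").read_text())
    return sorted(
        name
        for name in pathogen.get("files", {}).values()
        if not (dataset_dir / name).is_file()
    )


def demo() -> list:
    """Build a tiny fake dataset that lacks tree.json and report what is missing."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        config = {
            "files": {
                "pathogenJson": "pathogen.json",
                "reference": "reference.fasta",
                "treeJson": "tree.json",
            }
        }
        (root / "pathogen.json").write_text(json.dumps(config))
        (root / "reference.fasta").write_text(">ref\nACGT\n")
        return missing_files(root)


print(demo())  # ['tree.json']
```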