Add simple influenza A H5N1 dataset with all segments #217

Open: 12 commits to be merged into base branch master
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
\.env
\.vscode
/docs/tmp/
\.venv
3 changes: 3 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/HA/CHANGELOG.md
@@ -0,0 +1,3 @@
## Unreleased

Initial release
26 changes: 26 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/HA/README.md
@@ -0,0 +1,26 @@
# H5N1 (segment 4 / HA) - dataset with A/Goose/Guangdong/1/96 reference

| attribute | value |
| ------------------- | ---------------------------------------- |
| dataset name | community/genspectrum/iav/h5n1/GG1996/HA |
| reference strain | A/Goose/Guangdong/1/96(H5N1) |
| reference accession | NC_007362.1 |
| assembly accession | GCF_000864105.1 |

## Authors and contacts

Maintained by Genspectrum, Chaoran Chen and Anna Parker

With the help of: Cornelius Roemer and Richard Neher

## Scope of this dataset

This dataset uses the first highly-pathogenic avian influenza (HPAI) isolate (A/Goose/Guangdong/1/96) as a reference and is suitable for the analysis of circulating and historical H5 sequences, including low-pathogenicity avian influenza (LPAI) isolates.

## Features

This simple dataset only supports alignment.
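
As a rough illustration of what alignment-only use could look like, the sketch below builds a Nextclade command line in Python. It assumes a Nextclade v3 binary on the `PATH` and a local checkout of this dataset directory; `build_nextclade_command` and the `output/HA` directory are hypothetical names introduced here, not part of the dataset.

```python
import shlex


def build_nextclade_command(dataset_dir, sequences_fasta, output_dir):
    """Return the argument list for a Nextclade run against a local dataset."""
    return [
        "nextclade", "run",
        "--input-dataset", dataset_dir,  # directory holding pathogen.json, reference.fasta, ...
        "--output-all", output_dir,      # write every output file into this directory
        sequences_fasta,                 # query sequences to align
    ]


command = build_nextclade_command(
    "data/community/genspectrum/iav/h5n1/GG1996/HA",
    "data/community/genspectrum/iav/h5n1/GG1996/HA/examples.fasta",
    "output/HA",  # hypothetical output directory
)

# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(command))

# To execute it (assumes nextclade is installed):
#   import subprocess; subprocess.run(command, check=True)
```

Because the dataset ships no tree or clade definitions, only the alignment-related outputs of such a run would be expected to carry useful information.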

## What is a Nextclade dataset

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
38 changes: 38 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/HA/examples.fasta

Large diffs are not rendered by default.

@@ -0,0 +1 @@
. . CDS 22 1728 . + . gene="HA"
19 changes: 19 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/HA/pathogen.json
@@ -0,0 +1,19 @@
{
"schemaVersion": "3.0.0",
"alignmentParams": {
"minSeedCover": 0.01
},
"attributes": {
"name": "Influenza A/H5N1 (segment 4/HA)",
"reference name": "Influenza A virus (A/goose/Guangdong/1/1996(H5N1)) hemagglutinin (HA) gene, complete cds",
"reference accession": "NC_007362.1"
},
"files": {
"reference": "reference.fasta",
"pathogenJson": "pathogen.json",
"changelog": "CHANGELOG.md",
"genomeAnnotation": "genome_annotation.gff3",
"readme": "README.md",
"examples": "examples.fasta"
}
}
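
A quick consistency check of the `pathogen.json` above can be sketched in Python. The set of keys treated as required here (`schemaVersion`, `files.reference`, `files.pathogenJson`) is an assumption made for illustration, and `check_pathogen_json` is a hypothetical helper, not part of Nextclade.

```python
import json

# Copy of the HA pathogen.json from this diff.
PATHOGEN_JSON = """
{
  "schemaVersion": "3.0.0",
  "alignmentParams": {
    "minSeedCover": 0.01
  },
  "attributes": {
    "name": "Influenza A/H5N1 (segment 4/HA)",
    "reference name": "Influenza A virus (A/goose/Guangdong/1/1996(H5N1)) hemagglutinin (HA) gene, complete cds",
    "reference accession": "NC_007362.1"
  },
  "files": {
    "reference": "reference.fasta",
    "pathogenJson": "pathogen.json",
    "changelog": "CHANGELOG.md",
    "genomeAnnotation": "genome_annotation.gff3",
    "readme": "README.md",
    "examples": "examples.fasta"
  }
}
"""


def check_pathogen_json(text, expected_files=("reference", "pathogenJson")):
    """Return a list of human-readable problems; an empty list means no problem found."""
    config = json.loads(text)
    problems = []
    if "schemaVersion" not in config:
        problems.append("missing top-level key: schemaVersion")
    files = config.get("files", {})
    for key in expected_files:  # assumed-required entries, see lead-in
        if key not in files:
            problems.append(f"missing files entry: {key}")
    return problems


print(check_pathogen_json(PATHOGEN_JSON))  # prints []
```

The same check applies unchanged to the `pathogen.json` of the other segments, since they share the same layout.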
27 changes: 27 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/HA/reference.fasta
@@ -0,0 +1,27 @@
>NC_007362.1 Influenza A virus (A/goose/Guangdong/1/1996(H5N1)) hemagglutinin (HA) gene, complete cds
GCAGGGGTATAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTCAAAAGTG
ATCAGATTTGCATTGGTTACCATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAAAGAACGT
TACTGTTACACATGCCCAAGACATACTGGAAAAGACACACAATGGGAAGCTCTGCGATCTAAATGGAGTG
AAGCCTCTCATTTTGAGAGATTGTAGTGTAGCTGGATGGCTCCTCGGAAACCCTATGTGTGACGAATTCA
TCAATGTGCCGGAATGGTCTTACATAGTGGAGAAGGCCAGTCCAGCCAATGACCTCTGTTACCCAGGGGA
TTTCAACGACTATGAAGAACTGAAACACCTATTGAGCAGAACAAACCATTTTGAGAAAATTCAGATCATC
CCCAAAAGTTCTTGGTCCAATCATGATGCCTCATCAGGGGTGAGCTCAGCATGTCCATACCATGGGAGGT
CCTCCTTTTTCAGAAATGTGGTATGGCTTATCAAAAAGAACAGTGCATACCCAACAATAAAGAGGAGCTA
CAATAATACCAACCAAGAAGATCTTTTAGTACTGTGGGGGATTCACCATCCTAATGATGCGGCAGAGCAG
ACAAAGCTCTATCAAAACCCAACCACTTACATTTCCGTTGGAACATCAACACTGAACCAGAGATTGGTTC
CAGAAATAGCTACTAGACCCAAAGTAAACGGGCAAAGTGGAAGAATGGAGTTCTTCTGGACAATTTTAAA
GCCGAATGATGCCATCAATTTCGAGAGTAATGGAAATTTCATTGCTCCAGAATATGCATACAAAATTGTC
AAGAAAGGGGACTCAGCAATTATGAAAAGTGAATTGGAATATGGTAACTGCAACACCAAGTGTCAAACTC
CAATGGGGGCGATAAACTCTAGTATGCCATTCCACAACATACACCCCCTCACCATCGGGGAATGCCCCAA
ATATGTGAAATCAAACAGATTAGTCCTTGCGACTGGACTCAGAAATACCCCTCAGAGAGAGAGAAGAAGA
AAAAAGAGAGGACTATTTGGAGCTATAGCAGGTTTTATAGAGGGAGGATGGCAGGGAATGGTAGATGGTT
GGTATGGGTACCACCATAGCAATGAGCAGGGGAGTGGATACGCTGCAGACAAAGAATCCACTCAAAAGGC
AATAGATGGAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGA
AGGGAATTTAATAACTTGGAAAGGAGGATAGAGAATTTAAACAAGCAGATGGAAGACGGATTCCTAGATG
TCTGGACTTATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAA
TGTCAAGAACCTTTATGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAATGGTTGT
TTCGAGTTCTATCACAAATGTGATAATGAATGTATGGAAAGTGTAAAAAACGGAACGTATGACTACCCGC
AGTATTCAGAAGAAGCAAGACTAAACAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATGGGAACTTA
CCAAATACTGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTATCT
TTATGGATGTGCTCCAATGGATCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGATTGTAGT
TAAAAACACC
3 changes: 3 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/M/CHANGELOG.md
@@ -0,0 +1,3 @@
## Unreleased

Initial release
26 changes: 26 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/M/README.md
@@ -0,0 +1,26 @@
# H5N1 (segment 7 / M) - dataset with A/Goose/Guangdong/1/96 reference

| attribute | value |
| ------------------- | --------------------------------------- |
| dataset name | community/genspectrum/iav/h5n1/GG1996/M |
| reference strain | A/Goose/Guangdong/1/96(H5N1) |
| reference accession | NC_007363.1 |
| assembly accession | GCF_000864105.1 |

## Authors and contacts

Maintained by Genspectrum, Chaoran Chen and Anna Parker

With the help of: Cornelius Roemer and Richard Neher

## Scope of this dataset

This dataset uses the first highly-pathogenic avian influenza (HPAI) isolate (A/Goose/Guangdong/1/96) as a reference and is suitable for the analysis of circulating and historical H5 sequences, including low-pathogenicity avian influenza (LPAI) isolates.

## Features

This simple dataset only supports alignment.

## What is a Nextclade dataset

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
38 changes: 38 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/M/examples.fasta

Large diffs are not rendered by default.

@@ -0,0 +1,4 @@
. . CDS 26 784 . + . gene="M1"
. . gene 26 1007 . + . gene=M2;ID=gene-M2
. . CDS 26 51 . + . gene=M2;ID=cds-M2;Parent=gene-M2
. . CDS 740 1007 . + . gene=M2;ID=cds-M2;Parent=gene-M2
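
The annotation above describes M2 as a spliced CDS made of two segments that share one `ID`. A minimal Python sketch of how such 1-based, inclusive GFF3 coordinates combine is given below; `cds_length` and `extract_cds` are hypothetical helper names, and the toy reference string in the last line is made up for illustration.

```python
def cds_length(segments):
    """Total length of a CDS given 1-based, inclusive (start, end) segments."""
    return sum(end - start + 1 for start, end in segments)


def extract_cds(reference, segments):
    """Concatenate CDS segments from a reference string (GFF3 coordinates)."""
    # GFF3 start is 1-based and end is inclusive, so slice [start - 1:end].
    return "".join(reference[start - 1:end] for start, end in segments)


# Coordinates copied from the genome annotation in this diff.
M1 = [(26, 784)]
M2 = [(26, 51), (740, 1007)]

for name, segments in (("M1", M1), ("M2", M2)):
    length = cds_length(segments)
    # A coding sequence should have a length divisible by 3.
    print(name, length, length % 3 == 0)
# prints:
# M1 759 True
# M2 294 True

# Toy example: join positions 1-3 and 10-12 of a made-up sequence.
print(extract_cds("AAACCCGGGTTT", [(1, 3), (10, 12)]))  # prints AAATTT
```

Both lengths are multiples of three, which is consistent with the two M2 segments forming one reading frame after splicing.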
19 changes: 19 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/M/pathogen.json
@@ -0,0 +1,19 @@
{
"schemaVersion": "3.0.0",
"alignmentParams": {
"minSeedCover": 0.01
},
"attributes": {
"name": "Influenza A/H5N1 (segment 7/M)",
"reference name": "Influenza A virus (A/goose/Guangdong/1/1996(H5N1)) segment 7, complete sequence",
"reference accession": "NC_007363.1"
},
"files": {
"reference": "reference.fasta",
"pathogenJson": "pathogen.json",
"changelog": "CHANGELOG.md",
"genomeAnnotation": "genome_annotation.gff3",
"readme": "README.md",
"examples": "examples.fasta"
}
}
16 changes: 16 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/M/reference.fasta
@@ -0,0 +1,16 @@
>NC_007363.1 Influenza A virus (A/goose/Guangdong/1/1996(H5N1)) segment 7, complete sequence
AGCAAAAGCAGGTAGATATTGAAAAATGAGTCTTCTAACCGAGGTCGAAACGTACGTTCTCTCTATCGTC
CCGTCAGGCCCCCTCAAAGCCGAGATCGCGCAGAGACTTGAGGATGTCTTTGCAGGAAAGAACACCGATC
TCGAGGCTCTCATGGAATGGCTAAAGACAAGACCAATCCTGTCACCTCTGACTAAAGGGATTTTAGGATT
TGTGTTCACGCTCACCGTGCCCAGTGAGCGAGGACTGCAGCGTAGACGCTTTGTCCAGAATGCCTTAAAT
GGAAATGGAGATCCAAACAATATGGATAGGGCAGTTAAGCTATACAAGAAGCTGAAAAGAGAAATAACAT
TCCATGGGGCTAAGGAGGTCGCACTCAGCTACTCAACCGGTGCACTTGCCAGTTGTATGGGTCTCATATA
CAACAGGATGGGAACGGTGACCACAGAAGTGGCTTTTGGCCTAGTGTGTGCCACTTGTGAGCAGATTGCA
GATTCACAGCATCGGTCTCACAGACAGATGGCAACTACCACCAACCCACTAATCAGGCATGAGAACAGAA
TGGTGCTGGCCAGCACTACAGCTAAGGCTATGGAGCAGATGGCTGGATCGAGTGAGCAGGCAGCGGAAGC
CATGGAGGTTGCTAGTCAGGCTAGGCAGATGGTGCAGGCAATGAGGACAATTGGGACTCATCCTAGCTCC
AGTGCCGGTCTGAAAGATAATCTTCTTGAAAATTTGCAGGCCTACCAAAAACGAATGGGAGTGCAAATGC
AGCGATTCAAGTGATCCTCTTGTTGTTGCCGCAAGTATCATTGGGATACTGCACTTGATATTGTGGATTC
TTGATCGTCTTTTCTTCAAATGCATTTATCGTCGCCTTAAATACGGTTTGAAAAGAGGGCCTTCTACGGA
AGGGGTACCTGAGTCTATGAGGGAAGAGTATCGGCAGGAACAGCAGAGTGCTGTGGATGTTGACGATGGT
CATTTTGTCAACATAGAGCTGGAGTAAAAAACTACCTTGTTTCTACT
3 changes: 3 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/NA/CHANGELOG.md
@@ -0,0 +1,3 @@
## Unreleased

Initial release
26 changes: 26 additions & 0 deletions data/community/genspectrum/iav/h5n1/GG1996/NA/README.md
@@ -0,0 +1,26 @@
# H5N1 (segment 6 / NA) - dataset with A/Goose/Guangdong/1/96 reference

| attribute | value |
| ------------------- | ---------------------------------------- |
| dataset name | community/genspectrum/iav/h5n1/GG1996/NA |
| reference strain | A/Goose/Guangdong/1/96(H5N1) |
| reference accession | NC_007361.1 |
| assembly accession | GCF_000864105.1 |

## Authors and contacts

Maintained by Genspectrum, Chaoran Chen and Anna Parker

With the help of: Cornelius Roemer and Richard Neher

## Scope of this dataset

This dataset uses the first highly-pathogenic avian influenza (HPAI) isolate (A/Goose/Guangdong/1/96) as a reference and is suitable for the analysis of circulating and historical H5 sequences, including low-pathogenicity avian influenza (LPAI) isolates.

## Features

This simple dataset only supports alignment.

## What is a Nextclade dataset

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html