cov-2-mutations-by-lineage

Quick analysis to associate SARS-CoV-2 spike mutations with pangolin lineages using GISAID data.

Results are available in muts_by_lineage.csv. The first column gives mutations found in >80% of sequences for the given lineage. The other two columns use a 50% and a 10% threshold, respectively.
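
As a rough illustration of how the three thresholds partition mutations, here is a minimal sketch. It is not the code from the notebook: the function name `mutations_by_lineage`, the input shape (one `(lineage, mutations)` pair per sequence), and the toy records are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def mutations_by_lineage(records, thresholds=(0.8, 0.5, 0.1)):
    """Group per-sequence mutation sets by lineage and apply frequency cutoffs.

    records: iterable of (lineage, mutations) pairs, one per sequence
             (hypothetical input shape, not the notebook's).
    Returns {lineage: {threshold: sorted mutations whose within-lineage
    frequency is strictly greater than that threshold}}.
    """
    counts = defaultdict(Counter)  # lineage -> mutation -> number of sequences
    totals = Counter()             # lineage -> number of sequences
    for lineage, muts in records:
        totals[lineage] += 1
        counts[lineage].update(set(muts))  # count each mutation once per sequence

    result = {}
    for lineage, n in totals.items():
        result[lineage] = {
            t: sorted(m for m, c in counts[lineage].items() if c / n > t)
            for t in thresholds
        }
    return result


# Toy records, made up for illustration (not GISAID output).
records = [
    ("B.1.1.7", {"N501Y", "D614G", "P681H"}),
    ("B.1.1.7", {"N501Y", "D614G"}),
    ("B.1.1.7", {"N501Y", "D614G", "A570D"}),
    ("B.1.1.7", {"N501Y", "D614G", "P681H"}),
    ("B.1.1.7", {"N501Y", "D614G", "P681H"}),
]
table = mutations_by_lineage(records)
print(table["B.1.1.7"][0.8])  # mutations present in more than 80% of the toy sequences
```

Because each cutoff is applied independently, any mutation listed at the 80% threshold also appears at the 50% and 10% thresholds.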

These results were generated by doing a pairwise alignment of each entry's Spike sequence to the WIV04 reference sequence, identifying mutations, and grouping them by the pangolin lineage annotations that GISAID provides in the metadata file.
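
The align-then-compare step can be sketched as follows. This is a simplified illustration, not the notebook's implementation: it uses a plain Needleman-Wunsch global alignment with made-up match/mismatch/gap scores (a real protein alignment would normally use a substitution matrix), the function names `global_align` and `call_mutations` are hypothetical, the mutation labels are an ad hoc notation, and the short sequences are toy strings rather than Spike or WIV04 data.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with simple (assumed) scores.

    Returns the two sequences as equal-length strings with '-' for gaps.
    """
    n, m = len(ref), len(query)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the query (deletion)
                score[i][j - 1] + gap,      # gap in the reference (insertion)
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ref_aln, query_aln = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == query[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            ref_aln.append(ref[i - 1])
            query_aln.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ref_aln.append(ref[i - 1])
            query_aln.append("-")
            i -= 1
        else:
            ref_aln.append("-")
            query_aln.append(query[j - 1])
            j -= 1
    return "".join(reversed(ref_aln)), "".join(reversed(query_aln))


def call_mutations(ref_aln, query_aln):
    """List differences between aligned sequences, numbered by reference position."""
    muts = []
    pos = 0  # 1-based position in the ungapped reference
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            pos += 1
        if r == q:
            continue
        if r == "-":
            muts.append(f"ins{pos}{q}")   # residue inserted after reference position pos
        elif q == "-":
            muts.append(f"{r}{pos}del")   # reference residue missing from the query
        else:
            muts.append(f"{r}{pos}{q}")   # substitution, e.g. D614G
    return muts


# Toy sequences (not real Spike data).
reference = "MKTAYIAKQR"
substituted = call_mutations(*global_align(reference, "MKTAYLAKQR"))
deleted = call_mutations(*global_align(reference, "MKTAYAKQR"))
print(substituted, deleted)
```

The resulting per-sequence mutation lists can then be grouped by the lineage annotation of each entry. A pure-Python alignment like this one is quadratic in sequence length, so for a full GISAID download a dedicated alignment library would likely be more practical.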

To regenerate this analysis, you will need the Spike protein sequences from GISAID, as well as the GISAID metadata file. The analysis is in the analysis.ipynb notebook.

If you know of a better source for this information, please let us know!