Releases · cov-lineages/pangoLEARN

An issue was identified in the global tree: identical sequences were present in both the B.1.1 base and B.1.374. This patch includes a newly trained model in which B.1.374 is reassigned to B.1.1 and B.1.374.1 is reassigned to B.1.1.316.

09 Dec 13:36

aineniamh

2020-11-30

df35163

pangoLEARN data release 2020-11-30

Release notes

Lineage curation
The lineage assignments have been fully updated from a tree built using FastTree MP. Previous releases had some circularity: four sub-trees were built based on pangolin assignments for A, B, B.1 and B.1.1, and lineages were then manually assigned. This release resolves that circularity by building a single tree containing all SARS-CoV-2 diversity. The tree was split into 25 roughly equal-sized chunks using jclusterfunk, and these chunks were manually curated to identify new lineages and to refine or update previous lineage definitions.
There are now 779 lineages defined; full details can be found at cov-lineages.org.
Ambiguity curation
In this release we curate the training data more aggressively, removing ambiguous sequences and resolving conflicts that result from ambiguities in the training set.
Accuracy and precision results are generated using 10-fold cross-validation on the curated dataset, so be aware that query sequences with a lot of ambiguity may have lower assignment accuracy.
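Since the reported accuracy is based on curated data, it can be useful to check how ambiguous a query sequence is before interpreting its assignment. The following is a minimal sketch, not part of pangolin or pangoLEARN: the function name `ambiguity_fraction` is hypothetical, and it assumes ambiguity means any IUPAC ambiguity code (including N) in the sequence.

```python
# IUPAC nucleotide ambiguity codes, including N (any base).
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def ambiguity_fraction(sequence: str) -> float:
    """Return the fraction of characters in a sequence that are IUPAC ambiguity codes."""
    if not sequence:
        return 0.0
    ambiguous = sum(1 for base in sequence.upper() if base in AMBIGUITY_CODES)
    return ambiguous / len(sequence)


# Toy example: 4 of the 8 characters (N, N, R, Y) are ambiguous.
print(ambiguity_fraction("ACGTNNRY"))  # 0.5
```

A query with a high fraction is one where the reported cross-validation accuracy is less likely to apply.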
Model description
The current version of pangoLEARN uses a classification tree, but the pipeline has been written so that, as more complex models are developed, the user will be able to choose which model to use to assign lineages. The model was trained using 188,193 SARS-CoV-2 sequences from GISAID, with lineages assigned by manually curating the global ML tree, as is the standard lineage data release procedure for pangolin. Training takes approximately 2 hours on our hardware (this may differ on other hardware). The model was built using the standard scikit-learn implementation of the decision tree learning algorithm. The code for this process is available in the cov-lineages/cov-support repository.
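To illustrate the general idea of assigning lineages with a scikit-learn decision tree, here is a minimal sketch. It is not the pangoLEARN training code (that lives in the cov-support repository): the one-hot encoding, the toy sequences and their lineage labels are all assumptions made for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

BASES = "ACGT"


def one_hot(sequence):
    """Encode an aligned sequence as a flat 0/1 vector, four columns per position."""
    return [int(base == b) for base in sequence for b in BASES]


# Toy aligned sequences with made-up lineage labels (illustrative only).
# Position 1 (C vs T) separates the two labels; position 5 does not.
training_sequences = ["ACGTAC", "ACGTAT", "ATGTAC", "ATGTAT"]
training_lineages = ["A", "A", "B", "B"]

X = [one_hot(s) for s in training_sequences]

# Fit a classification tree on the encoded training set.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, training_lineages)

# Assign a lineage to a query sequence of the same aligned length.
predicted = model.predict([one_hot("ATGTAC")])
print(predicted[0])
```

Encoding each alignment position as separate columns lets the tree split on the presence or absence of a specific base at a specific site, which is one simple way to expose lineage-defining mutations to the classifier.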

30 Oct 16:46

aineniamh

2020-10-30

123f61c

pangoLEARN data release 2020-10-30

pangoLEARN data release 2020-08-29_3

Minor release

Updated the decision tree model to exclude any Ns from training. Where a sequence has an N, the reference base at that position is used as an uninformative alternative.
Updated recall rates for each lineage.
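The N handling described above can be sketched as follows. This is an illustrative sketch only, assuming the query has already been aligned to the reference so that both strings have the same length; the function name `mask_n_with_reference` and the toy sequences are hypothetical and are not taken from the pangoLEARN code.

```python
def mask_n_with_reference(sequence: str, reference: str) -> str:
    """Replace every N in an aligned sequence with the reference base at that position."""
    if len(sequence) != len(reference):
        raise ValueError("sequence and reference must be aligned to the same length")
    return "".join(
        ref_base if base.upper() == "N" else base
        for base, ref_base in zip(sequence, reference)
    )


# Toy example: the Ns at positions 2 and 7 take the reference base,
# while the genuine C-to-T difference at position 1 is preserved.
reference = "ACGTACGT"
query = "ATNTACGN"
print(mask_n_with_reference(query, reference))  # ATGTACGT
```

Using the reference base means an N never looks like a mutation, so it carries no information for or against any lineage.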

The A222V lineage B.1.177 is now included in the training set.

28 Sep 16:28

aineniamh

2020-08-29_3

175155f

pangoLEARN data release 2020-08-29_3

Minor release

Updated the decision tree model to exclude any Ns from training. Where a sequence has an N, the reference base at that position is used as an uninformative alternative.
Updated recall rates for each lineage.
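For readers unfamiliar with the metric, the recall of a lineage is the fraction of sequences truly belonging to that lineage that the model assigns to it. The sketch below shows the calculation on made-up labels; the function name `recall_per_lineage` and the example data are hypothetical and do not reproduce the project's own evaluation code.

```python
from collections import Counter


def recall_per_lineage(true_lineages, predicted_lineages):
    """Return {lineage: recall}, where recall = correct assignments / true members."""
    if len(true_lineages) != len(predicted_lineages):
        raise ValueError("label lists must have the same length")
    totals = Counter(true_lineages)
    correct = Counter(
        truth for truth, pred in zip(true_lineages, predicted_lineages) if truth == pred
    )
    return {lineage: correct[lineage] / totals[lineage] for lineage in totals}


# Made-up labels for illustration.
true_labels = ["B.1.1", "B.1.1", "B.1.177", "B.1.177", "B.1.177", "B.1.177"]
pred_labels = ["B.1.1", "B.1.177", "B.1.177", "B.1.177", "B.1.177", "B.1.1"]

print(recall_per_lineage(true_labels, pred_labels))
# {'B.1.1': 0.5, 'B.1.177': 0.75}
```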

Releases: cov-lineages/pangoLEARN

pangoLEARN data release 2021-01-20
pangoLEARN data release 2021-01-16
pangoLEARN data release 2021-01-11
pangoLEARN data release 2021-01-06
pangoLEARN data release 2020-12-17_2
pangoLEARN data release 2020-12-17
pangoLEARN data release 2020-11-30_2
pangoLEARN data release 2020-11-30
pangoLEARN data release 2020-10-30
pangoLEARN data release 2020-08-29_3