WIP: reduce minimizer threshold from 0.3 to 0.1 #1409

Merged
merged 9 commits into from
Feb 13, 2024

rneher (Member) commented Feb 11, 2024

For very diverse viruses, our current minimizer match fraction threshold of 0.3 is too stringent. A threshold of 0.1 is still stringent enough that no random hits are produced, but it yields many suboptimal hits between related viruses (such as RSV-A matching RSV-B). This is not a problem as long as we consider only the best hit. In the sort command, however, every hit counts, so we end up not really sorting anymore.
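To make the effect of the threshold concrete, here is a minimal sketch, assuming each dataset gets a match fraction between 0 and 1 and counts as a hit once that fraction reaches the threshold. The `Hit` struct, the `hits_above` function, and the scores used in the note below are hypothetical illustrations, not nextclade's actual types or measured values.

```rust
/// Hypothetical minimizer search result for one dataset (illustrative only).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    dataset: String,
    /// Fraction of matching minimizers, assumed to lie in [0, 1].
    score: f64,
}

/// Keep the datasets whose score reaches `min_score`, best hit first.
fn hits_above(scores: &[(&str, f64)], min_score: f64) -> Vec<Hit> {
    let mut hits: Vec<Hit> = scores
        .iter()
        .filter(|&&(_, score)| score >= min_score)
        .map(|&(dataset, score)| Hit {
            dataset: dataset.to_string(),
            score,
        })
        .collect();
    // Sort in descending order of score so that index 0 is the best hit.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits
}
```

With made-up scores such as `[("RSV-A", 0.62), ("RSV-B", 0.14), ("unrelated", 0.02)]`, a threshold of 0.3 keeps only RSV-A, while 0.1 keeps both RSV-A and RSV-B. The best hit is the same in both cases. Only the number of suboptimal hits changes.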

rneher (Member, Author) commented Feb 11, 2024

The sort command currently writes a sequence into the file corresponding to a dataset prefix whenever a dataset with this prefix is among the hits. If we changed this to the more stringent condition that the dataset must be the best hit, the command would behave more like a hierarchical sort.
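The difference between the two conditions can be sketched as follows. This is only an illustration under stated assumptions: `Policy` and `output_datasets` are hypothetical names, not part of nextclade, and the input is assumed to be the list of (dataset, score) pairs that already passed the threshold.

```rust
/// Hypothetical output policy for this sketch (not nextclade's API).
#[derive(Clone, Copy)]
enum Policy {
    /// Write the sequence to the output of every dataset that is a hit.
    AnyHit,
    /// Write the sequence only to the output of the best-scoring dataset.
    BestHitOnly,
}

/// Return the datasets whose output file would receive the sequence.
/// `hits` holds (dataset name, score) pairs that already passed the threshold.
fn output_datasets(hits: &[(&str, f64)], policy: Policy) -> Vec<String> {
    let mut sorted: Vec<(&str, f64)> = hits.to_vec();
    // Descending by score, so the best hit comes first.
    sorted.sort_by(|a, b| b.1.total_cmp(&a.1));
    let keep = match policy {
        Policy::AnyHit => sorted.len(),
        Policy::BestHitOnly => 1,
    };
    sorted
        .into_iter()
        .take(keep)
        .map(|(name, _)| name.to_string())
        .collect()
}
```

Under `AnyHit`, a sequence that hits both RSV-A and RSV-B lands in both outputs. Under `BestHitOnly`, it lands in exactly one, so lowering the threshold no longer duplicates sequences across related datasets.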

rneher (Member, Author) commented Feb 11, 2024

If I understand this correctly, one could process only the first (best) dataset here:
https://github.com/nextstrain/nextclade/blob/master/packages_rs/nextclade-cli/src/cli/nextclade_seq_sort.rs#L200-L207

This makes it possible to map entries in the input FASTA unambiguously and reliably to entries in the output TSV, which is important in the presence of duplicated sequence names.
@ivan-aksamentov ivan-aksamentov merged commit 1690c28 into master Feb 13, 2024
20 of 21 checks passed
@ivan-aksamentov ivan-aksamentov deleted the feat/reduce-minimizer-threshold branch February 13, 2024 06:40