Metabuli was originally developed for prokaryotes and viruses.
However, users have reported eukaryotic use cases with promising performance,
so we plan to optimize the default settings or add parameters for eukaryotes.
Providing a pre-built database that covers both eukaryotes and prokaryotes is also on our to-do list.
Here are some use cases of Metabuli with eukaryotes.
Environmental DNA metabarcoding for surveying marine vertebrate (benchmarks)
Metabuli showed promising performance in classifying simulated 12S and 16S amplicon data from marine vertebrates.
Working parameters: --seq-mode 1 --min-cons-cnt-euk 4 --tie-ratio 0.99
Test Metabuli for fungi.
With --min-cons-cnt-euk 4, Metabuli correctly classified 97% of paired-end reads simulated from a fungal species whose genome was included in the database.
With the default setting (--min-cons-cnt-euk 9), that percentage dropped to 12%.
For now, --min-cons-cnt-euk appears to be the critical parameter.
It sets the minimum number of consecutive k-mer hits required for classification.
The strict default of --min-cons-cnt-euk 9 was chosen in an older version of Metabuli as a quick remedy for false-positive eukaryotic hits caused by their larger genomes.
Although we have since added noise-filtering steps to reduce false positives, we have not retuned this value for eukaryotes.
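To make the effect of this threshold concrete, here is a toy sketch in Python. It is not Metabuli's implementation: the hit pattern is made up, and the helper names `longest_consecutive_run` and `passes_threshold` are hypothetical. It only illustrates how a "minimum consecutive hits" cutoff can accept a read at 4 and reject the same read at 9.

```python
def longest_consecutive_run(hits):
    """Return the length of the longest run of truthy values in `hits`."""
    best = 0
    current = 0
    for hit in hits:
        # Extend the current run on a hit, reset it on a miss.
        current = current + 1 if hit else 0
        best = max(best, current)
    return best


def passes_threshold(hits, min_cons_cnt):
    """True if the longest run of consecutive hits reaches the cutoff."""
    return longest_consecutive_run(hits) >= min_cons_cnt


# Hypothetical read: 1 = k-mer position matched a reference, 0 = no match.
example_hits = [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

print(longest_consecutive_run(example_hits))  # longest run is 5
print(passes_threshold(example_hits, 4))      # True: 5 >= 4
print(passes_threshold(example_hits, 9))      # False: 5 < 9
```

In this toy model, a read with several short runs of hits never reaches a cutoff of 9, which is one plausible way a strict value could leave many reads unclassified.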
Based on user reports, setting --min-cons-cnt-euk to a lower value such as 4 or 5 is a reasonable choice for now.
After some tests, we will make a new release with an optimized default value.
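For anyone scripting runs while the default is unchanged, the sketch below assembles a command line with the flags reported above. Only the three flags and their values come from this issue. The `classify` subcommand, the positional order (reads, database directory, output directory, job ID), the file names, and the helper `build_classify_command` are assumptions for illustration, so please check them against the help output of your Metabuli version.

```python
import shlex


def build_classify_command(reads, db_dir, out_dir, job_id, options):
    """Assemble an argument list for a Metabuli run.

    ASSUMPTION: the 'classify' subcommand and the positional order
    (reads, DB dir, output dir, job ID) are not stated in this issue.
    """
    cmd = ["metabuli", "classify"]
    cmd += list(reads)
    cmd += [db_dir, out_dir, job_id]
    for flag, value in options.items():
        cmd += [flag, str(value)]
    return cmd


# Flags and values reported for the 12S/16S amplicon benchmark.
amplicon_options = {
    "--seq-mode": 1,
    "--min-cons-cnt-euk": 4,
    "--tie-ratio": 0.99,
}

# Hypothetical file and directory names.
cmd = build_classify_command(
    ["amplicons.fastq"], "db_dir", "out_dir", "job1", amplicon_options
)

# Print the command for review; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later run through `subprocess.run`.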
+++
Please share your thoughts on what to optimize in Metabuli for eukaryotes, and how!
Your feedback helps us a lot in making Metabuli more useful for your research.