Option to save both the "full" and "merge-branches" database files #4

blinard-BIOINFO · 2022-04-13T15:49:20Z

Currently, with "--merge-branches", only the merged database (the one keeping the highest-probability branch per k-mer) is written, as a .rps file.
Right now, two consecutive runs are necessary to get both the merged and unmerged versions, which involves unnecessary recomputation of phylo-k-mers.

I suggest using the following extensions to differentiate them:
.mps ("m"erged)
.rps (current default behaviour used for placement in "r"appas2)

I need an xpas option to get either i) only the .mps, or ii) both the .mps + .rps in a single run.

Looking at the code, it seems that only small changes in step 2 (filtering) are needed:

the filtering step needs to be duplicated (one pass for each version)
e.g. a first call to db_builder::merge_filtered() with merged_branches=false, then save as .rps
then a second call with merged_branches=true, then save as .mps
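
To make the two-pass idea concrete, here is a minimal sketch on toy data. It does not use the real xpas classes: `toy_entry`, `toy_db` and `filter_entries` are hypothetical names invented for illustration, and the actual signature of db_builder::merge_filtered() may differ. It only assumes that each k-mer maps to a list of (branch, score) entries and that a higher score is better.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the real xpas types (illustration only).
struct toy_entry {
    std::uint32_t branch;  // branch id
    float score;           // score of the k-mer on that branch, higher is better
};
using toy_db = std::unordered_map<std::uint64_t, std::vector<toy_entry>>;

// Filters the already computed phylo-k-mers without recomputing them.
// merge_branches == false: keep every entry (the "full" version, .rps).
// merge_branches == true:  keep only the best entry per k-mer (.mps).
toy_db filter_entries(const toy_db& computed, bool merge_branches) {
    toy_db result;
    for (const auto& item : computed) {
        const std::vector<toy_entry>& entries = item.second;
        if (entries.empty()) {
            continue;  // nothing to keep for this k-mer
        }
        if (!merge_branches) {
            result[item.first] = entries;
            continue;
        }
        // Pick the entry with the highest score for this k-mer.
        auto best = std::max_element(
            entries.begin(), entries.end(),
            [](const toy_entry& a, const toy_entry& b) { return a.score < b.score; });
        assert(best != entries.end());
        result[item.first].push_back(*best);
    }
    return result;
}
```

The same computed data is passed twice, once per flag value, so the expensive phylo-k-mer computation would run only once and each result could then be saved under its own extension.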

blinard-BIOINFO · 2022-04-13T15:54:43Z

The mechanism that this will break is that xpas::build() returns a single _phylo_kmer_db.
Here it will have to return two ...
Note that for this particular application, temporary duplication of _phylo_kmer_db in memory should not be an issue.
The merged version will be much smaller than the full version (on top of that, for the amino acid application k is small, around 5 to 7).
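
One possible shape for a two-database return value, sketched under assumptions: `toy_db`, `build_result` and `planned_outputs` are hypothetical names, not part of xpas, and the real _phylo_kmer_db type would replace the placeholder. The merged database is always present, and a flag tells the caller whether the full one was also kept.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical placeholder for the real database class.
struct toy_db {
    std::vector<int> entries;
};

// Hypothetical result type for a build step that can return two databases.
struct build_result {
    toy_db merged;   // always built, would be saved as .mps
    toy_db full;     // only meaningful when has_full is true, saved as .rps
    bool has_full;   // false when only the .mps output was requested
};

// Lists the files a caller would write for a given result.
// The extensions follow the naming proposed in this issue.
std::vector<std::string> planned_outputs(const build_result& result,
                                         const std::string& stem) {
    assert(!stem.empty());
    std::vector<std::string> files;
    files.push_back(stem + ".mps");
    if (result.has_full) {
        files.push_back(stem + ".rps");
    }
    return files;
}
```

With this layout the "only .mps" and ".mps + .rps" cases share one code path, and the full database can be released as soon as it has been written.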

blinard-BIOINFO assigned nromashchenko and blinard-BIOINFO Apr 13, 2022
