Option to save both the "full" and "merge-branches" database files #4

Open
blinard-BIOINFO opened this issue Apr 13, 2022 · 1 comment

@blinard-BIOINFO
Member

Currently, with "--merge-branches", only the database that keeps the single highest-probability branch per k-mer is output, and it is saved as .rps.
Right now, two consecutive runs are necessary to get both the merged and unmerged versions, which involves unnecessary recomputation of phylo-k-mers.

I suggest using the following extensions to differentiate them:
  • .mps ("m"erged)
  • .rps (current default behaviour, used for placement in "r"appas2)

I need an xpas option to get either i) only the .mps, or ii) both the .mps and the .rps in a single run.
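As a rough illustration of the two requested modes, here is a minimal C++ sketch that maps a mode to the files to write, using the extensions proposed above. The enum, its values and the output_files() helper are hypothetical names, not an existing xpas option or API.

```cpp
#include <string>
#include <vector>

// Hypothetical output modes for the requested option (not an existing xpas flag).
enum class output_mode {
    merged_only,     // i) write only the merged database (.mps)
    merged_and_full  // ii) write both .mps and .rps in a single run
};

// Return the file names to write for a given output prefix and mode,
// using the extensions proposed in this issue.
std::vector<std::string> output_files(const std::string& prefix, output_mode mode) {
    std::vector<std::string> files{prefix + ".mps"};
    if (mode == output_mode::merged_and_full) {
        files.push_back(prefix + ".rps");
    }
    return files;
}
```

The run without "--merge-branches" (full database only, saved as .rps) is left unchanged by this sketch.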

Looking at the code, it seems that only small changes in step 2 (filtering) are needed:

  • the filtering step needs to be duplicated (once for each version)
  • e.g. a first call to db_builder::merge_filtered() with merged_branches=false, then save as .rps
  • then a second call with merged_branches=true, then save as .mps
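The duplicated filtering step can be sketched on a simplified data model. This is not xpas code: pk_entry, simple_db and filter_sketch() are hypothetical stand-ins, and the boolean plays the role of the merged_branches argument. The same candidate set is filtered twice, once keeping every (branch, score) pair and once keeping only the best-scoring branch per k-mer.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified stand-in for a phylo-k-mer database (hypothetical types, not xpas'):
// each k-mer key maps to the (branch, score) pairs that passed filtering.
struct pk_entry {
    std::uint32_t branch;
    float score;  // e.g. a log-probability; higher is better
};
using simple_db = std::unordered_map<std::uint64_t, std::vector<pk_entry>>;

// Filtering parameterised by a merge flag (hypothetical sketch).
// merge_branches == false: keep every entry (content of the .rps).
// merge_branches == true:  keep only the best-scoring branch per k-mer (.mps);
// on ties, the first entry encountered is kept.
simple_db filter_sketch(const simple_db& candidates, bool merge_branches) {
    if (!merge_branches) {
        return candidates;
    }
    simple_db merged;
    merged.reserve(candidates.size());
    for (const auto& [kmer, entries] : candidates) {
        if (entries.empty()) continue;
        const pk_entry* best = &entries.front();
        for (const auto& e : entries) {
            if (e.score > best->score) best = &e;
        }
        merged.emplace(kmer, std::vector<pk_entry>{*best});
    }
    return merged;
}
```

In a single run, one would call filter_sketch(candidates, false) and save the result as .rps, then filter_sketch(candidates, true) and save it as .mps, so the phylo-k-mers are computed only once.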
@blinard-BIOINFO
Member Author

The mechanism that will break is that xpas::build() returns a single _phylo_kmer_db.
Here it will have to return two...
Note that for this particular application, temporarily duplicating _phylo_kmer_db in memory should not be an issue.
The merged version will be much smaller than the full version (on top of that, for the amino-acid application, k is small, around 5 to 7).
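One possible shape for a build step that returns two databases is sketched below. The real signature of xpas::build() is not shown in this issue; build_result, build_both() and the filter callback are hypothetical, and Db stands for whatever database type is used.

```cpp
#include <optional>

// Hypothetical result type: the merged db is always produced,
// the full db only when both outputs are requested.
template <typename Db>
struct build_result {
    Db merged;             // saved as .mps
    std::optional<Db> full;  // saved as .rps when present
};

// Filter the same candidates once or twice; `filter(candidates, merge_branches)`
// is an assumed callback standing in for the filtering step.
template <typename Db, typename FilterFn>
build_result<Db> build_both(const Db& candidates, bool keep_full, FilterFn filter) {
    build_result<Db> result{filter(candidates, /*merge_branches=*/true), std::nullopt};
    if (keep_full) {
        // Both versions are held in memory at the same time; this temporary
        // duplication is considered acceptable since the merged db is much smaller.
        result.full = filter(candidates, /*merge_branches=*/false);
    }
    return result;
}
```

Using std::optional keeps mode i) (only the .mps) from paying for the full database, while mode ii) gets both from a single computation of the candidates.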
