Speed-up Xapian searches by preloading indexes #617
@mgautierfr @maneeshpm What do you think? Is that a proper approach? |
Any update here? |
@mgautierfr @maneeshpm We have started the dev of 7.2.0. Do we agree on this approach? |
Just to add to the documented issue here, Xapian-based search in the WASM version of libzim is basically unusable on Android, due to excessive I/O generated by libzim on startup. See openzim/javascript-libzim#42. |
When creating a
Note that we open/create a new xapian database per
However, in case of multizim search, we will create a new multizim searcher and so reopen a xapian db per zim searched.
Could we reuse the same xapian db instead of reopening it? Yes, we could. However, a xapian db is not thread safe.
Proposition:
Note that created
Also note that most of the speedup seems to come from the kernel cache (page cache, ...), which avoids I/O (both libzim searching dirents and xapian doing its stuff).
Preloading the data from the zim file will be done in the same/main thread. We assume here that in a use case where the user will not search (and knows it), this can be deactivated through the cache configuration.
Testing: As for #946, testing the cache is a bit complex.
|
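The page-cache effect mentioned above can be illustrated with a plain sequential read: once a file has been read, its blocks usually stay in the kernel cache, so later reads of the same regions avoid disk I/O. This is a generic Python sketch, not libzim code; `warm_page_cache`, the chunk size and the temporary stand-in file are all assumptions made for the illustration.

```python
import os
import tempfile


def warm_page_cache(path, chunk_size=1 << 20):
    """Read `path` sequentially and discard the data.

    Hypothetical helper (not a libzim API): the only goal is to make the
    kernel pull the file's blocks into its page cache.
    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def demo():
    # Stand-in for a ZIM file; a real one would be much larger.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 3_000_000)
        path = tmp.name
    try:
        return warm_page_cache(path)
    finally:
        os.remove(path)
```

Reading a whole archive this way is only reasonable for small files; for large ones, warming just the regions that hold the index would be the closer analogue of what is discussed here.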
@mgautierfr Why is the single search not considered a special case of the multisearch? And could the multisearch locking system be made in such a way that only the needed Xapian searcher (opened xapian db) instances are locked? |
For now, at least because their constructors don't have the same semantics:
In that case we could specialize only the constructor, but I foresee other specializations (in the way we parse db metadata or queries).
I'm not sure I understand what you are suggesting, but if you are proposing not to lock the zim archive during the search, that is already the case. The problem with the lock is not that we lock all xapian dbs at the same time. Suppose we have one multizim searcher with dbs (A, B, C) and another with (C, D, E, F, ... X). While the first multizim searcher is searching, we hold a lock on C. With two searchers this is kind of acceptable, but with a lot of multizim searchers it would be pretty complex to handle correctly. It is really easy to get a deadlock between two searchers. Not sharing dbs between searchers avoids the problem. |
@mgautierfr Thank you, it seems clearer to me now. I still don't understand the fundamental reason why single-db searches should be handled differently from multi-db searches.
Blocking
If we want to avoid one user impacting another in "normal" conditions, we should consider all the scenarios. And a kiwix-hotspot with only one ZIM used by 20 users at the same time is not a rare one! The challenge we have here seems similar to the one a Web server has: a scarce resource, the HTTP or PHP engine for a Web server, and in our case the DB engine. What about solving that problem like a Web server does:
|
The issue is not really about one user blocking other searches while it is searching (even if it is (?) an issue to investigate). We can have two multisearchers:
If we have two searches at the same time:
Then you have a deadlock: two threads are blocked. The only way to solve that would be to restart the process. Designing this correctly is not an easy task.
We can do that. But:
|
I thought "MultiSearcher will clone/reopen the database" (your words) is there exactly for the purpose of not being blocked by (and not blocking) other (SingleSearcher) searches? So basically a method to solve the problem of concurrent accesses? Did I get that wrong? For the deadlock problem, we should make that part atomic to avoid the problem you describe. Is that requirement problematic to implement? |
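One possible reading of "make that part atomic" is to take all the per-database locks a search needs in a single global order, so that two searchers sharing databases can never each hold a lock the other is waiting for. The sketch below is a generic Python illustration under that assumption; `Db`, `search_all` and the name-based ordering are hypothetical and do not reflect libzim's actual classes or the fix that was eventually chosen.

```python
import threading


class Db:
    """Stand-in for a shared, non-thread-safe database handle (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


def search_all(dbs, query):
    """Lock every db in one global order (by name), then 'search' them."""
    ordered = sorted(dbs, key=lambda db: db.name)
    for db in ordered:
        db.lock.acquire()
    try:
        return [f"{db.name}:{query}" for db in ordered]
    finally:
        for db in reversed(ordered):
            db.lock.release()


def demo(rounds=200):
    a, b, c, d = (Db(n) for n in "ABCD")
    # Two searchers share C and D but list them in opposite orders.
    first, second = [a, c, d], [d, c, b]
    results = []

    def worker(dbs):
        for _ in range(rounds):
            results.append(len(search_all(dbs, "q")))

    threads = [
        threading.Thread(target=worker, args=(s,), daemon=True)
        for s in (first, second)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    # True if both threads finished, plus the number of completed searches.
    return all(not t.is_alive() for t in threads), len(results)
```

Without the `sorted` call, the first searcher could hold C while waiting for D and the second hold D while waiting for C, which is the deadlock described in this thread. Reopening the database per searcher sidesteps the question entirely, at the cost of extra open handles.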
You got that right. Reopening the database avoids any sharing of the database, and so there is no cross-locking at all.
We might, yes. But I'm not confident about this. Not about the implementation in itself, but more about the potential conflicts which can arise from the cache in libkiwix (no strong arguments to give here, just a feeling). That said, I realize that all this preloading can already be done on the libkiwix side by simply creating a searcher in a background thread and storing it in the cache there. |
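The idea of creating a searcher in a background thread and keeping it in a cache can be sketched as follows. This is a generic Python illustration, not libkiwix or libzim code: `SearcherCache`, `fake_open` and the archive id are hypothetical names, and the "searcher" is just a string standing in for an expensive-to-open object.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SearcherCache:
    """Hypothetical cache: one preloaded searcher per archive id."""

    def __init__(self, open_searcher):
        self._open = open_searcher  # expensive factory (assumed)
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures = {}
        self._lock = threading.Lock()

    def preload(self, archive_id):
        """Start opening the searcher in the background; return at once."""
        with self._lock:
            if archive_id not in self._futures:
                self._futures[archive_id] = self._pool.submit(
                    self._open, archive_id
                )
            return self._futures[archive_id]

    def get(self, archive_id):
        """Return the searcher, waiting only if preloading is unfinished."""
        return self.preload(archive_id).result()

    def close(self):
        self._pool.shutdown(wait=True)


def demo():
    opened = []

    def fake_open(archive_id):
        opened.append(archive_id)  # records how often we "open"
        return f"searcher({archive_id})"

    cache = SearcherCache(fake_open)
    cache.preload("wikipedia")      # e.g. when the archive is loaded
    first = cache.get("wikipedia")  # first search reuses the preload
    second = cache.get("wikipedia")
    cache.close()
    return first, second, opened
```

The first search then pays the opening cost only if the background work has not finished yet. Since a xapian db is not thread safe, a cached searcher shared between threads would still need its own locking, which this sketch leaves out.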
@mgautierfr To conclude:
|
I will do a proper atomic lock, but no new xapian db handler.
This is already done. Multi zim search is working. What is not working is multi suggestion and multi lang. But they will not be handled in this issue.
I don't understand what you mean here.
The plan here is to implement multisuggestion the same way as multisearch, so it will not revamp the ft search.
Issue #946 is about making the cache configurable (and we agree on that). |
#418 has shown that the typical steps for a search are:
Here is when it happens:
In an attempt to speed up searches (in particular the first one), the idea would be to have the following workflow:
Here would be the related questions on my side: