Search suggestion performance analysis #418
Comments
Ping @macgills (I cannot assign the issue to you :/)
Thanks for the detailed analysis. I am wondering why the search was instantaneous in kiwix-android v2.5 (and whatever libzim accompanied it) with the 2018-10 en_all_maxi? Why was the old version so much faster? What was changed in the software to cause this?
On the Android side, I think the coroutines implementation, along with an actual UI state for "search in progress", will go a long way. The API has also been updated to be thread-safe, right, so I can avoid creating a new reader per search?
The new libzim uses a Xapian database to search for suggestions. In the old version we were simply searching for article titles starting with the query.
Not yet. However, the creation of a new reader should be quick as you already have one and so the page cache is already populated.
I'm not sure I understand this point.
This might be my lack of knowledge of the internals of kiwixlib, but speaking from the client perspective, the app only keeps one Reader in memory at a time; when we open a new one we call
@mgautierfr Would you please be able to turn this ticket into actionable tickets?
@veloman-yunkan @mgautierfr @macgills Kiwix Android 3.4.1 has brought significant improvements in terms of suggestion speed. But the first reports seem to indicate that this might still be too slow... and we still have a serious performance problem with kiwix-serve. If the first feedback from 3.4.1 is confirmed, this ticket and its follow-up actionable tickets will come to the very top of the TODO list at libzim level.
@mgautierfr @maneeshpm I'm in favour of closing this ticket and opening a new one requesting a pre-setup of the enquires for both the title and the full-text indexes. Does that work for you?
@kelson42 I agree. We need to break this into actionable tickets, and the pre-setup of the enquires can be the first of these.
@mgautierfr @maneeshpm I have opened #617 to propose pre-loading the Xapian indexes/enquires. Closing this one.
Following the issue kiwix/kiwix-android#2082, I've run some tests searching for suggestions on a low-end device.
I'm testing on a Raspberry Pi 3B; the zim file is `wikipedia_en_all_maxi_2020-08.zim`, stored on an external USB disk.
I'm using the `kiwix-search` tool to search the zim file (`kiwix-search <zimfile> -s <query>`), recompiled with some timing traces. It should be roughly equivalent to what is done on the kiwix-android side, where the thread, to avoid race conditions, creates a new reader and starts a search on it.
I also tried a smaller zim file on an SD card and got similar results (the numbers are different but the ratios are the same).
Big numbers
A "cold" search (kernel's page cache cleared using `echo 1 > /proc/sys/vm/drop_caches`) for `f` takes 12 seconds.
However, a "warm" search (rerunning the same command) takes less than 2 seconds.
All the "lost time" is spent on I/O:
![trace_cold](https://user-images.githubusercontent.com/86161/91966237-dc089b80-ed11-11ea-967b-0b9453588969.png)
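For anyone wanting to reproduce the cold/warm comparison, here is a minimal sketch of the measurement, assuming Linux and root access. The helper names (`drop_page_cache`, `timed`, `cold_and_warm`) are made up for this example; only the `drop_caches` path and the `kiwix-search <zimfile> -s <query>` invocation come from the description above.

```python
import subprocess
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"  # Linux only; writing requires root


def drop_page_cache():
    """Clear the kernel page cache, like `echo 1 > /proc/sys/vm/drop_caches`."""
    with open(DROP_CACHES, "w") as f:
        f.write("1\n")


def timed(cmd):
    """Run cmd (a list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def cold_and_warm(cmd):
    """Return (cold, warm) timings for the same command."""
    drop_page_cache()
    cold = timed(cmd)
    warm = timed(cmd)  # the second run benefits from the now-populated page cache
    return cold, warm
```

It would be called as `cold_and_warm(["kiwix-search", zimfile, "-s", query])`. A single run per state is noisy, so repeating the pair a few times is probably wise.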
Small numbers
To better understand the problem, we can look at the different parts. A "full" search is composed of:
The precision of these figures is debatable, but they give a good indication of where the time is spent.
What can we do ?
On the raw performance side, I don't think there is a lot we can do.
Most of the real time is spent in Xapian code. And even if this part is improved, it will not help much for the first search.
If the file is not quickly available, we will have to wait. No choice.
We must be prepared for long searches (ensure the UI is not blocked by a long search, display useful things to users while the search is ongoing, ...).
In typical usage, the zim file should already be open when we start a search, so reading the zim file should be quick. A cold search is therefore closer to 5s than 12s.
We may try to improve the user's perception by "pre-caching" things where possible before the user runs a search.
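One possible reading of "pre-caching", as a rough sketch: read the relevant file once in a background thread so that its pages are already in the kernel's page cache when the user starts typing. The names `warm_file` and `warm_in_background` are hypothetical, and which file (or which part of it) is worth warming is an assumption the measurements above do not settle. Reading a whole large zim file this way is unlikely to be sensible.

```python
import threading

CHUNK = 1 << 20  # read 1 MiB at a time


def warm_file(path):
    """Read the file once so the kernel keeps its pages in the page cache.

    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total


def warm_in_background(path):
    """Start the warm-up in a daemon thread so the caller is never blocked."""
    t = threading.Thread(target=warm_file, args=(path,), daemon=True)
    t.start()
    return t
```

Running a throwaway search right after the file is opened would be another way to get a similar effect, with the advantage that only the pages a search actually touches get loaded.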
Get fewer results ?
The time to get the results from the enquire is related to the number of results we retrieve.
However, the relation is not linear. Retrieving half as many results doesn't halve the time.
Running the request and retrieving no results takes 1s (warm or cold), and doing so does not speed up the retrieval of other results.
Async ?
Having an async API would not really help.
It would be difficult to expose intermediate steps: the full results are only usable once the search is finished. We can simply run the search in a thread and update the display when the search is finished.
Questions ?
Ideas ?
Suggestions ?