Search suggestion performance analysis #418

Closed

Description

Following issue kiwix/kiwix-android#2082, I ran some search suggestion tests on a low-end device.

I'm testing on a Raspberry Pi 3B; the zim file is wikipedia_en_all_maxi_2020-08.zim, stored on an external USB disk.
I'm using the kiwix-search tool to search the zim file (kiwix-search <zimfile> -s <query>), recompiled with some timing traces. It should be roughly equivalent to what is done on the kiwix-android side, where the thread, to avoid race conditions, creates a new reader and starts a search on it.

I also tried a smaller zim file on an SD card. I got broadly the same results (the numbers are different but the ratios are the same).

Big numbers

A "cold" search (kernel page cache cleared using echo 1 > /proc/sys/vm/drop_caches) for the query f takes 12 seconds.
However, a "warm" search (rerunning the same command) takes less than 2 seconds.

All the "lost time" is spent on I/O:
[Images: trace_cold, trace_warm]

Small numbers

To better understand the problem, we can look at the different parts. A "full" search is composed of:

  • Read the zim file (to be able to locate the xapian index in it). Cold: 7.44s | Warm: 0.12s
  • Open the xapian database (internal xapian code). Cold: 0.09s | Warm: 0.003s
  • Set up the enquire on the database. Cold: 0.02s | Warm: 0.0004s
  • Run the enquire and get a (ranged) set of results from it (internal xapian code). Cold: 3.74s | Warm: 1.5s
  • Display/use the results. Cold: 0.001s | Warm: 0.001s

Such precision is disputable, but it shows clearly where the time is spent.
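As a quick sanity check, the per-step timings above can be summed and turned into shares. This is only arithmetic on the numbers listed in this issue; the step names are shortened labels, not identifiers from the code. The sums come out at about 11.3s cold and 1.6s warm, which matches the "12 seconds" and "less than 2 seconds" figures; reading the zim file is roughly two thirds of the cold total, and running the enquire is over 90% of the warm total.

```python
# Timings in seconds, copied from the list above: (cold, warm) per step.
STEPS = {
    "read zim file": (7.44, 0.12),
    "open xapian database": (0.09, 0.003),
    "set up enquire": (0.02, 0.0004),
    "run enquire, get results": (3.74, 1.5),
    "display results": (0.001, 0.001),
}


def totals(steps):
    """Return (cold_total, warm_total) in seconds."""
    cold = sum(c for c, _ in steps.values())
    warm = sum(w for _, w in steps.values())
    return cold, warm


def shares(steps, index):
    """Return each step's fraction of the total; index 0 = cold, 1 = warm."""
    total = sum(t[index] for t in steps.values())
    return {name: t[index] / total for name, t in steps.items()}


if __name__ == "__main__":
    cold, warm = totals(STEPS)
    print(f"cold total: {cold:.2f}s, warm total: {warm:.2f}s")
    for name, share in shares(STEPS, 0).items():
        print(f"cold, {name}: {share:.1%}")
    for name, share in shares(STEPS, 1).items():
        print(f"warm, {name}: {share:.1%}")
```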

What can we do?

On the raw performance side, I don't think there is a lot we can do.
Most of the time is spent in xapian code. And even if that part is improved, it will not help much for the first search.
If the file is not quickly available, we will have to wait. No choice.
We must be prepared for long searches (ensure the UI is not blocked by a long search, display useful things to the user while the search is ongoing, ...).

In typical usage, the zim file should already be open when we start a search, so reading the zim file should be quick. A cold search is therefore closer to 5s than 12s.

We may try to improve the perceived latency by "pre-caching" things where possible, before the user runs a search.

  • Opening the zim file. Nothing to do here. We will not open every zim file behind the user's back just to be prepared. But when we start the search, locating the xapian index and such should be quick, as the data is already cached.
  • Opening the database and pre-setting up the enquire. Here we can improve things. We can assume that when a user opens a zim, they will search in it, so we can directly open the database and set up an enquire. It would allow us to gain 1s.
  • Getting the results. We cannot do much here either. This is where the real work is done, and we cannot do it before the user runs the search.
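The second point could look roughly like the sketch below: as soon as a zim file is opened, the database opening and enquire setup start in the background, and the first search only waits if that setup has not finished yet. Everything here is hypothetical and for illustration only: `SearchPrewarmer`, `on_zim_opened`, `searcher_for` and the `open_searcher` callable are not part of any kiwix or xapian API, and the sketch assumes both methods are called from a single thread (for example the UI thread).

```python
from concurrent.futures import ThreadPoolExecutor


class SearchPrewarmer:
    """Start preparing the search objects as soon as a zim file is opened."""

    def __init__(self, open_searcher):
        # open_searcher is a hypothetical callable: path -> ready-to-query object
        # (in practice: open the database and set up the enquire).
        self._open_searcher = open_searcher
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}  # path -> Future holding the prepared searcher

    def on_zim_opened(self, path):
        # Kick off the setup in the background and return immediately.
        if path not in self._pending:
            self._pending[path] = self._pool.submit(self._open_searcher, path)

    def searcher_for(self, path):
        # Blocks only if the background setup has not finished yet.
        self.on_zim_opened(path)
        return self._pending[path].result()
```

The setup runs once per file; later searches reuse the prepared object, so only the "run the enquire" step remains on the critical path.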

Get fewer results?

The time to get the results from the enquire is related to the number of results we retrieve.
However, this is not linear. Retrieving half as many results doesn't halve the time.
Running the request and retrieving no results takes 1s (warm or cold). And doing so doesn't speed up retrieving the other results afterwards.
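To illustrate why halving the number of results does not halve the time, here is a rough back-of-the-envelope model. It assumes, purely for illustration, that the cost is a fixed part (the 1s measured for a request returning no results) plus a constant cost per result; the measurements above do not establish that shape. `N` is a made-up result count, since the issue does not state how many results were retrieved.

```python
def estimated_time(n_results, fixed_cost, per_result_cost):
    """Assumed model: fixed cost of running the request plus a per-result cost."""
    return fixed_cost + n_results * per_result_cost


FIXED = 1.0        # seconds, "retrieving no results takes 1s"
WARM_TOTAL = 1.5   # seconds, warm "run the enquire" step
N = 10             # hypothetical number of results, not given in the issue

PER_RESULT = (WARM_TOTAL - FIXED) / N

full = estimated_time(N, FIXED, PER_RESULT)
half = estimated_time(N // 2, FIXED, PER_RESULT)

if __name__ == "__main__":
    print(f"full: {full:.2f}s, half the results: {half:.2f}s")
    print(f"time saved by halving: {1 - half / full:.0%}")
```

Under this assumption, halving the results only saves about 17% of the warm time, because the fixed part dominates.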

Async?

Having an async API would not really help.
It would be difficult to have intermediate steps: the results would only be usable once the whole search is finished. We can simply run the search in a thread and update the display when the search is finished.
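A minimal sketch of the "run the search in a thread" approach is shown below. The names (`BackgroundSearch`, `search_fn`, `on_results`) are hypothetical and stand in for whatever the application uses; `search_fn` is assumed to be a blocking call that creates its own reader, as described at the top of this issue. Since suggestions are typically requested while the user is still typing, the sketch also drops results that belong to an outdated query, which is an addition not discussed above.

```python
import threading


class BackgroundSearch:
    """Run blocking searches off the calling thread; deliver only the latest."""

    def __init__(self, search_fn, on_results):
        self._search_fn = search_fn    # hypothetical blocking search: query -> list
        self._on_results = on_results  # called with (query, results) when done
        self._lock = threading.Lock()
        self._latest = 0               # ticket number of the most recent query

    def submit(self, query):
        with self._lock:
            self._latest += 1
            ticket = self._latest
        thread = threading.Thread(
            target=self._run, args=(ticket, query), daemon=True
        )
        thread.start()
        return thread

    def _run(self, ticket, query):
        results = self._search_fn(query)  # slow part, runs in the worker thread
        with self._lock:
            stale = ticket != self._latest
        if not stale:
            # Called from the worker thread; a real UI would have to hand this
            # over to its UI thread before touching any widget.
            self._on_results(query, results)
```

The caller returns immediately after `submit`, so the UI stays responsive and can show a progress indicator until `on_results` fires.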


Questions?
Ideas?
Suggestions?
