Description
I had to re-index my Episode table after adding a new attribute and running into various other issues. The table contains about 26k records, each of reasonable size. Keep in mind that all of this data had already been indexed before; I am attempting to re-index it.
Attempt 1 - `Episode.reindex!`
My first attempt was to simply follow the code in the README and call `Episode.reindex!`. After some time this raised the following error and left a lot of records missing:
```
413 Payload Too Large - The provided payload reached the size limit. The maximum accepted payload size is 20 MiB.. See https://docs.meilisearch.com/errors#payload_too_large. (MeiliSearch::ApiError)
```
I tried running this multiple times but always ended up with this error. I don't understand why data size would be an issue here, since all my records were previously indexed without hitting it. This is especially puzzling given the next two attempts.
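For reference, here is a rough way to estimate whether a batch could exceed the 20 MiB request limit. This is only a sketch: it assumes the indexed payload is close to a plain `as_json` serialization, which may differ from the attributes meilisearch-rails actually sends.

```ruby
# Rough estimate only: serialize a sample of records and compare against
# Meilisearch's 20 MiB per-request limit. meilisearch-rails sends the
# attributes configured in the `meilisearch` block, so real payloads may
# be smaller or larger than this.
sizes = Episode.limit(100).map { |e| e.as_json.to_json.bytesize }
avg = sizes.sum / sizes.size
puts "~#{avg} bytes/document; a batch of 1000 is ~#{avg * 1000 / (1024 * 1024)} MiB"
puts "staying under 20 MiB needs a batch size below ~#{(20 * 1024 * 1024) / avg}"
```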
Attempt 2 - `Episode.reindex!` with a smaller batch size
For this I ran `Episode.reindex!(100)`, decreasing the batch size to 10% of the default 1000. This seems to work but takes forever and eventually times out; the timeout could be down to my SSH connection dropping at that point. However, in this case I don't get the `payload_too_large` error, which is strange since the same data is being sent.
This suggests to me that the issue might be too large a batch size. That seems like a bug.
Attempt 3 - Custom batch size and background job
The way I was finally able to make it work (mostly; I will open a separate issue for the remaining problem) is by batching records myself and moving the indexing into a background job:
```ruby
class ReindexMeilisearchJob < ApplicationJob
  queue_as :meilisearch_index

  def perform(model_name, start_id, end_id)
    model = model_name.constantize
    # Re-index only the records in this ID range.
    records = model.where(id: start_id..end_id)
    records.reindex!
  end
end
```

I then call this job like so:
```ruby
Episode.in_batches(of: 1000).each do |batch|
  ReindexMeilisearchJob.perform_later("Episode", batch.first.id, batch.last.id)
end
```

Notice that this still creates batches of 1000 records, yet it does not throw a `payload_too_large` error. It also executes a lot faster, since I am running 5 jobs in parallel.
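As a quick sanity check after the jobs finish, something like the following can compare counts. This is a sketch under two assumptions: that `Episode.index` exposes the underlying meilisearch-ruby index (as I believe meilisearch-rails provides) and that its `stats` response includes `numberOfDocuments`.

```ruby
# Sanity check (sketch): compare the index's document count with the table.
# Assumes Episode.index returns the underlying meilisearch-ruby index object.
stats = Episode.index.stats
puts "Indexed: #{stats['numberOfDocuments']} / in DB: #{Episode.count}"
```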
I guess my questions are:
- Why am I getting a `payload_too_large` error?
- What is the proper way to re-index a model, small or big?
- Do I need to re-index the whole table when I add a new attribute, or is there a more efficient way? (See the sketch after this list for the kind of thing I mean.)
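For example, something along these lines might avoid re-sending full records. This is an assumption on my part, not a documented meilisearch-rails workflow: it uses the meilisearch-ruby client's `update_documents` (a partial document update) through `Episode.index`, and `new_attribute` is a placeholder for the added field.

```ruby
# Sketch (unverified): push only the new attribute as partial document
# updates, instead of re-indexing full records.
Episode.in_batches(of: 1000) do |batch|
  docs = batch.pluck(:id, :new_attribute).map do |id, value|
    { id: id, new_attribute: value }
  end
  Episode.index.update_documents(docs)
end
```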
Environment (please complete the following information):
- OS: [e.g. Debian GNU/Linux]
- Meilisearch server version: Cloud v1.11.3 (v1.12 in development due to some bug on the cloud dashboard)
- meilisearch-rails version: v0.14.1
- Rails version: v8.0