Rename indexes #79
-
Further notes: Algolia support for renames
Algolia supports atomic index moves: https://www.algolia.com/doc/api-reference/api-methods/move-index/. However, this comes with a lot of restrictions. They also state that the move is an expensive operation, since it is a clone followed by a rename.
Atomicity seems rather complicated to ensure in our case. It would require a hard sync of all replicas in the cluster, blocking any further progress of the cluster, and a two-phase commit to make sure that every replica performs the rename at the same time and succeeds. We opt for an eventually consistent rename.
-
Is there any progress on this? Thanks in advance.
-
Hey everyone 👋
📚 More information here.
-
Hey everyone 👋 I'm locking this discussion. Please open a new discussion if you have feedback about index swapping, or if you just need to rename an index (which is a different need) 😇 Thank you! 🙇‍♂️
-
As per meilisearch/meilisearch#1080 and the public roadmap, renaming indexes is a feature we want to support. This can be done thanks to the index update route without any changes to the API.
There are a certain number of constraints that must be resolved in order to perform the rename.
The renames must be serializable with the updates to the same indexes. Say I have an index test, and the following requests arrive in order:
1) POST indexes/test/documents
2) POST indexes/test/updates (rename to test2)
3) POST indexes/test2/documents
Requirements:
i) If the first update is still enqueued at the moment the index is renamed, it should not cause any problem.
ii) Any update that occurs after the rename should be linked to the correct update queue.
iii) Renames need to be asynchronous to support a distributed system.
i) and ii) can be addressed by resolving the index names to their Uuid before pushing the updates to the queue, and by having the rename be performed synchronously. As long as 2) occurs before 3), the name will always resolve to the correct uuid, so 1) and 3), despite being referred to by different names, will actually refer to the same index uuid.
iii) is more complicated, because 3) may be pushed to an update queue before 2) is processed. This means that test2 does not yet point to a uuid, but the update still needs to be pushed to an update queue somehow. There is no way to know whether this update refers to a newly created index or to an already existing one, so a new index will be created (cf. lazy index creation). This is not the behavior we intend.
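To illustrate how resolving names at enqueue time covers i) and ii), here is a minimal sketch. It is not Meilisearch code: the `IndexResolver` class, its methods, and the string payloads are hypothetical names made up for the example, and the rename is assumed to be synchronous on a single node.

```python
import uuid
from collections import defaultdict, deque


class IndexResolver:
    """Hypothetical name -> uuid map with one update queue per index uuid."""

    def __init__(self):
        self.names = {}                    # index name -> index uuid
        self.queues = defaultdict(deque)   # index uuid -> pending updates

    def create(self, name):
        self.names[name] = uuid.uuid4()
        return self.names[name]

    def enqueue(self, name, update):
        # Resolve the name now, so the queued update is tied to the uuid only.
        index_uuid = self.names[name]
        self.queues[index_uuid].append(update)
        return index_uuid

    def rename(self, old, new):
        # Synchronous rename: only the name mapping changes, queues are untouched.
        if new in self.names:
            raise ValueError(f"index {new!r} already exists")
        self.names[new] = self.names.pop(old)


resolver = IndexResolver()
test_uuid = resolver.create("test")
first = resolver.enqueue("test", "add documents (1)")    # 1) still enqueued during the rename
resolver.rename("test", "test2")                          # 2)
third = resolver.enqueue("test2", "add documents (3)")   # 3)
```

Both updates end up in the queue of the same uuid, even though they were submitted under different names. The sketch only works because 2) is applied before 3) is enqueued, which is exactly what iii) rules out.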
There is no straightforward solution for iii):
Intermediate update queue.
A way to address 3) would be to defer the resolution of the index name to its uuid thanks to an intermediate, global update queue.
All updates, no matter which index they concern, are pushed onto a global update queue. When they are popped, their uuid is resolved and they are then processed in one of two ways:
EDIT: the sync updates are in fact not so sync, since they are to be replicated. They must be processed before any other update on the replica nodes AND immediately taken into account on the master node so new updates are pushed in the correct queue, and performed atomically:
Since synchronous updates are always processed before the next update is popped from the queue, an asynchronous update will always be resolved to the correct update uuid.
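The global queue idea can be sketched roughly as follows. This is a single-node illustration under assumptions of my own: the `push` and `pop_and_dispatch` helpers, the `"rename"` and `"documents"` kinds, and the tuple layout are all hypothetical, and replication (the concern raised in the EDIT above) is left out entirely.

```python
import uuid
from collections import deque

names = {"test": uuid.uuid4()}            # index name -> index uuid
index_queues = {names["test"]: deque()}   # index uuid -> per-index update queue
global_queue = deque()                    # every update, still addressed by name


def push(name, kind, payload=None):
    # Nothing is resolved here: the update keeps the name it was submitted with.
    global_queue.append((name, kind, payload))


def pop_and_dispatch():
    name, kind, payload = global_queue.popleft()
    if kind == "rename":
        # Synchronous update: applied before the next update is popped.
        names[payload] = names.pop(name)
    else:
        # Asynchronous update: resolved only now, then handed to the index queue.
        index_queues[names[name]].append((kind, payload))


push("test", "documents", ["doc1"])    # 1)
push("test", "rename", "test2")        # 2)
push("test2", "documents", ["doc2"])   # 3) pushed before 2) has been processed
while global_queue:
    pop_and_dispatch()
```

Because 3) is resolved only when it is popped, after the rename has been applied, it lands in the queue of the original index rather than triggering the creation of a new one.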
There are still issues:
This is an issue because the failure of the rename will cause the next update to point to a nonexistent index, and trigger index creation. I don't see any way to prevent this, other than disallowing lazy index creation.
How do we prevent 3) from happening before 2)?
Once again, there is no easy solution, if any. If the order is inverted, a new index test2 will be created, and then test will be renamed to test2, effectively leaking the new index and overwriting it with all the data from test. Not good. Once again, lazy index creation gets in the way.
How do we prevent one index from being aliased with a name that already exists?
Related to the previous issue, we have to be very careful about this:
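As a rough illustration of the inverted order described above, the following sketch shows how lazy index creation combined with an unchecked rename leaks an index. The `add_documents` and `rename_unchecked` helpers and the in-memory `store` are hypothetical and only stand in for the real update pipeline.

```python
import uuid

names = {}   # index name -> index uuid
store = {}   # index uuid -> documents


def add_documents(name, docs):
    # Lazy index creation: an unknown name silently creates a new index.
    if name not in names:
        names[name] = uuid.uuid4()
        store[names[name]] = []
    store[names[name]].extend(docs)


def rename_unchecked(old, new):
    # Naive rename that does not check whether the target name is taken.
    names[new] = names.pop(old)


add_documents("test", ["a"])        # 1)
add_documents("test2", ["b"])       # 3) processed before 2): test2 is lazily created
leaked_uuid = names["test2"]
rename_unchecked("test", "test2")   # 2) the name test2 now points to the old test index
```

After these three steps the lazily created index still exists in `store`, but no name resolves to it anymore, and reads on test2 return the documents that were sent to test.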
What of per-index update ids?
This seems to rule out this possibility altogether, since updates are enqueued before the index they refer to is actually resolved.
We see here that renaming indexes is not at all a straightforward task, and we may need to make tradeoffs to make it possible.
Not everything has been thought through yet, and this architecture is somewhat similar to what @Kerollmops had suggested in a meeting some time ago.
EDIT 2: This issue only talks about rename, but the same goes for index deletion and swap.