Duplicate data was generated during expansion. #2723
Comments
@VIVALXH Great! Thanks for your detailed information, it's helpful for identifying this issue.
Hi @VIVALXH, go-redis might retry on network errors. Would you mind disabling the retry to see if the issue still exists? https://github.com/redis/go-redis/blob/1139bc3aa9073851f67faa6d68df07a566901dd7/options.go#L75C2-L75C12
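For reference, a minimal sketch of a client with retries disabled (assuming go-redis v9; the address and key are placeholders, not from this issue):

```go
package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

func main() {
	// MaxRetries: -1 disables command retries in go-redis
	// (0 falls back to the default of 3 retries).
	rdb := redis.NewClient(&redis.Options{
		Addr:       "127.0.0.1:6379", // placeholder address
		MaxRetries: -1,
	})
	defer rdb.Close()

	// With retries disabled, a transient network error surfaces here
	// instead of triggering a silent re-send of the command.
	if err := rdb.RPush(context.Background(), "mylist", "value").Err(); err != nil {
		panic(err)
	}
}
```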
@git-hulk Thanks. No duplicate data was generated, but this resulted in the loss of a significant amount of data that should have been written. I controlled the osscluster retry count by setting redis.UniversalOptions.MaxRedirects = -1. However, this inevitably led to MOVED and TRYAGAIN errors, as mentioned in this PR (#1240). The MOVED error occurs between the following two commands:
Additionally, if the interval between these two commands is longer (e.g., 10 seconds), the probability of generating duplicate data increases. The duplicate data primarily appears on the new nodes after scaling out. Following the process I described, this issue can be consistently reproduced.
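For context, a rough sketch of the setup described above (my reconstruction, assuming go-redis v9; the address, key, and values are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	// MaxRedirects = -1 as described above: MOVED/TRYAGAIN errors surface to
	// the caller instead of being retried inside the cluster client.
	rdb := redis.NewUniversalClient(&redis.UniversalOptions{
		Addrs:        []string{"10.0.0.1:6666"}, // placeholder shard address
		MaxRedirects: -1,
	})
	defer rdb.Close()

	ctx := context.Background()

	// Two writes separated by an interval; per the comment above, a longer
	// gap (e.g. 10 seconds) makes duplicates more likely while slots migrate.
	if err := rdb.RPush(ctx, "mylist", "first").Err(); err != nil {
		fmt.Println("first push:", err) // may be a MOVED or TRYAGAIN error
	}
	time.Sleep(10 * time.Second)
	if err := rdb.RPush(ctx, "mylist", "second").Err(); err != nil {
		fmt.Println("second push:", err)
	}
}
```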
@VIVALXH You should set MaxRetries instead.
I am using ClusterClient, and internally it uses MaxRedirects instead of MaxRetries in its process function. Setting MaxRedirects = -1 worked as expected and achieved our goal of preventing duplicate data. This suggests that retries might have caused the data duplication, but in practice we cannot set the retries to zero, because that loses a significant amount of data.
@VIVALXH The cluster client also supports MaxRetries; with MaxRedirects = -1 it won't retry when a TRYAGAIN error occurs.
@git-hulk
@git-hulk So I guess the data duplication is caused by retries on the MOVED/TRYAGAIN errors during scaling?
I'm not quite sure about this. But thinking it over, it's impossible to push an element into a list exactly once without a deduplication mechanism, because we always need to retry once there is any network issue or timeout, otherwise we might lose data.
Yes, it is impossible to disable the retry because that loses a significant amount of data. At present, duplicate data only appears when scaling. Is there any other solution that can avoid this?
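As an illustration of the deduplication mechanism mentioned above (not from this issue; the key names, message ID, and Lua script are hypothetical, and it assumes the server supports Lua scripting and hash tags):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// pushOnce pushes a value to a list only if its ID has not been seen before,
// so a client-side retry of the same (id, value) pair cannot create a
// duplicate element. KEYS[1] = dedup set, KEYS[2] = list,
// ARGV[1] = id, ARGV[2] = value.
var pushOnce = redis.NewScript(`
if redis.call("SADD", KEYS[1], ARGV[1]) == 1 then
  return redis.call("RPUSH", KEYS[2], ARGV[2])
end
return 0
`)

func main() {
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"127.0.0.1:6379"}, // placeholder address
	})
	defer rdb.Close()

	// Hash tags keep the dedup set and the list in the same slot so the
	// script can access both keys.
	res, err := pushOnce.Run(context.Background(), rdb,
		[]string{"{mylist}:ids", "{mylist}"}, "msg-42", "payload").Result()
	fmt.Println(res, err)
}
```

With this pattern a retried push with the same ID becomes a no-op, at the cost of an extra set that has to be trimmed or expired separately.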
Search before asking
Version
2.10
Minimal reproduce step
Env: Kvrocks cluster with 4 shards.
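A rough sketch of the write side, reconstructed from the comments above (addresses, key, and values are placeholders; the cluster is scaled out while the loop runs):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{"10.0.0.1:6666"}, // placeholder: one node of the 4-shard cluster
	})
	defer rdb.Close()

	ctx := context.Background()

	// Keep pushing unique elements; while this runs, scale the cluster out
	// (add a shard and migrate slots), then compare the list contents with
	// what was written to spot duplicates on the new nodes.
	for i := 0; i < 100000; i++ {
		if err := rdb.RPush(ctx, "mylist", fmt.Sprintf("item-%d", i)).Err(); err != nil {
			fmt.Println("push error:", err)
		}
	}
}
```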
What did you expect to see?
No duplicate data is generated.
What did you see instead?
Duplicate data was generated on the new nodes after scaling out.
Anything Else?
No response
Are you willing to submit a PR?