(fix): Separated retries for read and write operations #3559
Conversation
This approach doesn't seem to work:

- There is no reconnect mechanism inside the `parse_response` method. If Redis goes down between `send_command` and the parse retry, it will get stuck retrying `parse_response` indefinitely.
- Even if you add a guard to `AbstractConnection.read_response`, similar to the one in `can_read` (Lines 592 to 594 in 9251650):

  ```python
  sock = self._sock
  if not sock:
      self.connect()
  ```

  it will still send some commands, such as those in the `on_connect` method, which generate new responses. As a result, the response to the original command (e.g., `XADD`) may no longer be available or may not be the correct one.

Unfortunately, fixing this seems to be more complex than it appears.
@ManelCoutinhoSensei I think we're talking about different issues here; the one you refer to is #3555. This one fixes potential duplicate writes if the read fails, and it does fix that. |
No, I believe I'm referring to the same issue you are (@vladvildanov). Let me clarify my first point and let me know if I'm missing something: I understand that your fix technically avoids retrying

A potential (though admittedly suboptimal) alternative would be to remove the retry mechanism for reads entirely. Instead, you could return a custom error indicating that the

(My second point about adding |
@ManelCoutinhoSensei Thanks for this thorough explanation, now I'm much closer to the core of the issue! However, wouldn't it make sense to reconnect inside of https://github.com/redis/redis-py/blob/master/redis/client.py#L902 |
@vladvildanov, That might actually be a better approach than the one I was describing. However, I believe it still encounters the issue I mentioned in my second point: Lines 498 to 509 in 9251650
As a result, by the time you attempt to read the response to the original command, it will no longer be available. |
@ManelCoutinhoSensei Well, you're right: the existing logic that disconnects on retry makes it impossible to separate retries for write and read. I did some investigation into other clients and they also stick to this "transactional" logic when doing retries, so to be consistent we assume that the command is called at least once, but you may have to live with duplicates. Otherwise you need to handle retries in the application. https://redis.github.io/lettuce/advanced-usage/#failures-and-at-least-once-execution |
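For illustration (added here, not part of the comment above), a minimal sketch of what "handling it in the application" could look like under at-least-once semantics; the stream name, field names, and scan window are assumptions:

```python
import uuid

import redis

r = redis.Redis()

# Tag each XADD with a unique request id so a retried write can be detected.
request_id = str(uuid.uuid4())
try:
    r.xadd("events", {"request_id": request_id, "payload": "..."})
except (redis.ConnectionError, redis.TimeoutError):
    # The command may or may not have reached the server. Check recent
    # entries before resending to avoid a duplicate.
    recent = r.xrevrange("events", count=100)
    already_written = any(
        fields.get(b"request_id") == request_id.encode() for _, fields in recent
    )
    if not already_written:
        r.xadd("events", {"request_id": request_id, "payload": "..."})
```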
Hey @vladvildanov, before closing this PR and the issue completely, I wanted to check if an alternative approach could address the duplicate problem while still maintaining the at-least-once assumption used by other libraries. Wouldn't something like this work?

```python
try:
    conn.retry.call_with_retry(
        lambda: conn.send_command(*args, **options),
        lambda error: self._disconnect_raise(conn, error),
    )
    return self.parse_response(conn, command_name, **options)
except ParseResponseError as e:  # To be replaced with the correct error
    raise CustomErrorSayingThatReadResponseFailed
finally:
    ...
```

This keeps the changes minimal, avoids the duplication caused by read issues, and preserves the original assumption (considering retries are already in place; without them, the guarantee of at least once doesn't hold anyway). And, instead of requiring users to track whether the command was sent once or twice, they would only need to check whether it was successful in case of an error. I know other clients aren't doing this, but I believe we can do better 😉. This seems like a reasonable compromise between the ideal solution and taking no action. What do you think? |
@ManelCoutinhoSensei Currently our users have an expectation that a retry happens on timeout errors; your approach doesn't take that into account. If you disable the retries you would achieve exactly what you mentioned: "only need to check whether it was successful in case of an error". |
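For context (added for illustration, not part of the thread), this is roughly how users opt into that timeout-retry behaviour with the current redis-py API; the host, backoff, and retry count are placeholder values:

```python
import redis
from redis.backoff import ConstantBackoff
from redis.retry import Retry

# Retry timed-out commands a couple of times before giving up; this is the
# existing expectation the comment above refers to.
r = redis.Redis(
    host="localhost",
    port=6379,
    retry=Retry(ConstantBackoff(0.1), 2),
    retry_on_error=[redis.TimeoutError],
)
r.ping()
```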
@ManelCoutinhoSensei Let's put it this way: we can support an additional retry strategy, something like "at most once", but the existing one should be kept. |
@vladvildanov, I have bad news and good news.

The bad news is that, as it stands, the retry mechanism does not work properly with non-idempotent operations. I believe this should be made clearer in the documentation. In the worst-case scenario, if the Redis connection is unstable, the same command could be sent up to

The good news is that since every

Here's a rough draft of how it could work that would help you jump-start this small implementation:

```python
def _execute_command_with_strategy(self, conn, command_name, *args, **options):
    if self.at_least_once:  # Or retrieve it from the retry object (e.g., conn.retry.at_least_once)
        # This behaves as before, ensuring "at least once" execution.
        return self._send_command_parse_response(conn, command_name, *args, **options)

    # "At most once" case:
    # - Send the command with retries
    # - If an exception occurs during response parsing, leave it to the user to handle
    conn.send_command(*args, **options)
    try:
        return self.parse_response(conn, command_name, **options)
    except Exception as e:
        raise InternalException  # Decide the exception to raise here: internal or something already defined for the final user


def _execute_command(self, *args, **options):
    """Execute a command and return a parsed response"""
    pool = self.connection_pool
    command_name = args[0]
    conn = self.connection or pool.get_connection(command_name, **options)
    try:
        return conn.retry.call_with_retry(
            lambda: self._execute_command_with_strategy(conn, command_name, *args, **options),
            lambda error: self._disconnect_raise(conn, error),
        )
```

This way, we don't have to discard the retry mechanism entirely. Instead, we can provide a configurable approach that better supports non-idempotent operations. What do you think? 🚀 |
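One possible way to surface that configuration, purely as a hypothetical sketch following the inline note about reading the flag from the retry object (no such attribute exists in redis-py today; the name is illustrative):

```python
from redis.backoff import ExponentialBackoff
from redis.retry import Retry


# Hypothetical: carry the execution strategy on the Retry object so the client
# could check conn.retry.at_least_once before deciding how to execute.
class AtMostOnceRetry(Retry):
    at_least_once = False


retry_policy = AtMostOnceRetry(ExponentialBackoff(), 3)
```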
@ManelCoutinhoSensei Looks good to me, I'll take it forward as a separate PR and will add you as a reviewer. |
@ManelCoutinhoSensei My proposal is the following: to me, the behaviour where we always disconnect on retry is odd; for timeout errors it doesn't make sense to do it every time, only after the retries are exceeded. It makes sense to disconnect on

So let's change the |
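My reading of that proposal, sketched for illustration (this is not the PR's code, and it assumes the attribute names used by the client's `_disconnect_raise` helper): disconnect only for connection errors, and keep the socket alive on timeouts so `call_with_retry` decides when to give up.

```python
from redis.exceptions import ConnectionError


def _disconnect_raise(self, conn, error):
    # Sketch: only drop the socket when the connection itself is broken.
    if isinstance(error, ConnectionError):
        conn.disconnect()
    # Re-raise unless this error type was configured as retryable, in which
    # case call_with_retry will retry until the attempts are exhausted.
    if (
        conn.retry_on_error is None
        or isinstance(error, tuple(conn.retry_on_error)) is False
    ):
        raise error
```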
@petyaslavova Would appreciate your thoughts on this ⬆️⬆️ |
It sounds reasonable. When we have a broken connection, retrying the response reading won't bring any value. |
@vladvildanov Let me try to rephrase your proposal to make sure I understood it correctly: You're suggesting the following:
If that's what you're proposing, I think there are two things worth considering:
|
@ManelCoutinhoSensei Yeah, you've got it right! Let me elaborate a bit on your worries:
|
@vladvildanov I assume that you meant Lines 604 to 625 in 9251650
|
@ManelCoutinhoSensei Well, I don't get it, but then let's restrict it for |
@vladvildanov I think |
@ManelCoutinhoSensei It's still a question how we want to handle failures on read in the case of "at-most-once" strategies and |
@vladvildanov Sounds good!
True. That's why, in my suggestion, I proposed a new custom error for this case.
Good catch!! Looking at it, it shouldn't be too hard to implement something like the one that I proposed for the regular Redis client, though.
Awesome!! Let me know if you need anything or any help! 🚀 |
Pull Request check-list
Please make sure to review and check all of these items:
Description of change
Separate retries for read and write operations to prevent duplicated writes on read failure.
Closes #3554
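As an illustration of the change described above (a hedged sketch only, not necessarily the exact code in this PR's diff), separating the retried write from the read inside the client's command execution path might look like this:

```python
# Sketch: retry only the write; a failure while reading the reply is not
# retried, so a read error can no longer cause the command to be sent twice.
def _send_command_parse_response(self, conn, command_name, *args, **options):
    conn.retry.call_with_retry(
        lambda: conn.send_command(*args, **options),
        lambda error: self._disconnect_raise(conn, error),
    )
    return self.parse_response(conn, command_name, **options)
```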