Fix issues with ClusterPipeline connection management #3804
Fix issues with ClusterPipeline connection management #3804praboud wants to merge 7 commits intoredis:masterfrom
Conversation
| # we figure out the slot number that command maps to, then from | ||
| # the slot determine the node. | ||
| for c in attempt: | ||
| while True: |
There was a problem hiding this comment.
Unless I've completely lost the plot, this used to be a while True loop which always breaks on the first iteration, so this does ~nothing.
|
Hi @praboud, thank you for your contribution! We will review your changes soon. |
|
@petyaslavova hey, just checking in on when you'll have a chance to review this PR. |
|
@petyaslavova this PR addresses a fairly serious correctness issue with the Pipeline implementation, which can result in eg: one request getting the data from another request. If those two requests are eg: handling different users' data, this could be a really problematic security or privacy issue. I'm currently using a forked version of redis-py with this fix patched in, but I would love to get this upstreamed so that others don't run into the same problem. Do you have a sense of when it would be possible to get this reviewed? (I know there's a merge conflict - if you're +1 on the general approach in this PR, I can do a pass to resolve the conflict.) |
|
Sorry, I just saw your comment on #3803 (comment) re: timing. Thanks for the update there, though I am a little worried that other folks using this library are unknowingly having correctness problems with the bug that this PR addresses . |
There was a problem hiding this comment.
Pull request overview
This pull request fixes critical connection management issues in ClusterPipeline that could lead to connection leaks and data corruption. The PR addresses two main problems: (1) connections being leaked when unexpected errors occur during connection establishment, and (2) dirty connections (with unread responses) being returned to the pool, causing subsequent requests to receive responses from previous requests.
Changes:
- Wrapped the entire command execution flow in a try/finally block to ensure connections are always released
- Added connection dirty state tracking (nodes_written/nodes_read counters) to detect connections that have been written to but not read from
- Added logic to disconnect dirty connections before returning them to the pool to prevent response mixing
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| redis/cluster.py | Fixed connection management by adding a comprehensive try/finally block with dirty connection detection and cleanup logic; added type hints to NodeCommands constructor |
| tests/test_cluster.py | Added two test cases to verify connection leak prevention and dirty connection handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
❌ Security scan failedSecurity scan failed: Branch pipeline-conns does not exist in the remote repository 💡 Need to bypass this check? Comment |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
❌ Security scan failedSecurity scan failed: Branch pipeline-conns does not exist in the remote repository 💡 Need to bypass this check? Comment |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
❌ Security scan failedSecurity scan failed: Branch pipeline-conns does not exist in the remote repository 💡 Need to bypass this check? Comment |
| self._nodes_manager.initialize() | ||
| if is_default_node: | ||
| self._pipe.replace_default_node() | ||
| nodes = {} |
There was a problem hiding this comment.
Shouldn't this be done in the finally block?
| connection = get_connection(redis_node) | ||
| except (ConnectionError, TimeoutError): | ||
| for n in nodes.values(): | ||
| n.connection_pool.release(n.connection) |
There was a problem hiding this comment.
I think you can remove the releasing of the connections from here - they will be released in the finally block.
Pull Request check-list
Please make sure to review and check all of these items:
NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.
Description of change
ClusterPipeline doesn't correctly handle returning connections to the connection pools.
This diff addresses these issues by wrapping the entire area in a try/catch with a finally that ensures the connections are released, but which closes the connections if we have read but not written to the connection.