detector: handle AlreadyExists errors as transient during concurrent reconciliations#7170
Conversation
Replace retry.RetryOnConflict with retry.OnError and introduce a named retry predicate handling Conflict + AlreadyExists. This ensures transient AlreadyExists errors during concurrent reconciliations are retried instead of failing. Signed-off-by: arnavgogia20 <arnavgogia404@gmail.com>
Add pkg/detector/detector_retry_test.go with InterceptingClient to simulate race conditions and verify correct retry behavior for ApplyPolicy and ApplyClusterPolicy. Signed-off-by: arnavgogia20 <arnavgogia404@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @arnavgogia20, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a bug where concurrent operations in the detector controller could lead to reconciliation failures due to Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly addresses a race condition by updating the retry logic to handle AlreadyExists errors in addition to Conflict errors during the creation of ResourceBinding and ClusterResourceBinding. The change from retry.RetryOnConflict to retry.OnError with a custom predicate is a clean solution. The addition of unit tests using an intercepting client to simulate the race condition is excellent and ensures the fix is robust. I have a couple of minor suggestions to improve comment accuracy and adhere to the project's style guide.
Signed-off-by: arnavgogia20 <arnavgogia404@gmail.com>
Signed-off-by: arnavgogia20 <arnavgogia404@gmail.com>
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7170 +/- ##
==========================================
+ Coverage 46.56% 46.57% +0.01%
==========================================
Files 700 700
Lines 48139 48141 +2
==========================================
+ Hits 22414 22421 +7
+ Misses 24040 24036 -4
+ Partials 1685 1684 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Hii @RainbowMango @Garrybest @whitewindmills , can u kindly review the PR......thankyouu |
|
Sounds weird; a workqueue guarantees that the same object won't be dequeued again until it's been processed. Could you explain how concurrent reconciliation of the same object occurs? |
Hii @whitewindmills |
| bindingCopy := binding.DeepCopy() | ||
| err = retry.RetryOnConflict(retry.DefaultRetry, func() (err error) { | ||
| // RetryOnConflict handles AlreadyExists as well because during concurrent reconciliations, | ||
| // another controller might have created the binding between our Get and Create calls. |
There was a problem hiding this comment.
@arnavgogia20 thanks for your explanation. This will never happen; a binding will only be created by one controller.
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes race conditions in the detector controller where concurrent reconciliations can trigger
AlreadyExistserrors duringResourceBindingandClusterResourceBindingcreation.Previously, only conflict errors were retried. However, under concurrent execution,
AlreadyExistsis also a transient and expected outcome that should not be treated as a terminal failure. This PR updates the retry logic to explicitly retry on bothConflictandAlreadyExistserrors, embracing eventual consistency and preventing unnecessary reconcile failures that violate SLOs.Which issue(s) this PR fixes:
Fixes #7120
Special notes for your reviewer:
retry.RetryOnConflictwithretry.OnErrorusing a named predicate to retry on bothConflictandAlreadyExistserrors.ApplyPolicyandApplyClusterPolicypaths.AlreadyExistsrace conditions.go test ./pkg/detector/...Does this PR introduce a user-facing change?:
NONE