retry on conflict when syncing work for binding controllers#7121
retry on conflict when syncing work for binding controllers#7121AnupamSingh2004 wants to merge 1 commit intokarmada-io:masterfrom
Conversation
Signed-off-by: AnupamSingh2004 <sanupam2004@gmail.com>
|
Welcome @AnupamSingh2004! It looks like this is your first PR to karmada-io/karmada 🎉 |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @AnupamSingh2004, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refines how Kubernetes conflict errors are handled within the binding controllers. By integrating a retry mechanism for optimistic concurrency conflicts, the system will now correctly process these transient errors, leading to more accurate Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly addresses the issue of Kubernetes conflict errors being incorrectly reported as failures in the binding_sync_work_duration_seconds metric. By wrapping the ensureWork call with retry.RetryOnConflict in both binding_controller.go and cluster_resource_binding_controller.go, the change ensures that conflicts are retried and only the final outcome is recorded, leading to more accurate availability metrics. The implementation is clean, follows a standard pattern for handling optimistic concurrency conflicts in Kubernetes controllers, and is consistent with similar changes made elsewhere in the project. The changes look good and I have no further comments.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7121 +/- ##
=======================================
Coverage 46.54% 46.55%
=======================================
Files 700 700
Lines 48128 48132 +4
=======================================
+ Hits 22403 22409 +6
+ Misses 24040 24039 -1
+ Partials 1685 1684 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/assign |
|
Generally looks good to me. /lgtm /cc @RainbowMango |
|
@jabellard: GitHub didn't allow me to request PR reviews from the following users: for, another, look. Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
RainbowMango
left a comment
There was a problem hiding this comment.
/hold
wait for confirmation from #7111 (comment)
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR mitigates the issue where the
binding_sync_work_duration_secondsmetric records Kubernetes conflict errors (HTTP 409) as failures, causing false SLO violations and misleading availability metrics.Problem:
result="error"in the metricSolution:
By wrapping
ensureWork()withretry.RetryOnConflict(), conflict errors are automatically retried and only the final outcome is recorded in the metric:result="success"result="error"This is the same approach used in PR #7106 for the
execution-controller(work_sync_workload_duration_secondsmetric).Files changed:
pkg/controllers/binding/binding_controller.goensureWorkwithretry.RetryOnConflictpkg/controllers/binding/cluster_resource_binding_controller.goensureWorkwithretry.RetryOnConflictWhich issue(s) this PR fixes:
Part of #7111
Special notes for your reviewer:
This is a follow-up to #7106 which implemented the same pattern for the
execution-controller. As requested by @jabellard in the issue discussion, the same fix is needed for the binding controllers.The pattern follows exactly what was suggested:
cc @jabellard @RainbowMango
Does this PR introduce a user-facing change?: