Open
Description
There is a specific race condition in which the system-agent will unnecessarily restart and/or reapply a plan.
It arises from the following sequence of events:
- CAPR `planner` delivers a new plan to the system-agent
- `system-agent` takes the plan, applies it, and updates the secret with the `applied-checksum`
- CAPR `plansecret` controller sees the updated plan secret and proceeds to update the `appliedPlan` on the secret
- system-agent has in the meantime re-enqueued and is running the probes for the second iteration; by the time it finishes and tries to update the secret, the CAPR `plansecret` controller has beaten it to the write, and the `system-agent` gets a `conflict` error due to a mismatched `resourceVersion` (RV).
The proposed fix is to retrieve the latest secret from the API server, verify that its applied checksum still matches the plan that was just applied, and if so, apply the update to that latest secret. This is a safe operation because the system-agent and CAPR have a contract that makes each responsible for its own specific keys.
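
A minimal sketch of that conflict-handling step, assuming an in-memory stand-in for the API server rather than the real Kubernetes client. Only the `applied-checksum` and `appliedPlan` key names come from this issue; `fakeStore`, `updateAgentKeys`, and the `probe-statuses` key are hypothetical names used for illustration, and the actual system-agent code may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	appliedChecksumKey = "applied-checksum" // written by the system-agent (from the issue)
	appliedPlanKey     = "appliedPlan"      // written by CAPR (from the issue)
	probeStatusesKey   = "probe-statuses"   // hypothetical agent-owned key
)

var errConflict = errors.New("conflict: resourceVersion mismatch")

// secret is a simplified stand-in for a Kubernetes Secret.
type secret struct {
	resourceVersion int
	data            map[string]string
}

func (s secret) deepCopy() secret {
	c := secret{resourceVersion: s.resourceVersion, data: map[string]string{}}
	for k, v := range s.data {
		c.data[k] = v
	}
	return c
}

// fakeStore mimics the API server's optimistic concurrency check.
type fakeStore struct{ current secret }

func (f *fakeStore) get() secret { return f.current.deepCopy() }

func (f *fakeStore) update(s secret) (secret, error) {
	if s.resourceVersion != f.current.resourceVersion {
		return secret{}, errConflict
	}
	next := s.deepCopy()
	next.resourceVersion++
	f.current = next
	return next.deepCopy(), nil
}

// updateAgentKeys writes agent-owned keys. On a conflict it refetches the
// latest secret, checks that the applied checksum still matches the plan
// that was just applied, and retries once against that latest copy.
func updateAgentKeys(store *fakeStore, cached secret, checksum string, agentData map[string]string) (secret, error) {
	attempt := cached.deepCopy()
	for k, v := range agentData {
		attempt.data[k] = v
	}
	updated, err := store.update(attempt)
	if !errors.Is(err, errConflict) {
		return updated, err
	}
	latest := store.get()
	if latest.data[appliedChecksumKey] != checksum {
		return secret{}, fmt.Errorf("plan changed since it was applied: %w", err)
	}
	// Only agent-owned keys are touched, so CAPR's appliedPlan is preserved.
	for k, v := range agentData {
		latest.data[k] = v
	}
	return store.update(latest)
}

func main() {
	store := &fakeStore{current: secret{
		resourceVersion: 1,
		data:            map[string]string{appliedChecksumKey: "abc123"},
	}}
	agentCopy := store.get() // the agent's copy, soon to be stale

	// CAPR's plansecret controller wins the race and bumps the RV.
	caprCopy := store.get()
	caprCopy.data[appliedPlanKey] = "plan-v2"
	if _, err := store.update(caprCopy); err != nil {
		panic(err)
	}

	result, err := updateAgentKeys(store, agentCopy, "abc123",
		map[string]string{probeStatusesKey: "healthy"})
	fmt.Println(result.data[appliedPlanKey], result.data[probeStatusesKey], err)
}
```

The retry is deliberately limited to a single refetch, as described above. If the checksum no longer matches, the sketch returns the conflict instead of overwriting, on the assumption that a newer plan has arrived and should be handled by the normal apply path.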