
system-agent unnecessarily restarts when "gentle" conflict occurs #113

@Oats87

There is a specific race condition in which the system-agent unnecessarily restarts and/or reapplies a plan.

The following sequence triggers it:

  1. CAPR planner delivers a new plan to the system-agent
  2. system-agent takes the plan, applies it, and updates the secret with the applied-checksum
  3. CAPR plansecret controller sees the updated plan secret and proceeds to update the appliedPlan on the secret
  4. Meanwhile, the system-agent has re-enqueued and is running the probes for its second iteration. When the probes finish, it tries to update the secret, but by this time the CAPR plansecret controller has beaten it to the write, so the system-agent gets a conflict error due to a mismatched resourceVersion (RV).

The proposed fix is to retrieve the latest secret from the API server, verify that the applied checksum still matches the plan that was just applied, and, if so, apply the update to the latest secret. This is safe because the contract between the system-agent and CAPR makes each responsible only for its own keys on the secret.
