Expunge and Decommission disks in planner #7286
Conversation
This PR is intended to fix #6999 and #7098. The executor steps are all independent already, except those that will be fixed in this PR.

A disk disposition was added to the blueprint; it changes from `InService` to `Expunged` when the `PlanningInput` changes. A disk state was also added that goes from `Active` -> `Decommissioned`. This disk state changes when the parent sled is expunged, or when the sled-agent has been guaranteed to have seen the disk expungement as reported via inventory.

This code fully implements the planner bits. The executor bits are going to be pulled in from the [fewer-fatal-errors-in-executor branch](main...fewer-fatal-errors-in-executor) and expanded to implement the expunge and decommission as dictated by the blueprint. Because the planner changes are tightly coupled with the executor, they will all come in this PR. Further code will also be added to represent the disk disposition and state in the blueprint and diff tables for omdb output.

This PR is primarily blocked on [revamping the diff system](#7240). This is particularly necessary because we need to change those tables and want to do it rigorously after we have a visitor-based mechanism for doing so. There is some more cleanup to do here as well, in particular around the `physical_disk_filter` and things that may not be needed once the executor bits get fully implemented.
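A minimal sketch of how the pieces described above might fit together, using hypothetical, simplified types (the real blueprint types in omicron differ; only the `InService`/`Expunged` and `Active`/`Decommissioned` names come from the description):

```rust
// Hypothetical, simplified stand-ins for illustration only.
#[derive(Clone, Copy, PartialEq, Eq)]
enum DiskDisposition {
    InService,
    Expunged,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum DiskState {
    Active,
    Decommissioned,
}

struct BlueprintDisk {
    disposition: DiskDisposition,
    state: DiskState,
}

impl BlueprintDisk {
    /// The planner rule sketched above: a disk only moves from `Active` to
    /// `Decommissioned` once its expungement is known to be durable, i.e. the
    /// parent sled is expunged or inventory shows the sled-agent has seen the
    /// disk expungement.
    fn maybe_decommission(
        &mut self,
        sled_expunged: bool,
        expungement_seen_in_inventory: bool,
    ) {
        if self.disposition == DiskDisposition::Expunged
            && self.state == DiskState::Active
            && (sled_expunged || expungement_seen_in_inventory)
        {
            self.state = DiskState::Decommissioned;
        }
    }
}
```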
This is implemented for both the `omdb show` and `omdb diff` commands. Also properly decommission disks in tests when a sled is decommissioned. I'm still not sure why the comment line changed.
This is follow-up from #7308 and #7500 (comment):

* Remove `From<BlueprintZonesConfig> for OmicronZonesConfig`. This included expunged zones, but thankfully had no callers.
* Change `BlueprintZonesConfig::to_omicron_zones_config(filter)` to `BlueprintZonesConfig::into_running_omicron_zones_config()` (no `filter` argument). All the callers of this method were passing `BlueprintZoneFilter::ShouldBeRunning`, and I don't think there's a reason to use any other filter?
* Remove `From<BlueprintPhysicalDisksConfig> for OmicronPhysicalDisksConfig` (which included expunged disks), and replace it with `BlueprintPhysicalDisksConfig::into_in_service_disks()`. This one _did_ have callers, including the blueprint executor, but I think we've avoided problems because the planner currently [drops disks if the incoming planning input says they're not in service](https://github.com/oxidecomputer/omicron/blob/3ae42bf76cb9b55993705b817157e4f60935b6dd/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs#L1090-L1097). I'm not sure that planner behavior is right, and it might change with #7286, so it seemed safer to go ahead and fix this now.
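Purely as an illustration of the refactor's direction (the method name `into_running_omicron_zones_config` comes from the list above; the types and body below are simplified stand-ins, not omicron's real definitions), the idea is to replace a blanket conversion with one that can only produce running zones:

```rust
// Simplified stand-ins; not the real omicron types.
struct BlueprintZoneConfig {
    should_be_running: bool,
    zone: String,
}

struct BlueprintZonesConfig {
    zones: Vec<BlueprintZoneConfig>,
}

struct OmicronZonesConfig {
    zones: Vec<String>,
}

impl BlueprintZonesConfig {
    /// Replaces the old `to_omicron_zones_config(filter)`: the filter is now
    /// fixed to "should be running", so expunged zones can no longer leak
    /// into the sled-agent-facing config by passing the wrong filter.
    fn into_running_omicron_zones_config(self) -> OmicronZonesConfig {
        OmicronZonesConfig {
            zones: self
                .zones
                .into_iter()
                .filter(|z| z.should_be_running)
                .map(|z| z.zone)
                .collect(),
        }
    }
}
```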
```rust
StepSuccess::new(()).into()
    .map_err(|err| anyhow!(err));
result_to_step_result(res)
```
I don't think we should continue with execution if this step fails:
- It only talks to cockroach; if Nexus can't talk to cockroach, I'm not sure it should do anything until that's resolved?
- If it fails, it probably means something is wrong with the blueprint (the only time we've seen this fail on a real system was #7049, "[reconfigurator] Planner can give out the same external IP to two different new zones", where the planner spit out a bad blueprint).

Similarly, the `register_sled_list_step` did not get this treatment, which also seems right: if we can't ask cockroach what sleds we're supposed to be operating on, we probably shouldn't continue.

I'm a little tempted to propose splitting this part of the PR (execution continues on failure) out into a separate one, and maybe going over execution in a group / live to make sure we all think all the steps are okay to continue on failure?
Done in de4a080. I left it in this PR for now.
> I'm a little tempted to propose splitting this part (execution continues on failure) of the PR out into a separate one, and maybe going over execution in a group / live to make sure we all think all the steps are okay to continue on failure?
I went over the steps and put some notes in a google doc; we discussed at the update sync last week. We identified 3 steps that are dependent on `PUT /omicron-zones` succeeding; #7524 added a `DeployZonesDone` token to encode that in the executor for now. That'll cause conflicts with this PR, but hopefully minimal.

It does mean this PR alone doesn't fully solve #6999 - we need to fix those step dependencies too.
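A rough sketch of the token pattern mentioned above. Only the name `DeployZonesDone` comes from the comment; the surrounding code is an assumption about how such a token can encode a step dependency:

```rust
/// A zero-sized proof that zone deployment succeeded. Because this type can
/// only be constructed by the deploy-zones step, any later step that takes it
/// as an argument cannot run unless deployment already happened.
pub struct DeployZonesDone(());

fn deploy_zones(/* sled clients, blueprint, ... */) -> Result<DeployZonesDone, anyhow::Error> {
    // PUT /omicron-zones to each in-service sled-agent (elided).
    Ok(DeployZonesDone(()))
}

fn clean_up_expunged_zones(_proof: &DeployZonesDone /* , ... */) {
    // Safe to proceed: the token proves zones were deployed in this execution.
}
```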
I still need to add the schema migration code and do some testing.
```rust
.set((
    dsl::disk_state.eq(PhysicalDiskState::Decommissioned),
    dsl::time_modified.eq(now),
))
.execute_async(conn)
.await
.map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?;
```
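For context, the fragment above is the tail of a Diesel update that flips the disk's state to `Decommissioned` and bumps `time_modified` in the same statement. A hypothetical fuller shape might look like the following; the `diesel::update(...)` target and filter are assumptions for illustration, not the PR's actual code:

```rust
// Hypothetical surrounding statement; only `.set(...)` onward comes from the
// actual diff above.
diesel::update(dsl::physical_disk.filter(dsl::id.eq(disk_id)))
    .set((
        dsl::disk_state.eq(PhysicalDiskState::Decommissioned),
        dsl::time_modified.eq(now),
    ))
    .execute_async(conn)
    .await
    .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?;
```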
Bump on these questions
@jgallagher I think this is ready for a re-review. I'll give it a test on a4x2 tomorrow as well.
LGTM, mostly just some style and organization nits around planner vs builder.
Testing on a4x2 was successful. I can expunge a disk, generate blueprints, and watch the system expunge and decommission the disk, as well as expunge the zones on that disk. I also expunged a sled and watched it expunge all disks and zones and relocate services.
This PR partially fixes #6999, and fixes #7098 and #7502.
A few significant changes are made in this PR:

1. Step failures no longer cause the executor itself to early return without running all steps. Instead, a `StepWarning` is returned for each step that contains errors, and the executor runs every step (see the sketch after this list).
2. Executor steps are independent of each other and can safely run independently. This was mostly true already, except for the cases addressed by the bulk of the work in this PR, which ensures that when a disk is expunged, it is not immediately decommissioned unless the sled is also expunged. The planner waits to see that the expungement has been observed by a sled-agent on an active sled with an expunged disk before it marks the disk decommissioned.
3. Expunged disks are only decommissioned once they are marked `ready_for_cleanup` in the blueprint. This completes the feedback loop between planning, inventory collection, and execution. Previously disk decommissioning was done for all expunged disks. This was racy in that a separate Nexus may have expunged a disk that was not yet ready to be decommissioned, and it would be picked up in the decommissioning step of the first Nexus's executor. There is more information in the following commit comments.
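A minimal sketch of the shape of change 1, using local stand-in types rather than the real update-engine API (`StepSuccess`/`StepWarning` appear in the diffs quoted earlier, but their exact signatures are not reproduced here):

```rust
// Stand-in outcome type for illustration; not the update-engine API.
enum StepOutcome {
    Success,
    Warning { message: String },
}

fn run_step(work: impl FnOnce() -> Result<(), String>) -> StepOutcome {
    match work() {
        Ok(()) => StepOutcome::Success,
        // Instead of returning an error (which previously made the executor
        // bail out before later steps ran), record a warning and keep going.
        Err(message) => StepOutcome::Warning { message },
    }
}
```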
Steps 1 and 2 above partially resolve #6999 and resolve #7098. While I was modifying the datastore to implement step 3, I also updated the `modified_time` of the physical disks table for the necessary transactions. This fixes #7502.
This PR also fixes a planner bug where we only checked the planning input for decommissioned disks, missing the case where there was a change in the parent blueprint but no corresponding change to the planning input.
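A purely illustrative restatement of that bug fix (all names here are hypothetical stand-ins): the decision now consults the parent blueprint as well as the planning input, so a blueprint-only change is not missed:

```rust
// Hypothetical stand-ins; none of these names are omicron's.
struct DiskFacts {
    expunged_in_parent_blueprint: bool,
    expunged_in_planning_input: bool,
    expungement_seen_by_sled_agent: bool,
}

fn eligible_for_decommission(disk: &DiskFacts) -> bool {
    // Previously only the planning input was consulted, so a disk that was
    // expunged in the parent blueprint alone could be missed.
    (disk.expunged_in_parent_blueprint || disk.expunged_in_planning_input)
        && disk.expungement_seen_by_sled_agent
}
```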