Expunge and Decommission disks in planner #7286

Merged: 38 commits into main from expunge-and-decommission-disks-in-planner, Feb 22, 2025

Conversation

@andrewjstone (Contributor) commented Dec 19, 2024

This PR partially fixes #6999, and fixes #7098 and #7502.

A few significant changes are made in this PR:

  1. Where possible, execution steps no longer error out and cause the executor
    itself to return early without running all steps. Instead, a StepWarning is
    returned for each step that contains errors, and the executor runs every step
    (see the sketch after this list).
  2. Some fallible steps have been decoupled so that they do not rely on
    each other and can safely run independently. This was mostly true already;
    the exception is the bulk of the work in this PR, which ensures that when a disk
    is expunged, it is not immediately decommissioned unless the sled is also expunged.
    The planner waits to see that the expungement has been observed by a sled-agent on
    an active sled with an expunged disk before it marks the disk decommissioned.
  3. The executor only decommissions physical disks whose disposition is expunged
    and which are marked ready_for_cleanup in the blueprint. This completes the
    feedback loop between planning, inventory collection, and execution. Previously,
    disk decommissioning was done for all expunged disks. This was racy in
    that a separate Nexus may have expunged a disk that was not yet ready
    to be decommissioned, and the disk would be picked up in the decommissioning
    step of the first Nexus's executor. There is more information in the
    following commit comments.
  4. A Disposition field has been added to the table output for disks.
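
As a rough illustration of item 1, here is a minimal sketch of the warning-instead-of-error pattern, assuming the update-engine style `StepSuccess`/`StepWarning` types the executor's steps already use; `ExecutorSpec` and `clean_up_disks` are hypothetical stand-ins, not real omicron identifiers:

```rust
use update_engine::{StepResult, StepSuccess, StepWarning};

// `ExecutorSpec` and `clean_up_disks` are placeholders for this sketch;
// the real executor has its own StepSpec type and step bodies.
async fn disk_cleanup_step() -> Result<StepResult<(), ExecutorSpec>, anyhow::Error> {
    match clean_up_disks().await {
        // The step completed cleanly.
        Ok(()) => StepSuccess::new(()).into(),
        // Previously a failure here would be `return Err(...)`, which
        // aborted the executor before later steps ran. Reporting it as
        // a StepWarning records the error but lets every remaining
        // (independent) step execute.
        Err(err) => {
            StepWarning::new((), format!("disk cleanup failed: {err:#}")).into()
        }
    }
}
```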

Steps 1 and 2 above partially resolve #6999 and resolve #7502. While I was modifying the datastore
to implement step 3, I also updated the modified_time of the physical disks
table for the necessary transactions. This fixes #7502.
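
To make the step-3 ordering concrete, here is a hedged sketch of when the planner may mark an expunged disk as ready for cleanup; all names below are illustrative, not the actual omicron identifiers:

```rust
/// Illustrative sketch only: an expunged disk may be decommissioned
/// either when its parent sled is itself expunged, or once inventory
/// shows the sled-agent has seen a config generation at or beyond the
/// one in which the disk was expunged.
fn disk_ready_for_cleanup(
    parent_sled_expunged: bool,
    sled_agent_generation_seen: u64,     // from the latest inventory collection
    disk_expunged_as_of_generation: u64, // recorded in the blueprint
) -> bool {
    parent_sled_expunged
        || sled_agent_generation_seen >= disk_expunged_as_of_generation
}
```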

This PR also fixes a planner bug where we only checked the planning input for
decommissioned disks, even when there was a change in the parent blueprint but no
corresponding change to the planning input.

This PR is intended to fix #6999 and #7098.

The executor steps are all independent already except those that will be
fixed in this PR. A disk disposition was added to the blueprint and gets
changed from `InService` to `Expunged` when the `PlanningInput` changes.
A disk state was also added that goes from `Active` -> `Decommissioned`.
This disk state changes when the parent sled is expunged or the
sled-agent has been guaranteed to have seen the disk expungement as
reported via inventory.
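
A simplified sketch of the two state machines just described; the real blueprint types carry more data (for example, the generation at which a disk was expunged and the ready_for_cleanup flag):

```rust
// Simplified sketch; not the actual nexus-types definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DiskDisposition {
    InService, // normal operation
    Expunged,  // set when the PlanningInput marks the disk expunged
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DiskState {
    Active,
    // Set once the parent sled is expunged, or once a sled-agent has
    // observably seen the expungement via inventory.
    Decommissioned,
}
```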

This code fully implements the planner bits. The executor bits are
going to be pulled in from the [fewer-fatal-errors-in-executor branch](main...fewer-fatal-errors-in-executor).
The executor bits are going to be expanded to implement the expunge and
decommission as dictated by the blueprint. Because the planner changes
are tightly coupled with the executor, they will all come in this PR.

Further code will also be added to represent the disk disposition and
state in the blueprint and diff tables for omdb output. This PR is primarily
blocked on [revamping the diff system](#7240).
This is particularly necessary because we need to change those tables
and want to do it rigorously once we have a visitor-based mechanism
for doing so.

There is some more cleanup to do here as well, in particular around
the `physical_disk_filter` and things that may not be needed once the
executor bits are fully implemented.
The disposition and state output is implemented for both the `omdb show` and `omdb diff` commands.

Also properly decommission disks in tests when a sled is decommissioned.

I'm still not sure why the comment line changed.
@andrewjstone andrewjstone changed the title WIP: Expunge and Decommission disks in planner Expunge and Decommission disks in planner Feb 8, 2025
@andrewjstone andrewjstone marked this pull request as ready for review February 8, 2025 00:29
jgallagher added a commit that referenced this pull request Feb 10, 2025
This is a follow-up from #7308 and
#7500 (comment):

* Remove `From<BlueprintZonesConfig> for OmicronZonesConfig`. This
included expunged zones, but thankfully had no callers.
* Change `BlueprintZonesConfig::to_omicron_zones_config(filter)` to
`BlueprintZonesConfig::into_running_omicron_zones_config()` (no `filter`
argument; see the sketch after this list). All the callers of this method were passing
`BlueprintZoneFilter::ShouldBeRunning`, and I don't think there's a
reason to use any other filter?
* Remove `From<BlueprintPhysicalDisksConfig> for
OmicronPhysicalDisksConfig` (which included expunged disks), and replace
it with `BlueprintPhysicalDisksConfig::into_in_service_disks()`. This
one _did_ have callers, including the blueprint executor, but I think
we've avoided problems because the planner currently [drops disks if the
incoming planning input says they're not in
service](https://github.com/oxidecomputer/omicron/blob/3ae42bf76cb9b55993705b817157e4f60935b6dd/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs#L1090-L1097).
I'm not sure that planner behavior is right, and might change with
#7286, so it seemed safer to go ahead and fix this now.
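
As a rough sketch of what the rename amounts to (the field names, the disposition predicate, and the `From` conversion below are assumptions for illustration, not the exact nexus-types API):

```rust
impl BlueprintZonesConfig {
    /// Sketch of the renamed helper: build an `OmicronZonesConfig`
    /// containing only zones that should be running, i.e. the
    /// `BlueprintZoneFilter::ShouldBeRunning` filter that every caller
    /// of the old method was already passing.
    fn into_running_omicron_zones_config(self) -> OmicronZonesConfig {
        OmicronZonesConfig {
            generation: self.generation,
            zones: self
                .zones
                .into_iter()
                // `should_be_running()` is an assumed predicate standing
                // in for the real disposition check.
                .filter(|zone| zone.disposition.should_be_running())
                .map(OmicronZoneConfig::from)
                .collect(),
        }
    }
}
```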

```rust
StepSuccess::new(()).into()
.map_err(|err| anyhow!(err));
result_to_step_result(res)
```
Contributor

I don't think we should continue with execution if this step fails.

Similarly, the register_sled_list_step did not get this treatment, which also seems right: if we can't ask cockroach what sleds we're supposed to be operating on, we probably shouldn't continue.

I'm a little tempted to propose splitting this part (execution continues on failure) of the PR out into a separate one, and maybe going over execution in a group / live to make sure we all think all the steps are okay to continue on failure?

Contributor Author

Done in de4a080. I left it in this PR for now.

Contributor

> I'm a little tempted to propose splitting this part (execution continues on failure) of the PR out into a separate one, and maybe going over execution in a group / live to make sure we all think all the steps are okay to continue on failure?

I went over the steps and put some notes in a google doc; we discussed at the update sync last week. We identified 3 steps that are dependent on PUT /omicron-zones succeeding; #7524 added a DeployZonesDone token to encode that in the executor for now. That'll cause conflicts with this PR, but hopefully minimal.

It does mean this PR alone doesn't fully solve #6999 - we need to fix those step dependencies too.

@andrewjstone andrewjstone force-pushed the expunge-and-decommission-disks-in-planner branch from 87055be to 7be692d Compare February 20, 2025 20:08
```rust
.set((
    dsl::disk_state.eq(PhysicalDiskState::Decommissioned),
    dsl::time_modified.eq(now),
))
.execute_async(conn)
.await
.map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?;
```
Contributor

Bump on these questions

@andrewjstone (Contributor Author)

@jgallagher I think this is ready for a re-review. I'll give it a test on a4x2 tomorrow as well.

@jgallagher (Contributor) left a comment

LGTM, mostly just some style and organization nits around planner vs builder.

@andrewjstone (Contributor Author)

Testing on a4x2 was successful. I can expunge a disk, generate blueprints, and watch the system expunge and decommission the disk as well as expunge the zones on that disk. I also expunged a sled and watched it expunge all disks and zones and relocate services.

@andrewjstone andrewjstone enabled auto-merge (squash) February 22, 2025 20:25
@andrewjstone andrewjstone enabled auto-merge (squash) February 22, 2025 20:26
@andrewjstone andrewjstone merged commit 096fdaf into main Feb 22, 2025
17 checks passed
@andrewjstone andrewjstone deleted the expunge-and-decommission-disks-in-planner branch February 22, 2025 22:06