-
Notifications
You must be signed in to change notification settings - Fork 4k
asim: improve ability to specify store liveness and node membership #158455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/kv/kvserver/asim/state/impl.go line 1237 at r1 (raw file):
// number of nodes that exist in this state. // TODO(kvoli): Find a better home for this method, required by the storepool. func (s *state) NodeCountFn() storepool.NodeCountFunc {
This matches what the production code does, but I should add a comment here pointing at the production version. This method was implicated in #155734 - nodes were decommissioning and down, and were excluded in the production version of this code (b/c of decommissioning status) - same as here. There's nothing really wrong with this btw, the problem (in that issue) was that the allocator then down-replicated off live nodes, not realizing that there were down nodes around.
pkg/kv/kvserver/asim/state/liveness.go line 23 at r1 (raw file):
const ( // LivenessLive means the store is alive and healthy. LivenessLive LivenessState = iota
This mirrors the NodeLivenessStatus states that are truly liveness-related (LIVE/UNAVAILABLE/DEAD). We don't have a pure enum for this, which is why I gave asim its own. I toyed with just using a subset of NodeLivenessStatus, but it was pretty ugly and honestly, the less of NodeLivenessStatus we have here, the better.
We don't have an equivalent for UNKNOWN here but that's fine. Production code barely ever sees that, and it's not special cased meaningfully. Someone could add it later if they feel strongly, it wouldn't be too much of a lift but I feel somewhat strongly that it's not worth it.
pkg/kv/kvserver/asim/state/liveness.go line 68 at r1 (raw file):
// registerStore registers a new store with its node mapping and initializes it as live. func (st *StatusTracker) registerStore(storeID StoreID, nodeID NodeID) { st.storeStatusMap[storeID] = StoreStatus{Liveness: LivenessLive}
re: my earlier comment: if we introduced an "Unknown" liveness state here, we could use it here. But then every test would need to manually specify nodes as live before anything happens. Not worth it.
pkg/kv/kvserver/asim/state/liveness.go line 72 at r1 (raw file):
} func (st StatusTracker) GetNodeVitalityFromCache(roachpb.NodeID) livenesspb.NodeVitality {
We can probably narrow some interfaces to avoid having these panicking methods, but I didn't want to enter another tangent, so I'll leave this alone. It didn't look 100% trivial since these methods CAN end up getting called (by the constraints conformance reported), just not in the simulation setting - but the reporter code is production code.
…rain status When determining which stores/nodes are fit for being considered as sources and/or targets, we have to consider the two dimensions of liveness (is the process likely running?) and membership/operator intent (is this store decommissioning or draining?). The current asim model models health as a LivenessStatus, which is a single flat enum value. However, both the single-metric and multi-metric allocators in production can draw from a richer tapestry of signals. For example, cockroachdb#155734 sees the single-metric allocator make poor rebalancing decisions that hinge on the fact that it can consider a node "decommissioning" in one place but "unavailable" in another. We couldn't reproduce that particular issue in asim because the liveness status would "force" the single-metric allocator to consistently observe either of the two values - something that can be different in production code. Independently but relatedly, we are also working on adding liveness/status responsivity to the multi-metric allocator (cockroachdb#156776). We are careful to avoid some of the poor behaviors of the single-metric allocator, but similarly, we want to be able to exercise both decommissioning live and decommissioning dead nodes, not to mention that mma will reason about liveness at a store level. All of this suggested a reworking of how asim models liveness and operator intent. In fact, I ended up in this refactor after having implemented the health/status "plumbing" for the multi metric allocator and realizing that we'd need to make changes such as the ones presented here to meaningfully simulation test mma in the presence of interesting liveness/status combinations. The re-working ended up being fairly intense, but I really like where it gets us. We no longer model liveness using a LivenessStatus enum value. Instead, we have asim-attached types StoreStatus and NodeStatus that cleanly track liveness (for a store) and membership/draining status (which are node-level concepts today) and the DSL commands reflect this - we can set individual stores as live/dead, we can set nodes as decommissioning, et cetera. The single-metric allocator now receives signals that closely match what it would in production. It is now possible to specify a situation in which a store is decommissioning but also recently became non-live (which is what was required to simulate cockroachdb#155734). These signals originate in our clean new structs, from which we can (without bending over backwards) construct "NodeVitality" objects, which is what is required. As a reminder, the multi-metric allocator doesn't react to health signals at all yet, so it hasn't been hooked up to anything. This will be done in a separate PR that also adds logic to mma that puts these signals to use. Informs cockroachdb#156776. Epic: CRDB-55052
8b0f86f to
f34ddeb
Compare
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/kv/kvserver/asim/state/liveness.go line 45 at r2 (raw file):
type NodeStatus struct { Membership livenesspb.MembershipStatus Draining bool
Draining is an interesting case. We could also understand it as a liveness signal, since a draining node can necessarily be thought of as not dead (the transition is live -> draining -> unavailable -> dead).
Add health and store-level disposition checks to lease target filtering, and move all eligibility filtering (including the existing per-replica disposition check) to happen before computing load means. Computing means only over eligible targets is semantically correct: when asking "is store X above or below average?", we mean among stores that could actually receive the lease. Including ineligible stores (dead, unhealthy, refusing) distorts the average. Example: with eligible targets at 30/40/50 QPS and a dead store at 0 QPS: - Including dead store: mean=30, so 50 QPS looks "significantly above average" - Excluding dead store: mean=40, so 50 QPS is "slightly above average" The latter correctly reflects reality among viable candidates. Add a datadriven directive and testdata file to exercise the new retainReadyLeaseTargetStoresOnly function, covering health, store-level disposition, and per-replica disposition filtering. The functionality is also exercised through a newly introduced TestClusterState test. `asim` testing is going to be deferred to a different PR because it is contingent cockroachdb#158455 and an additional in-progress plumbing PR. Epic: CRDB-55052
Add health and store-level disposition checks to lease target filtering, and move all eligibility filtering (including the existing per-replica disposition check) to happen before computing load means. Computing means only over eligible targets is semantically correct: when asking "is store X above or below average?", we mean among stores that could actually receive the lease. Including ineligible stores (dead, unhealthy, refusing) distorts the average. Example: with eligible targets at 30/40/50 QPS and a dead store at 0 QPS: - Including dead store: mean=30, so 50 QPS looks "significantly above average" - Excluding dead store: mean=40, so 50 QPS is "slightly above average" The latter correctly reflects reality among viable candidates. Add a datadriven directive and testdata file to exercise the new retainReadyLeaseTargetStoresOnly function, covering health, store-level disposition, and per-replica disposition filtering. The functionality is also exercised through a newly introduced TestClusterState test. `asim` testing is going to be deferred to a different PR because it is contingent cockroachdb#158455 and an additional in-progress plumbing PR. Epic: CRDB-55052
When determining which stores/nodes are fit for being considered as sources and/or targets, we have to consider the two dimensions of liveness (is the process likely running?) and membership/operator intent (is this store decommissioning or draining?).
The current asim model models health as a LivenessStatus, which is a single flat enum value. However, both the single-metric and multi-metric allocators in production can draw from a richer tapestry of signals. For example, #155734 sees the single-metric allocator make poor rebalancing decisions that hinge on the fact that it can consider a node "decommissioning" in one place but "unavailable" in another. We couldn't reproduce that particular issue in asim because the liveness status would "force" the single-metric allocator to consistently observe either of the two values - something that can be different in production code.
Independently but relatedly, we are also working on adding liveness/status responsivity to the multi-metric allocator (#156776). We are careful to avoid some of the poor behaviors of the single-metric allocator, but similarly, we want to be able to exercise both decommissioning live and decommissioning dead nodes, not to mention that mma will reason about liveness at a store level.
All of this suggested a reworking of how asim models liveness and operator intent. In fact, I ended up in this refactor after having implemented the health/status "plumbing" for the multi metric allocator and realizing that we'd need to make changes such as the ones presented here to meaningfully simulation test mma in the presence of interesting liveness/status combinations.
The re-working ended up being fairly intense, but I really like where it gets us. We no longer model liveness using a LivenessStatus enum value. Instead, we have asim-attached types StoreStatus and NodeStatus that cleanly track liveness (for a store) and membership/draining status (which are node-level concepts today) and the DSL commands reflect this - we can set individual stores as live/dead, we can set nodes as decommissioning, et cetera.
The single-metric allocator now receives signals that closely match what it would in production. It is now possible to specify a situation in which a store is decommissioning but also recently became non-live (which is what was required to simulate #155734). These signals originate in our clean new structs, from which we can (without bending over backwards) construct "NodeVitality" objects, which is what is required.
As a reminder, the multi-metric allocator doesn't react to health signals at all yet, so it hasn't been hooked up to anything. This will be done in a separate PR that also adds logic to mma that puts these signals to use.
Informs #156776.
Epic: CRDB-55052