This release contains bugfixes, new features, major public-facing API
changes, *and* inter-component API changes.
Live-upgrading is possible but must be done carefully. Read the "Upgrade
path from v0.6.0" section at the end for more info.
Breaking API changes:
- Upgraded to Kubernetes 1.24 (#132)
- VMs may have fractional CPU values (#172)
Features:
- Improve scaling bounds validation (#190)
- Make api.ScalingBounds (for scaling annotations) public (#181)
- informant: Respect max file cache size (#182)
- agent: Add runner panics metrics (#180)
- agent: Rework (improve!) scaling algorithm (#195)
  - In general, scaling should be much smoother now. There's still some
    work to do in this area (particularly around downscaling), but
    overall this should be a fairly impactful step.
- agent->informant health checks (#203)
- Support for fractional CPU (#172)
  - !!!
- NeonVM: Add current usage annotation to runner pod (#231)
- NeonVM: Allow disabling service links (#235)
Fixes:
- VirtualMachineSpec.PodResources now sets the pod's resources (#138)
- autoscaler-agents no longer produce logs about VM updates that aren't
on their node (#186)
- Fix NeonVM CRD still including VirtualMachineSpec.ServiceAccountName (#188)
- plugin: Fix Unreserve verdict format string in logs (#206)
- agent: Stop informant server when context canceled (#214)
  - This was the cause of a pretty notable goroutine leak that should
    now be fixed. See #196.
- agent: Fix log for /unregister response (#224)
- agent: Fix inverted 'ErrServerClosed' check (#225)
  - This may have been causing spurious error logs and silencing actual
    errors.
- Add node affinity to NeonVM's kube-multus-ds DaemonSet (#236)
- agent: Fix deadlock on invalid plugin response (#237)
Protocol changes:
- agent->informant health checks are now supported, but not required (#203)
- NeonVM CRD now supports fractional CPU - all of min/use/max. (#172)
- The NeonVM controller now makes requests to the runner's /cpu_current
and /cpu_change endpoints to get/set fractional CPU, which the runner
applies via cgroup manipulation. (#172)
- agent->plugin resource requests can now request fractional CPU (#172)
- plugin->agent permits can now return fractional CPU (#172)
  - note: plugin does not return fractional CPU unless the agent
    supports it. This makes it possible to do upgrades without
    significant downtime. (#238)
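To make "fractional CPU via cgroup manipulation" concrete, here is a small worked example of how a fractional limit maps onto a CPU bandwidth quota. It assumes cgroup v2's `cpu.max` file (format `<quota> <period>`, both in microseconds) and the kernel's default 100ms period. Whether the runner uses this exact interface is an assumption, and `cpuMaxLine` is a hypothetical helper, not part of NeonVM.

```go
package main

import "fmt"

// cgroupPeriodMicros is the CFS period assumed in this sketch
// (100ms, the kernel default for cpu.max).
const cgroupPeriodMicros = 100_000

// cpuMaxLine (hypothetical helper) returns the "<quota> <period>" line that
// would be written to cgroup v2's cpu.max for a limit given in milli-CPUs.
func cpuMaxLine(milliCPU uint32) string {
	// quota = (milliCPU / 1000) * period, using integer math to avoid float rounding.
	quota := uint64(milliCPU) * cgroupPeriodMicros / 1000
	return fmt.Sprintf("%d %d", quota, cgroupPeriodMicros)
}

func main() {
	for _, m := range []uint32{250, 1000, 1500} {
		fmt.Printf("%4dm CPU -> cpu.max = %q\n", m, cpuMaxLine(m))
	}
}
```

A limit of 0.25 CPU becomes a quota of 25000µs of CPU time per 100000µs period.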
Other changes:
- Upgraded to Go 1.20 (#130)
- agent/metrics: Make request error labels self-consistent (#193)
- Mark scheduler with `priorityClassName: system-cluster-critical` (#227)
Upgrade path from v0.6.0:
note: each step produces a "valid" state - the system will operate
successfully. It is not recommended to stay in a partial upgrade for
long, because partially upgraded states have not been tested as much.
1. Upgrade NeonVM controllers v0.6.0 -> v0.7.0
2. Upgrade autoscale-scheduler v0.6.0 -> v0.7.0
  - note: it is ok to change to a compute unit with fractional CPU at
    this step! Old autoscaler-agents will be given a multiplied CU so
    that they have an integer number of CPUs.
3. Upgrade autoscaler-agent v0.6.0 -> v0.7.0
note: Upgrading the vm-informant can be done at any point. Its
protocol changes are opt-in.
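The "multiplied CU" mentioned in step 2 can be illustrated with a little arithmetic. The sketch below picks the smallest integer multiplier that turns a fractional compute unit into one with a whole number of CPUs. How the scheduler actually chooses its multiplier is not stated in these notes, so the smallest-multiplier rule is an assumption, as are the `computeUnit` type, its field names, and both functions.

```go
package main

import "fmt"

// computeUnit is a hypothetical stand-in for a CU: CPU in milli-CPUs plus
// some number of memory slots.
type computeUnit struct {
	MilliCPU uint32
	MemSlots uint32
}

// gcd returns the greatest common divisor of a and b.
func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// integerCPUMultiplier returns the smallest k such that k * cu.MilliCPU is a
// whole number of CPUs (a multiple of 1000 milli-CPUs). This rule is an
// assumption made for illustration.
func integerCPUMultiplier(cu computeUnit) uint32 {
	return 1000 / gcd(cu.MilliCPU, 1000)
}

// scaled returns cu with every resource multiplied by k.
func scaled(cu computeUnit, k uint32) computeUnit {
	return computeUnit{MilliCPU: cu.MilliCPU * k, MemSlots: cu.MemSlots * k}
}

func main() {
	cu := computeUnit{MilliCPU: 250, MemSlots: 1} // 0.25 CPU per compute unit
	k := integerCPUMultiplier(cu)
	fmt.Printf("multiplier %d -> %+v\n", k, scaled(cu, k))
}
```

With a CU of 0.25 CPU and 1 memory slot, an old agent would see a CU of 1 CPU and 4 memory slots under this rule, so it keeps the same CPU-to-memory ratio while only ever handling whole CPUs.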