Releases: neondatabase/autoscaling
v0.8.0
This release contains bugfixes, a new component, minor public-facing API changes, and significant changes to the deployed services, but no inter-component API changes.

Breaking API changes:
- NeonVM: restart policy no longer applies directly to the pod (#293)

Features:
- Add patch for cluster-autoscaler compatibility with VMs (#232)
- NeonVM: implement RestartPolicy (#293)
- NeonVM security and networking redesign (#245)
  - Runner pod no longer has Privileged: true
  - QEMU in the runner pod runs under its own user
  - Adapted generic-device-plugin for NeonVM, to give access to /dev/kvm and /dev/vhost-*
  - Switch from neonvm-vxlan-ipam to Whereabouts CNI -> Allows using overlay IP addresses in normal pods as well as VMs
  - Reconcile cycles improved
- NeonVM/vm-builder: Add --enable-file-cache flag (default: off) (#265)
- NeonVM: user RBAC roles (#284):
  - neonvm-virtualmachine-viewer-role
  - neonvm-virtualmachine-editor-role
  - neonvm-virtualmachinemigration-viewer-role
  - neonvm-virtualmachinemigration-editor-role
- More logs for autoscaler-agent (#290, #291)
- More autoscaler-agent metrics:
  - autoscaling_agent_runner_starts (#273)
  - autoscaling_agent_runner_restarts (#273)
  - autoscaling_agent_runner_fatal_errors_total (#274)
  - autoscaling_errored_vm_runners_current (#274)

Fixes:
- NeonVM/vm-builder: Fix command passthrough (#263)
- NeonVM/vm-builder: Fix cgexec being ignored (#281)
- NeonVM/vm-builder: Build without cgo (#255)
  - This removes the dependency on a dynamically loaded libc.
- informant: Fix cgroup memory.high throttling (#223)
- agent: Various logs fixes (#242, #267, #271, #272)
- agent: Restart panicked/errored runners (#273)
- agent/billing: Don't count VMs that aren't running (#278)
- agent, sched: Add ports to pod spec for metrics (#282)
- agent, sched: Fix logging of MilliCPU (#261)
- sched: Don't output command help on error (#253)
- plugin: Handle completed pods as if deleted (#260)

No protocol changes.

Other changes:
- Many unused RBAC (and other) items removed:
  - Namespace autoscaler-config (#245)
  - ClusterRole vm-view (#284)
  - ClusterRole vm-patcher (#284)
  - ClusterRoleBinding kube-system/autoscaler-vm-view (#284)
  - ClusterRoleBinding kube-system/autoscale-scheduler-as-vm-patcher (#284)
  - Role kube-system/autoscale-scheduler-config-reader (#284)
  - RoleBinding kube-system/autoscale-scheduler-config-reader (#284)
- NeonVM: Rename 'runner' container to 'neonvm-runner' (#277)
- agent: Network error metrics include root cause (#287)

Upgrade path from v0.7.2:
- No ordering requirements.
- You may wish to remove the old items listed under "Other changes".
v0.7.3-alpha3
This is a pre-release just for building and distributing images. Do not deploy anything from this release.
v0.7.2
This is a hotfix release that reverts a change in behavior from v0.7.0: Alongside the change to allow fractional CPU, #172 changed the billing value type to a float. This was incorrect and is fixed by #244.
v0.7.1
This is a hotfix release that fixes a bug with v0.7.0: On Kubernetes nodes with cgroups v1, the NeonVM runner was failing to read cgroup CPU information due to a bad path. This, in turn, prevented any successful reconciling for VMs on these nodes, which - among other things - prevented autoscaling from functioning for these VMs.
v0.7.0
This release contains bugfixes, new features, major public-facing API changes, *and* inter-component API changes. Live-upgrading is possible but must be done carefully. Read the "Upgrade path from v0.6.0" section at the end for more info.

Breaking API changes:
- Upgraded to Kubernetes 1.24 (#132)
- VMs may have fractional CPU values (#172)

Features:
- Improve scaling bounds validation (#190)
- Make api.ScalingBounds (for scaling annotations) public (#181)
- informant: Respect max file cache size (#182)
- agent: Add runner panics metrics (#180)
- agent: Rework (improve!) scaling algorithm (#195)
  - In general, scaling should be much smoother now. There's still some work to do in this area (particularly around downscaling), but overall this is a step that should be fairly impactful.
- agent->informant health checks (#203)
- Support for fractional CPU (#172) - !!!
- NeonVM: Add current usage annotation to runner pod (#231)
- NeonVM: Allow disabling service links (#235)

Fixes:
- VirtualMachineSpec.PodResources now sets the pod's resources (#138)
- autoscaler-agents no longer produce logs about VM updates that aren't on their node (#186)
- Fix NeonVM CRD still including VirtualMachineSpec.ServiceAccountName (#188)
- plugin: Fix Unreserve verdict format string in logs (#206)
- agent: Stop informant server when context canceled (#214)
  - This was the cause of a pretty notable goroutine leak that should now be fixed. See #196
- agent: Fix log for /unregister response (#224)
- agent: Fix inverted 'ErrServerClosed' check (#225)
  - This may have been causing spurious error logs and silencing actual errors.
- Add node affinity to NeonVM's kube-multus-ds DaemonSet (#236)
- agent: Fix deadlock on invalid plugin response (#237)

Protocol changes:
- agent->informant health checks are now supported, but not required (#203)
- NeonVM CRD now supports fractional CPU - all of min/use/max. (#172)
- NeonVM controller -> runner makes requests to /cpu_current and /cpu_change endpoints to get/set fractional CPU via the runner's cgroup manipulations. (#172)
- agent->plugin resource requests can now request fractional CPU (#172)
- plugin->agent permits can now return fractional CPU (#172)
  - note: plugin does not return fractional CPU unless the agent supports it. This makes it possible to do upgrades without significant downtime. (#238)

Other changes:
- Upgraded to Go 1.20 (#130)
- agent/metrics: Make request error labels self-consistent (#193)
- Mark scheduler with `priorityClassName: system-cluster-critical` (#227)

Upgrade path from v0.6.0:

note: each step produces a "valid" state - the system will operate successfully. It is not recommended to stay in a partial upgrade for long, because partially upgraded states have not been tested as much.

1. Upgrade NeonVM controllers v0.6.0 -> v0.7.0
2. Upgrade autoscale-scheduler v0.6.0 -> v0.7.0
   - note: it is ok to change to a compute unit with fractional CPU at this step! Old autoscaler-agents will be given a multiplied CU so it has an integer number of CPUs.
3. Upgrade autoscaler-agent v0.6.0 -> v0.7.0

note: Upgrading the vm-informant can be done at any point. Its protocol changes are opt-in.
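The note in step 2 about old autoscaler-agents receiving a "multiplied CU" can be illustrated with a little arithmetic. The sketch below is only an assumption about how such a multiplier could be derived, not the scheduler's actual code: the function name `integerCUMultiplier` and the choice to express the compute unit in milli-CPUs are hypothetical.

```go
package main

import "fmt"

// gcd returns the greatest common divisor of a and b.
func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// integerCUMultiplier is a hypothetical helper: it returns the smallest
// factor by which a compute unit of cuMilliCPU milli-CPUs must be multiplied
// so that the result is a whole number of CPUs.
// A zero input yields a multiplier of 1, since gcd(0, 1000) is 1000.
func integerCUMultiplier(cuMilliCPU uint32) uint32 {
	return 1000 / gcd(cuMilliCPU, 1000)
}

func main() {
	for _, cu := range []uint32{250, 500, 1000, 1500} {
		m := integerCUMultiplier(cu)
		fmt.Printf("CU of %dm x %d = %d CPU\n", cu, m, cu*m/1000)
	}
}
```

Under these assumptions, a compute unit of 0.25 CPU would be presented to an old agent as 4 compute units' worth (1 whole CPU), while an already-integer compute unit is left unchanged.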
v0.6.0
This release contains bugfixes, new features, and minor public-facing API changes, but no inter-component API changes.

Breaking API changes:
- NeonVM: Removed VirtualMachineSpec.ServiceAccountName (#140)
- NeonVM: Make vm-builder specific to Neon, with new vm-builder-generic for general-purpose use. vm-builder-generic is *almost* the same as the previous vm-builder, but it does not include vector by default (#133)
- Require label "autoscaling.neon.tech/enabled=true" for autoscaling to be enabled (#38)

Features:
- Allow annotation "autoscaling.neon.tech/bounds=..." to override scaling bounds (#128)
- NeonVM: add --quiet flag to vm-builder[-generic], which is off by default. Builds are more verbose without it. (#169)
- agent, plugin: Add prometheus metrics (#92, #174, #175)
- agent: Better config validation (#177)

Fixes:
- agent: always log informant register errors (#165)
- agent: fix runner log prefix (#159)
- NeonVM: fix ENTRYPOINT, CMD handling when there are multiple strings (#184)

No protocol changes.

Upgrade path from v0.5.2:
- No ordering requirements.
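The new label requirement (#38) means that objects without "autoscaling.neon.tech/enabled=true" are no longer autoscaled. A minimal sketch of such an opt-in check follows; the function name `autoscalingEnabled` is hypothetical and the real components may perform the check differently. Only the label key and value are taken from the notes above.

```go
package main

import "fmt"

// enabledLabel is the label key named in the release notes.
const enabledLabel = "autoscaling.neon.tech/enabled"

// autoscalingEnabled is a hypothetical helper: it reports whether a set of
// Kubernetes labels opts in to autoscaling. A missing label, or any value
// other than exactly "true", is treated as disabled.
func autoscalingEnabled(labels map[string]string) bool {
	return labels[enabledLabel] == "true"
}

func main() {
	withLabel := map[string]string{enabledLabel: "true"}
	withoutLabel := map[string]string{"app": "example"}

	fmt.Println(autoscalingEnabled(withLabel))
	fmt.Println(autoscalingEnabled(withoutLabel))
}
```

When upgrading from v0.5.2, existing VMs that should keep autoscaling therefore need this label added.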
v0.5.2
This release incorporates a handful of bugfixes and some new features. It is entirely inter-compatible with v0.5.1, with the exception of a minor change in the scheduler's "dump state" output.

Features:
- agent, plugin: Reimplement migration under load. (#112)
  - Note: The overlay network that allows VMs to preserve their IP addresses is not currently functional.

Fixes:
- plugin: Don't reject resource requests that aren't a multiple of the compute unit if the VM's resources are constrained to make satisfying that requirement impossible. (#108)
- plugin: Fix missing JSON tags for Buffer and CapacityPressure in podResourceState. (#107)
  - Note: this changes the "dump state" JSON output
- agent: Don't return from /suspend until NeonVM requests have finished. This helps avoid multiple autoscaler-agents acting at the same time.
- agent/billing: panic if VM store unexpectedly stopped (#110)

No protocol changes.

Upgrade path from v0.5.1:
- No ordering requirements.
v0.5.1
Hotfix release, fixes a panic on autoscaler-agent dump-state requests.
v0.5.0
This release marks the first release where NeonVM has been merged into the same repository. It was last at v0.4.6, so we've bumped to v0.5.0 as a kind of clean slate.

Features:
- Added "dump state" endpoints to autoscaler-agent and scheduler plugin. Refer to https://github.com/neondatabase/autoscaling/pull/76 for more information. (The endpoints are enabled by default).

No protocol changes.

There have been significant changes to testing - everything is run by the Makefile now. Refer to https://github.com/neondatabase/autoscaling/pull/91 for more information.

Upgrade path from v0.1.17 / v0.4.6:
- No ordering requirements.
v0.1.17
No new features.

Fixes:
- agent/billing: consumption event duplication fixed (#94)

No protocol changes.

Upgrade path from v0.1.16:
- No ordering requirements.