Releases: neondatabase/autoscaling
v0.13.0
This relatively small release contains significant changes to existing behavior in both the autoscaler-agent and scheduler plugin.

No breaking API changes (technically).

Features:
- agent: Memory-based scaling (#393)
  - Currently implemented in a similar manner to our load average-based scaling, via total memory usage, including the kernel.
- plugin: Allow ignoring resource usage from namespaces (#399)
  - Carveout for 'overprovisioning' pods now that we're tracking everything.

Fixes:
- plugin: Improve plugin method logs (#405)
  - Previously, some notable metrics were being increased without suitable accompanying log messages.

No protocol changes.

Other changes:
- plugin: Track all pods (#399)
  - Should make our accounting & metrics reporting much more accurate.
- plugin: Remove 'System' reserved resources (#399)
  - No longer necessary, because we're tracking everything.
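As a rough illustration of what scaling on total memory usage could look like, the sketch below picks the smallest number of memory slots that keeps usage at or below a target fraction, clamped to the VM's scaling bounds. The helper name `goalMemSlots`, the 1 GiB slot size, and the 75% target are all assumptions made for this example; the notes above do not state the agent's actual formula.

```go
package main

import "fmt"

// goalMemSlots is a hypothetical helper, NOT the agent's actual algorithm.
// It returns the smallest number of memory slots such that usedBytes is at
// most targetFraction of the total, clamped to [minSlots, maxSlots].
// It assumes slotBytes > 0 and 0 < targetFraction <= 1.
func goalMemSlots(usedBytes, slotBytes uint64, targetFraction float64, minSlots, maxSlots uint64) uint64 {
	// Total memory needed for usage to sit at the target fraction.
	wantBytes := uint64(float64(usedBytes) / targetFraction)
	// Round up to a whole number of slots.
	slots := (wantBytes + slotBytes - 1) / slotBytes
	if slots < minSlots {
		return minSlots
	}
	if slots > maxSlots {
		return maxSlots
	}
	return slots
}

func main() {
	const gib = 1 << 30
	// 3 GiB in use, 1 GiB slots, 75% target, bounds of 1 to 8 slots.
	fmt.Println(goalMemSlots(3*gib, gib, 0.75, 1, 8)) // prints 4
}
```

Clamping to the bounds after computing the goal means a change in bounds never produces an out-of-range request, whatever the current usage is.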
v0.12.2
Small release, just containing #395 - a fix for #234, where the autoscaler-agent's per-VM Runner will panic when the scaling bounds decrease below the current usage. This was fast-tracked for release because of the impact on VM pools. It's not hard-blocking, but is significant enough that it's worth fixing beforehand.
v0.12.1
This release contains bugfixes and new metrics (along with some changes to existing ones).

No breaking API changes.

Features:
- plugin: New migration-related metrics (#387):
  - autoscaling_plugin_migrations_created_total
  - autoscaling_plugin_migrations_deleted_total
  - autoscaling_plugin_migration_create_fails_total
  - autoscaling_plugin_migration_delete_fails_total
- plugin: Include node group in node resource metrics (#382)
- agent: agent->informant request metrics now include the endpoint (#380)

Fixes:
- Add vmscrape.yaml to release assets (#392)
- plugin: Fix spurious "updated scaling bounds" logs (#391)
  - Incidentally, this *also* entirely fixes our handling of scaling bounds changes.
- plugin: Migration handling reliability improvements (#387)
- informant: Fix parent process stall when child dies quickly (#389)
- agent: Fix NeonVM downscaling not showing up in metrics (#381)

No protocol changes.

No other changes.

Upgrade path from v0.12.0:
- No ordering requirements.
v0.12.0
This release contains bugfixes (lots of them!), new metrics, and BREAKING CHANGES TO OLD METRICS.

No breaking API changes.

Features:
- neonvm: Propagate label/annotation changes to runner pod(s) (#279)
- agent: Add scaling metrics! (#334)
  - All of:
    - autoscaling_agent_scheduler_plugin_{requested,approved}_{cpu,mem}_change_total
    - autoscaling_agent_informant_{requested,approved}_{cpu,mem}_change_total
    - autoscaling_agent_neonvm_requested_{cpu,mem}_change_total
    - autoscaling_agent_neonvm_outbound_requests_total
- plugin: Add per-node resource metrics (#363)
  - Two new metrics:
    - autoscaling_plugin_node_cpu_resources_current
    - autoscaling_plugin_node_mem_resources_current

Fixes:
- Add whereabouts.yaml to release assets (#348)
- neonvm: Don't propagate kubectl's last-applied-configuration annotation (#344)
- agent: Reset Runner endState on restart (#349)
  - This bug caused the agent's metrics to never show a previously-panicked Runner as recovered, even when it was.
- agent/schedwatch: Fix spurious close (#352)
  - This bug was causing agents to be unable to recognize new schedulers.
- plugin/watch: Remove redundant error wrapping (#358)
- plugin: Fix filter cycle metrics (#356)
  - This REMOVES two metrics:
    - autoscaling_plugin_filter_cycle_successes_total
    - autoscaling_plugin_filter_cycle_rejections_total
  - See the PR for more details.
- README: fix make commands to reflect kind/k3d (#365)
- plugin: Cleanup state for deleted k8s Nodes (#361)
  - Should *hopefully* fix a particular memory leak, but it's not clear.
- informant/filecache: Close DB connections (#367)
  - This was causing some users to be unable to connect to their database because the informant took all the connections.
  - This was already released as v0.11.1
- agent/billing: Move push logic into separate thread (#368)
  - This was preventing us from having more reasonable request timeouts (like... anything above 2s)

No protocol changes.

Other changes:
- util/watch: More logs! (#351)
- agent: Record neon/endpoint-id for each Runner if/when assigned (#353)
- agent: Improve help message for autoscaling_agent_tracked_vms_current (#354)
- agent/billing: Log IdempotencyKey of events (#366)
- billing: Add x-trace-id header to requests (#372)

Upgrade path from v0.11.0:
- No ordering requirements, but considering the fixes to the agent's scheduler detection, it's probably worthwhile to update any agents first.
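The brace shorthand in the v0.12.0 agent scaling metrics (#334) stands for eleven concrete metric names. The short program below expands them, which may be handy when updating dashboards or alert rules. The function name `scalingMetricNames` is made up for this example; the metric names themselves come straight from the notes above.

```go
package main

import "fmt"

// scalingMetricNames expands the {a,b} shorthand from the v0.12.0 notes into
// full metric names. This helper is illustrative only; it is not part of the
// autoscaling codebase.
func scalingMetricNames() []string {
	var names []string
	// scheduler_plugin and informant each have requested/approved x cpu/mem.
	for _, target := range []string{"scheduler_plugin", "informant"} {
		for _, kind := range []string{"requested", "approved"} {
			for _, res := range []string{"cpu", "mem"} {
				names = append(names,
					fmt.Sprintf("autoscaling_agent_%s_%s_%s_change_total", target, kind, res))
			}
		}
	}
	// The notes list only "requested" variants for NeonVM.
	for _, res := range []string{"cpu", "mem"} {
		names = append(names,
			fmt.Sprintf("autoscaling_agent_neonvm_requested_%s_change_total", res))
	}
	names = append(names, "autoscaling_agent_neonvm_outbound_requests_total")
	return names
}

func main() {
	for _, name := range scalingMetricNames() {
		fmt.Println(name)
	}
}
```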
v0.11.1
Hotfix release, backporting #367 to fix a bug in the informant that caused it to never close DB connections when the file cache integration is enabled.
v0.11.0
This release contains bugfixes, new features, and large changes to the NeonVM controller.

Breaking API changes:
- neonvm: VirtualMachine .spec.extraNetwork fields changed (#256)
  - Removed multusNetworkNoIP
  - Made multusNetwork omitempty
- neonvm: VirtualMachineMigrations no longer have post-copy enabled by default (#256)

Features:
- neonvm: Two new VmPhase types: "PreMigrating" and "Scaling" (#256)
- neonvm: Migration source runner pod now has an ownerref pointing back to the migration (#332)
- ci: Added support for k3d (#340)
- plugin: new metrics
  - autoscaling_plugin_filter_cycle_successes_total (#346)
  - autoscaling_plugin_filter_cycle_rejections_total (#346)
  - autoscaling_plugin_extension_call_fails_total (#347)

Fixes:
- scheduler: Fixed agent-handler log keys explosion (#338)
  - NB: this was already released as v0.10.1
- scheduler: Fixed missing `continue` when skipping completed pods (#342)
  - NB: this was already released as v0.10.2
- scheduler: Fixed outdated log line (#343)
  - Removed "[autoscale-enforcer] load state: " prefix from the message
- agent: Do informant health checks even when suspended (#341)

No protocol changes.

Other changes:
- ci: kind and kubectl versions tweaked (#336)
- k8s deps upgraded to 1.25.11 (#339)
- plugin: Capitalize pluginCalls metric labels (#345)

There are even more changes to the NeonVM controller that aren't listed here. For more, see #256.

Upgrade path from v0.10.x:
- No ordering requirements.
v0.10.2
Hotfix release, backporting #342 to fix the scheduler plugin's handling of completed pods on startup.
v0.10.1
Hotfix release, backporting #338 to fix scheduler plugin logs for agent requests.
v0.10.0
This release contains bugfixes, ???, and a breaking change to the agent<->informant protocol.

Breaking API changes:
- agent<->informant: Include AgentID in informant /downscale and /upscale (#316)
  - This bumps the agent<->informant protocol to v2.
  - The agent currently supports both versions, and will for the immediate future.

Features:
- neonvm/builder: Make output prettier (#280)
- Start switch from klog -> zap [agent/plugin/informant] (#323)
  - All kinds of dashboards need updating. It's for the best.

Fixes:
- agent/informant: Fix inverted condition for logs (#315)
- plugin: Handle usage updates for non-autoscaling VMs (#312)
- plugin: Fix Unreserve condition (#317)
- util/watch: Set failingCurrent gauge to zero so it shows up (#320)
- neonvm: Fix default ports from Go client (#257)

Protocol changes:
- See above, re: agent<->informant changes.

Other changes:
- deploy: Change metrics scrape interval 10s -> 60s (#321)
- neonvm/runner: Set AutomountServiceAccountToken = false (#298)
- agent/billing: Use NeonVM .status.cpus, not .spec.guest.cpus.use (#325)

Upgrade path from v0.9.0:
- All autoscaler-agents must be upgraded before any vm-informants
- No other requirements.
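To make the v1/v2 split in v0.10.0 concrete, here is a minimal sketch of an agent that can talk to both old and new informants by attaching the AgentID only when the negotiated version is 2. The type `resourceRequest`, its JSON field names, and `buildRequest` are hypothetical and chosen for illustration; they are not the real wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceRequest is a HYPOTHETICAL body for /upscale and /downscale.
// The field names are assumptions, not the actual protocol definition.
type resourceRequest struct {
	// AgentID is sent only under protocol v2; omitempty drops it for v1.
	AgentID  string `json:"agentID,omitempty"`
	CPU      uint32 `json:"cpu"`
	MemSlots uint32 `json:"memSlots"`
}

// buildRequest encodes a request for the given protocol version, including
// the agent's ID only when the informant speaks v2.
func buildRequest(protoVersion int, agentID string, cpu, memSlots uint32) ([]byte, error) {
	req := resourceRequest{CPU: cpu, MemSlots: memSlots}
	switch protoVersion {
	case 1:
		// v1 informants do not expect an AgentID.
	case 2:
		req.AgentID = agentID
	default:
		return nil, fmt.Errorf("unsupported protocol version %d", protoVersion)
	}
	return json.Marshal(req)
}

func main() {
	for _, v := range []int{1, 2} {
		body, err := buildRequest(v, "agent-a", 2, 4)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("v%d: %s\n", v, body)
	}
}
```

Because only the agent needs to understand both shapes, this arrangement is consistent with the upgrade order given above: agents first, then informants.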
v0.9.0
This release contains bugfixes and upgrades to Kubernetes 1.25.

Breaking API changes:
- Upgrading to K8s 1.25. NB: Autoscaling requires K8s control planes with a version equal or +1; i.e. K8s 1.25 OR 1.26 is now required.

Features:
- New metrics! (#306, #310)
  - Too many to cover here; refer to those PRs instead.

Fixes:
- util/watch: Fix race condition on k8s watch.Update events (#295)
- agent/informant: Fix informant server exit logs (#286)
- api: Fix ExtractVmInfo disallowing min > use or use > max (#303)
  - This one may be counterintuitive at first. See #249 for context.
- agent: Fix vmEvent formatting (#307)
- informant: Suspend old agent *before* new one (#308)
- util/watch: Fix racy behavior with InitModeDefer (#305)
  - This was causing billing events to not be generated for VMs until an event *after* startup occurs for them.
- plugin: Allow overcommitted nodes on startup (#313)
- agent: Stop SchedulerWatch when Runner finishes (#314)
  - This was preventing the switchover to a new scheduler on upgrade or restart.

Other changes:
- Fix yaml formatting for autoscaler-agent config deploy (#300)

No protocol changes.

Upgrade path from v0.8.0:
- No ordering requirements.
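The "equal or +1" rule for the control plane in v0.9.0 can be checked with a few lines of Go before rolling out an upgrade. This is only a sketch: `minorVersion` and `controlPlaneSupported` are names invented for this example, and the code assumes both versions share major version 1 and compares minor versions only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorVersion extracts the minor component from a version string such as
// "v1.25.11" or "1.26". Hypothetical helper for this example.
func minorVersion(v string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("invalid version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// controlPlaneSupported reports whether the control plane's minor version is
// equal to, or exactly one above, the minor version the components target.
// ASSUMPTION: major versions match, so they are not compared here.
func controlPlaneSupported(targetVersion, controlPlaneVersion string) (bool, error) {
	target, err := minorVersion(targetVersion)
	if err != nil {
		return false, err
	}
	cp, err := minorVersion(controlPlaneVersion)
	if err != nil {
		return false, err
	}
	return cp == target || cp == target+1, nil
}

func main() {
	for _, cp := range []string{"v1.24.9", "v1.25.11", "v1.26.3", "v1.27.0"} {
		ok, err := controlPlaneSupported("1.25", cp)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s supported: %v\n", cp, ok)
	}
}
```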