bug(vz/launchdaemon): VZ driver leaves VM orphaned on shutdown; limactl start fails to recover broken state #5087

@resker

Summary

Two related bugs surface when running a Lima instance as a LaunchDaemon (PR #4984) on macOS. Together they cause the instance to get stuck in a Broken state after every system reboot, requiring manual intervention.

Bug 1: VZ driver does not shut down cleanly on SIGTERM

When launchd sends SIGTERM to limactl start <instance> --foreground at system shutdown, Lima attempts to stop the VZ VM via LimaVzDriver.Stop(). On affected machines, l.machine.CanRequestStop() returns false, so Stop() returns errors.New("vz: CanRequestStop is not supported") and exits fatally without stopping the VM.

The VZ driver process (started by the system-level LaunchDaemon) is left running after the host agent exits. On next boot the instance is in state "vz driver is running but host agent is not".

Relevant code: pkg/driver/vz/vz_driver_darwin.go:527

{"level":"fatal","msg":"vz: CanRequestStop is not supported"}

Observed on: Apple M4 Mac Mini, macOS 26.5, Lima 2.1.1 + dev build of #4984.

CanRequestStop returns false when the VM was not configured with RequestStopHandler set — this appears to be the case for all VZ instances created by Lima today.

Fix direction: set a RequestStopHandler on the VZ machine during creation so CanRequestStop() returns true. Alternatively, when CanRequestStop is false, fall back to requestStopViaSSH directly (graceful) or force-kill the VZ process (hard stop).


Bug 2: limactl start does not recover from broken state

When the instance is in StatusBroken with error "vz driver is running but host agent is not", startAction in cmd/limactl/start.go returns immediately with:

errors inspecting instance: [vz driver is running but host agent is not]

With KeepAlive=true in the LaunchDaemon plist, launchd retries every 10 seconds — but each retry hits the same broken-state check and exits, looping indefinitely. The instance never recovers without manual limactl stop --force.

Relevant code: cmd/limactl/start.go:560,580

Fix direction: in startAction, when the instance is StatusBroken with the specific error "<vmtype> driver is running but host agent is not", automatically attempt limactl stop --force to clean up the orphaned driver state before proceeding with the start. This turns a permanent crash-loop into a self-healing recovery.


Reproduction

  1. Install a Lima instance as a LaunchDaemon via limactl autostart enable --condition=boot <instance> (PR #4984, "feat(autostart): add LaunchDaemon support for headless macOS servers")
  2. Reboot the system
  3. Observe: limactl list shows STATUS=Broken, launchd.stderr.log shows crash-loop

Workaround

limactl stop --force <instance>
# launchd retries start automatically within KeepAlive interval

Relationship

Bug 1 causes Bug 2. Fixing Bug 1 (clean VZ shutdown on SIGTERM) prevents the broken state from occurring. Fixing Bug 2 (self-healing start) provides defense-in-depth for any case where the VZ driver is orphaned (crash, OOM kill, etc.).
