qemu: use stream netdev with reconnect for socketVMNet networking #5027

Open
loncharles wants to merge 1 commit into lima-vm:master from loncharles:lattice/qemu-stream-reconnect

@loncharles loncharles commented May 23, 2026

Summary

Switch QEMU socketVMNet networking on macOS from legacy -netdev socket with a pre-dialed fd to -netdev stream with built-in reconnect support. Applies to all socketVMNet modes (bridged, shared, host).

  • Feature-detected: checks -netdev help output for stream support; falls back to socket if unavailable
  • Version-aware reconnect: reconnect-ms=500 for QEMU >= 9.2, reconnect=1 for QEMU 8.0–9.1 (the reconnect parameter was renamed to reconnect-ms in 9.2 and the old form removed in 10.2), no reconnect for QEMU 7.2–7.9
  • Wire-compatible: both stream and socket netdev use the same 4-byte big-endian length-prefixed Ethernet frame protocol that socket_vmnet uses

Motivation

When the socket_vmnet daemon restarts or the UNIX socket connection breaks, the legacy -netdev socket with a pre-dialed fd has no way to recover — the fd is dead and the VM's network is permanently broken until manual VM restart.

The stream netdev connects directly to the socket_vmnet UNIX socket and can automatically reconnect, recovering VM networking without restart.
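
To make the difference concrete, here is a rough Go sketch of how the two `-netdev` values might be assembled. It is an illustration rather than the code in this PR: the function names, the `net0` id, the fd number, and the socket path are placeholders, and the sketch assumes the path contains no commas.

```go
package main

import (
	"fmt"
	"strings"
)

// legacySocketNetdev returns the legacy form: the launcher dials the UNIX
// socket itself and passes QEMU the resulting file descriptor. Once that
// descriptor is dead, QEMU has no path to reconnect with.
func legacySocketNetdev(id string, fd int) string {
	return fmt.Sprintf("socket,id=%s,fd=%d", id, fd)
}

// streamNetdev returns the stream form: QEMU is given the socket path and
// connects on its own. reconnectOpt is appended verbatim when non-empty,
// for example "reconnect-ms=500". Assumes sockPath contains no commas.
func streamNetdev(id, sockPath, reconnectOpt string) string {
	parts := []string{
		"stream",
		"id=" + id,
		"server=off",
		"addr.type=unix",
		"addr.path=" + sockPath,
	}
	if reconnectOpt != "" {
		parts = append(parts, reconnectOpt)
	}
	return strings.Join(parts, ",")
}

func main() {
	// Placeholder values, chosen only for the example.
	fmt.Println("-netdev", legacySocketNetdev("net0", 3))
	fmt.Println("-netdev", streamNetdev("net0", "/var/run/socket_vmnet", "reconnect-ms=500"))
}
```

The key design point is that the stream form carries an address instead of an already-open descriptor, which is what lets QEMU re-dial after the daemon restarts.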

Performance

In testing on macOS Intel (QEMU 11.0.0, HVF, socket_vmnet bridged to a 5GbE NIC), the stream netdev showed ~2x throughput improvement over legacy socket netdev:

| Metric | socket (before) | stream (after) |
|---|---|---|
| Throughput (VM → LAN host, 3 runs avg) | ~970 Mbits/sec | ~1,840 Mbits/sec |
| QEMU CPU during iperf3 | ~219% | ~216% |
| Ping latency (100 pings) | 0.76ms avg | 0.88ms avg |

Host-to-host baseline on the same NIC is 4.66 Gbits/sec; the remaining gap is in the socket_vmnet relay path.

Compatibility

  • QEMU < 7.2: no change (falls back to existing socket netdev)
  • QEMU 7.2+: uses stream netdev (available since 7.2)
  • QEMU 8.0+: adds reconnect support (1-second granularity)
  • QEMU 9.2+: uses reconnect-ms (millisecond granularity, 500ms)
  • macOS only (socketVMNet code path is gated on runtime.GOOS == "darwin")
  • No change to usernet or raw socket networking paths
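
The version matrix above could be expressed as a small decision function. The following Go sketch is an approximation under stated assumptions, not the PR's implementation: `netdevChoice` and `chooseNetdev` are hypothetical names, the version is assumed to be already parsed into major and minor numbers, and `streamSupported` stands in for the result of checking the `-netdev help` output.

```go
package main

import "fmt"

// netdevChoice describes which backend and reconnect option to use.
type netdevChoice struct {
	Backend   string // "stream" or "socket"
	Reconnect string // "", "reconnect=1", or "reconnect-ms=500"
}

// chooseNetdev maps a QEMU major.minor version to the behavior listed in
// the compatibility notes. streamSupported is the feature-detection result.
func chooseNetdev(major, minor int, streamSupported bool) netdevChoice {
	atLeast := func(wantMajor, wantMinor int) bool {
		return major > wantMajor || (major == wantMajor && minor >= wantMinor)
	}
	switch {
	case !streamSupported || !atLeast(7, 2):
		// No stream backend: keep the existing socket netdev.
		return netdevChoice{Backend: "socket"}
	case atLeast(9, 2):
		// Millisecond-granularity option.
		return netdevChoice{Backend: "stream", Reconnect: "reconnect-ms=500"}
	case atLeast(8, 0):
		// Older option with one-second granularity.
		return netdevChoice{Backend: "stream", Reconnect: "reconnect=1"}
	default:
		// 7.2.x: stream exists, but without reconnect.
		return netdevChoice{Backend: "stream"}
	}
}

func main() {
	versions := [][2]int{{7, 1}, {7, 2}, {8, 2}, {9, 1}, {9, 2}, {11, 0}}
	for _, v := range versions {
		fmt.Printf("QEMU %d.%d -> %+v\n", v[0], v[1], chooseNetdev(v[0], v[1], true))
	}
}
```

Checking the newest range first keeps each case a simple lower bound, and the feature-detection flag takes priority so that a build without the stream backend always falls back to socket.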

Test plan

  • Verified stream netdev starts and VM boots on QEMU 11.0.0
  • Verified socket_vmnet reconnect works (kill/restart socket_vmnet daemon, VM recovers in ~4s)
  • Verified no regression in normal operation (ping, docker, k3s, limactl shell)
  • Throughput benchmarked before/after (iperf3)
  • Soak testing on production host

@loncharles loncharles force-pushed the lattice/qemu-stream-reconnect branch from 9c8efcc to e332301 May 23, 2026 03:48
@loncharles loncharles changed the title from "qemu: use stream netdev with reconnect for bridged networking" to "qemu: use stream netdev with reconnect for socketVMNet networking" May 23, 2026
@loncharles loncharles force-pushed the lattice/qemu-stream-reconnect branch from e332301 to 42132e8 May 23, 2026 16:42
@loncharles
Author

@unsuman @AkihiroSuda can I get a review? This adds socket_vmnet reconnect support where none existed before, with the added benefit of roughly 2x throughput at about the same CPU usage and similar latency. It is version-aware, with fallbacks.

I have a more involved, but related, fix for #3020 on VZ but want to make sure the effort isn't wasted.

Thanks

@AkihiroSuda
Member

@loncharles
Author

loncharles commented May 27, 2026

@AkihiroSuda, the VMNet and VZ tests fail identically on master (runs 26462579743 and 26384725917, same curl: (52) Empty reply from server). The failure looks pre-existing and unrelated to this change.

@AkihiroSuda
Member

@loncharles
Author

loncharles commented May 27, 2026

It's the exact same issue, friend.

VMNet tests (QEMU) passed on every master run up to and including 6d0fdd1 (May 22). It first failed on cfb9f31, the nerdctl 2.3.1 merge (PR #5024), and has failed on every master run since. Same commit that #5030 tracks for the VZ failure. Both tests show the same curl: (52) Empty reply from server symptom from the same containerd ttrpc regression you already bisected in #5049.

VZ

```
+ limactl shell default nerdctl run -d --name nginx -p 127.0.0.1:8080:80 ghcr.io/stargz-containers/nginx:1.19-alpine-org
time="2026-05-22T14:04:22Z" level=warning msg="treating lima version \"15f5ce2\" from \"/Users/runner/.lima/default/lima-version\" as very latest release"
8f9b5014bb9e5eb02b1b2d8e05950af4d332dba07c61769450bff7079160233f
+ timeout 3m bash -euxc 'until curl -f --retry 30 --retry-connrefused http://127.0.0.1:8080; do sleep 3; done'
+ curl -f --retry 30 --retry-connrefused http://127.0.0.1:8080/
  % Total    % Received % Xferd  Average Speed  Time    Time    Time   Current
                                 Dload  Upload  Total   Spent   Left   Speed

  0      0   0      0   0      0      0      0                              0
  0      0   0      0   0      0      0      0           00:01              0
  0      0   0      0   0      0      0      0           00:02              0
  0      0   0      0   0      0      0      0           00:03              0
```

VMNet

```
+ limactl shell default nerdctl run -d --name nginx -p 127.0.0.1:8080:80 ghcr.io/stargz-containers/nginx:1.19-alpine-org
time="2026-05-25T00:21:14Z" level=warning msg="treating lima version \"96137d9\" from \"/Users/runner/.lima/default/lima-version\" as very latest release"
7d5b8747951f48babf4c10c9154c9adce5b8d72e914dd942d7b7f7f3aa3d84fc
+ timeout 3m bash -euxc 'until curl -f --retry 30 --retry-connrefused http://127.0.0.1:8080; do sleep 3; done'
+ curl -f --retry 30 --retry-connrefused http://127.0.0.1:8080/
  % Total    % Received % Xferd  Average Speed  Time    Time    Time   Current
                                 Dload  Upload  Total   Spent   Left   Speed

  0      0   0      0   0      0      0      0                              0
  0      0   0      0   0      0      0      0           00:01              0
  0      0   0      0   0      0      0      0           00:02              0
  0      0   0      0   0      0      0      0           00:03              0
  0      0   0      0   0      0      0      0           00:04              0
  0      0   0      0   0      0      0      0           00:05              0
  0      0   0      0   0      0      0      0           00:06              0
  0      0   0      0   0      0      0      0           00:07              0
```

@AkihiroSuda
Member

Could you try rebasing with the current master?

@AkihiroSuda AkihiroSuda added this to the v2.2.0 milestone Jun 1, 2026
Use -netdev stream instead of legacy -netdev socket for socketVMNet
networking on macOS Intel (QEMU backend), with version-appropriate
reconnect. Applies to all socketVMNet modes (bridged, shared, host).

  QEMU >= 9.2: stream with reconnect-ms=500
  QEMU 8.0-9.1: stream with reconnect=1 (seconds granularity)
  QEMU 7.2-7.9: stream without reconnect
  QEMU < 7.2: fall back to socket (no stream support)

The stream backend connects directly to the socket_vmnet UNIX socket,
eliminating the pre-dialed fd approach. When reconnect is available,
QEMU automatically re-establishes the connection after link failures,
recovering VM networking without manual restart.

In testing, stream netdev showed ~2x throughput improvement over
legacy socket netdev (~1.84 Gbits/sec vs ~970 Mbits/sec).

Feature-detected via -netdev help output; version checked for the
reconnect parameter which was renamed from reconnect (8.0) to
reconnect-ms (9.2) and the old form removed in 10.2.

Signed-off-by: Lon C. Lundgren <lon@ocelot.net>
@loncharles loncharles force-pushed the lattice/qemu-stream-reconnect branch from 42132e8 to eb4f0f0 June 2, 2026 02:53
@AkihiroSuda AkihiroSuda requested a review from nirs June 2, 2026 06:29