
fix(balance_serve): bind scheduler RPC to loopback to close pre-auth pickle RCE #2043

Open
AAtomical wants to merge 1 commit into kvcache-ai:main from AAtomical:fix/sched-rpc-loopback-bind


@AAtomical



The balance_serve scheduler RPC (sched_rpc.py) binds its ZMQ ROUTER socket to tcp://*:{sched_port} and deserializes every received frame with pickle.loads. With no authentication, allowlist, or format validation, any peer that can reach the port can execute arbitrary code under the server process identity by sending a crafted pickle payload (GitHub issue #2042, Finding 1).
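To illustrate why an unauthenticated pickle.loads sink amounts to code execution, here is a minimal, benign sketch (not code from the PR). The hypothetical Payload class uses __reduce__ so that unpickling calls builtins.eval on a sender-chosen expression. A real attacker would substitute something harmful; this one only returns the receiver's PID.

```python
import os
import pickle


class Payload:
    """Hypothetical attacker-controlled object; not part of ktransformers."""

    def __reduce__(self):
        # Instead of object state, pickle records "call builtins.eval with
        # this string". The receiver performs that call inside pickle.loads.
        return (eval, ("__import__('os').getpid()",))


# The bytes an attacker would send as a frame to the ROUTER socket.
wire_bytes = pickle.dumps(Payload())

# The receiving side: deserializing alone runs the embedded expression.
result = pickle.loads(wire_bytes)
print("expression ran inside the receiver:", result == os.getpid())
```

Nothing about the frame needs to look like a valid scheduler message; the code runs before any application-level check could inspect the result.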

The scheduler RPC is local-only by design: it transports CUDA IPC tensor handles produced by mp.reductions.reduce_tensor (valid only on the same host), and SchedulerClient always connects to localhost. Binding to the loopback interface therefore removes the network attack surface without changing the wire protocol, eliminating the pre-auth remote RCE.
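A rough stdlib illustration of what the bind change does, using plain IPv4 TCP sockets as a stand-in for ZMQ (assumption: tcp://* behaves like the IPv4 wildcard address 0.0.0.0, and tcp://127.0.0.1 like a loopback-only bind). The listen() helper is hypothetical.

```python
import socket


def listen(host: str) -> socket.socket:
    """Open an IPv4 TCP listener on an OS-assigned port (port 0)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen()
    return srv


# Roughly what tcp://*:{sched_port} does for IPv4: the wildcard address
# accepts connections arriving on every interface, including external NICs.
wildcard = listen("0.0.0.0")
# Roughly what the patched bind does: only the loopback interface listens.
loopback = listen("127.0.0.1")

wildcard_addr = wildcard.getsockname()
loopback_addr = loopback.getsockname()
print("wildcard listener:", wildcard_addr)
print("loopback listener:", loopback_addr)

# A client on the same host can still reach the loopback listener.
with socket.create_connection(("127.0.0.1", loopback_addr[1]), timeout=2):
    conn, peer = loopback.accept()
    peer_host = peer[0]
    conn.close()
print("accepted connection from", peer_host)

wildcard.close()
loopback.close()
```

Only the wildcard listener is reachable from another machine; the loopback listener is simply not listening on any external address. Processes on the same host can still connect, which matches the local-only design described above.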



@gemini-code-assist (bot) left a comment


Code Review

This pull request improves security by binding the scheduler RPC server to the loopback interface (127.0.0.1) instead of all interfaces, mitigating potential remote code execution risks. The reviewer correctly noted that this change might cause connection issues if the client attempts to connect via 'localhost' (which may resolve to IPv6 '::1') and suggested updating the client to use '127.0.0.1' as well. Additionally, the reviewer identified another instance of this file in a different directory that requires the same security fix.


# host, and SchedulerClient always connects to localhost, so this RPC is
# local-only by design. Bind to the loopback interface instead of all
# interfaces (tcp://*) so the pickle sink is never exposed to the network.
self.frontend.bind(f"tcp://127.0.0.1:{main_args.sched_port}")


Priority: high

Binding the server to 127.0.0.1 restricts it to the IPv4 loopback interface. However, SchedulerClient (on line 169) connects to tcp://localhost:{sched_port}. On systems where localhost resolves to the IPv6 loopback address (::1) first, the client will fail to connect to the server. To ensure reliable local connectivity, please also update SchedulerClient in this file to connect to 127.0.0.1 instead of localhost.

Additionally, note that there is another identical scheduler RPC file at archive/kt-sft/ktransformers/server/balance_serve/sched_rpc.py which still binds to * and should be updated in the same way to close the same vulnerability.
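The reviewer's suggestion amounts to making the bind address and the connect address agree. A minimal sketch, assuming a hypothetical shared SCHED_HOST constant and sched_endpoint() helper (neither exists in the PR), plus a quick check of which address families localhost resolves to on a given machine. The port 15000 is only an example value.

```python
import socket

# Hypothetical shared constant; the PR itself does not define one.
SCHED_HOST = "127.0.0.1"


def sched_endpoint(port: int) -> str:
    """Build one endpoint string for both the server bind and the client connect."""
    return f"tcp://{SCHED_HOST}:{port}"


def localhost_families(port: int) -> list[str]:
    """List the address families 'localhost' resolves to here, in resolver order."""
    infos = socket.getaddrinfo("localhost", port, type=socket.SOCK_STREAM)
    return [socket.AddressFamily(info[0]).name for info in infos]


print(sched_endpoint(15000))  # tcp://127.0.0.1:15000
# Order and contents depend on the host's resolver configuration (e.g. /etc/hosts).
print(localhost_families(15000))
```

If localhost_families() reports AF_INET6 ahead of AF_INET, that is the situation the review is concerned about; connecting to the literal 127.0.0.1 avoids depending on name resolution at all.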
