
fix: replace unsafe cloudpickle deserialization with json.loads in rollout worker#1612

Closed
sebastiondev wants to merge 1 commit into InternLM:main from sebastiondev:security/fix-unsafe-cloudpickle-deserialization

Conversation

@sebastiondev

Summary

This PR replaces two instances of ray.cloudpickle.loads() in xtuner/v1/ray/rollout/worker.py with safe json.loads() + np.array() deserialization, preventing a potential remote code execution vulnerability.

Vulnerability Details

CWE-502: Deserialization of Untrusted Data

In RolloutWorker._process_response(), when the inference server returns routed_experts as a base64-encoded string in the HTTP JSON response, the code was:

  1. Base64-decoding the string
  2. Passing the raw bytes to ray.cloudpickle.loads()

cloudpickle.loads() can execute arbitrary Python code during deserialization. Since the inference server communicates over HTTP (it runs as a separate process/Ray task listening on a network port), a compromised or malicious inference endpoint could craft a base64-encoded pickle payload in the routed_experts field that would execute arbitrary code on the training worker when deserialized.
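The mechanism is worth making concrete: the pickle protocol lets a payload nominate any importable callable to be invoked during loads(). A minimal sketch using the stdlib pickle module (ray.cloudpickle speaks the same protocol); the Payload class and the benign sorted call are illustrative stand-ins for an attacker's payload:

```python
import base64
import pickle  # stdlib pickle; ray.cloudpickle uses the same wire protocol


class Payload:
    def __reduce__(self):
        # On loads(), pickle calls this callable with these args.
        # sorted() is a benign stand-in; an attacker could name
        # os.system or any other importable callable instead.
        return (sorted, ([3, 1, 2],))


# What a malicious server could place in meta_info["routed_experts"]:
encoded = base64.b64encode(pickle.dumps(Payload())).decode()

# The vulnerable code path: base64-decode, then loads().
result = pickle.loads(base64.b64decode(encoded))
# result is the return value of the attacker-chosen call, not a Payload:
# the callable already ran during deserialization.
```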

Affected lines: L575 and L586 (original)

Data Flow

HTTP response from inference server
  → response["meta_info"]["routed_experts"] (base64 string)
  → base64.b64decode()
  → ray.cloudpickle.loads()  ← arbitrary code execution

Fix

Replace ray.cloudpickle.loads() with json.loads() + np.array(), which:

  • Is safe against code execution (JSON parsing cannot execute arbitrary code)
  • Matches the expected data format (nested lists of integers representing expert indices)
  • Is consistent with the non-string code path that already uses torch.tensor() / np.array()
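A minimal sketch of the safe path, assuming the server base64-encodes a JSON document of the nested expert-index lists (consistent with the data format described above):

```python
import base64
import json

import numpy as np

# What a well-behaved server would send: JSON, base64-encoded.
payload = base64.b64encode(json.dumps([[0, 3], [1, 2]]).encode()).decode()

# The patched worker logic: JSON parse, then convert to a numpy array.
routed_experts = np.array(json.loads(base64.b64decode(payload)))

# json.loads can only ever yield dict/list/str/int/float/bool/None,
# so no attacker-chosen callable can run during parsing.
```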

Path 1 (no history, L575):

# Before
data = base64.b64decode(routed_experts)
routed_experts = ray.cloudpickle.loads(data)

# After  
data = base64.b64decode(routed_experts)
routed_experts = json.loads(data)
routed_experts = np.array(routed_experts)
routed_experts = ray.put(routed_experts)

Path 2 (with history, L586):

# Before
data = base64.b64decode(routed_experts)
routed_experts = ray.cloudpickle.loads(data)
cur_routed_experts = await routed_experts
ray._private.internal_api.free(routed_experts)

# After
data = base64.b64decode(routed_experts)
routed_experts = json.loads(data)
cur_routed_experts = np.array(routed_experts)
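With the patch in place, a pickle payload arriving in routed_experts fails loudly instead of executing. A small sketch of that behavior (the dict payload is arbitrary):

```python
import base64
import json
import pickle

# A pickle blob masquerading as routed_experts.
malicious = base64.b64encode(pickle.dumps({"x": 1}))

try:
    json.loads(base64.b64decode(malicious))
    parsed = True
except (ValueError, UnicodeDecodeError):
    # Pickle bytes are not valid UTF-8 JSON, so parsing raises
    # rather than running any deserialization hooks.
    parsed = False
```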

Additional Finding (not included in this PR)

The .github/workflows/claude-general.yml workflow has several security issues. They require the workflow scope to fix, so they are not addressed in this PR and are noted here for the maintainers' awareness:

  1. Expression injection: ${{ github.event.comment.body }} is interpolated directly into the prompt YAML field — should use environment variables instead
  2. Unrestricted access: allowed_non_write_users: "*" allows any GitHub user to trigger the workflow — should be restricted to repo collaborators
  3. Token exposure: ${{ steps.pr.outputs.token }} is interpolated directly in the prompt text visible to Claude — should be passed via environment variable only
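For the expression-injection point, the usual mitigation is to route the untrusted value through an environment variable so it is never interpolated into the script or prompt text. A sketch (the step name and handle_prompt.sh are illustrative, not taken from the actual workflow):

```yaml
- name: run-claude
  env:
    # Untrusted event data enters only as an env var...
    COMMENT_BODY: ${{ github.event.comment.body }}
  run: |
    # ...and is referenced as a shell variable, not interpolated
    # into the YAML, so it cannot inject workflow expressions.
    ./handle_prompt.sh "$COMMENT_BODY"
```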

Testing

  • The fix uses json and numpy which are already imported in the module
  • The output format (numpy arrays stored via ray.put()) matches the existing non-string code path
  • No new dependencies introduced
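The round-trip described in the bullets above can be exercised with a small test; deserialize_routed_experts is a hypothetical helper mirroring the patched logic, not a function that exists in the module:

```python
import base64
import json

import numpy as np


def deserialize_routed_experts(b64_string: str) -> np.ndarray:
    """Hypothetical helper mirroring the patched worker logic."""
    return np.array(json.loads(base64.b64decode(b64_string)))


# Encode nested expert indices the way a cooperative server would.
experts = [[0, 5, 7], [2, 3, 9]]
wire = base64.b64encode(json.dumps(experts).encode()).decode()

out = deserialize_routed_experts(wire)
```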

…llout worker

Replace ray.cloudpickle.loads() deserialization of base64-encoded HTTP
response data with json.loads() + np.array() to prevent arbitrary code
execution from a compromised inference server.

The routed_experts field in inference server responses was being decoded
from base64 and then deserialized via ray.cloudpickle.loads(), which can
execute arbitrary Python code during deserialization (CWE-502). Since the
inference server communicates over HTTP, a compromised or malicious server
endpoint could craft a pickle payload to achieve remote code execution on
the training worker.

The fix decodes the base64 data as JSON (which is safe to parse) and
converts to numpy arrays, matching the behavior of the non-string code
path already in place.
@sebastiondev
Author

Closing — this PR inadvertently included unrelated file deletions due to a working tree issue during automated submission. The underlying security finding (CWE-502: unsafe cloudpickle deserialization in rollout worker) is valid and will be resubmitted as a clean minimal fix.
