
fix: replace unsafe cloudpickle deserialization with json.loads in rollout worker#1612

Closed
sebastiondev wants to merge 1 commit into InternLM:main from sebastiondev:security/fix-unsafe-cloudpickle-deserialization

Conversation

@sebastiondev

Summary

This PR replaces two instances of ray.cloudpickle.loads() in xtuner/v1/ray/rollout/worker.py with safe json.loads() + np.array() deserialization, preventing a potential remote code execution vulnerability.

Vulnerability Details

CWE-502: Deserialization of Untrusted Data

In RolloutWorker._process_response(), when the inference server returns routed_experts as a base64-encoded string in the HTTP JSON response, the code was:

  1. Base64-decoding the string
  2. Passing the raw bytes to ray.cloudpickle.loads()

cloudpickle.loads() can execute arbitrary Python code during deserialization. Since the inference server communicates over HTTP (it runs as a separate process/Ray task listening on a network port), a compromised or malicious inference endpoint could craft a base64-encoded pickle payload in the routed_experts field that would execute arbitrary code on the training worker when deserialized.
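The mechanism is worth making concrete: the pickle protocol lets a payload nominate any importable callable to be invoked during loads(). A minimal sketch using the stdlib pickle module (ray.cloudpickle speaks the same protocol); the Payload class and the benign sorted call are illustrative stand-ins for an attacker's payload:

```python
import base64
import pickle  # stdlib pickle; ray.cloudpickle uses the same wire protocol


class Payload:
    def __reduce__(self):
        # On loads(), pickle calls this callable with these args.
        # sorted() is a benign stand-in; an attacker could name
        # os.system or any other importable callable instead.
        return (sorted, ([3, 1, 2],))


# What a malicious server could place in meta_info["routed_experts"]:
encoded = base64.b64encode(pickle.dumps(Payload())).decode()

# The vulnerable code path: base64-decode, then loads().
result = pickle.loads(base64.b64decode(encoded))
# result is the return value of the attacker-chosen call, not a Payload:
# the callable already ran during deserialization.
```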

Affected lines: L575 and L586 (original)

Data Flow

HTTP response from inference server
  → response["meta_info"]["routed_experts"] (base64 string)
  → base64.b64decode()
  → ray.cloudpickle.loads()  ← arbitrary code execution

Fix

Replace ray.cloudpickle.loads() with json.loads() + np.array(), which:

  • Is safe against code execution (JSON parsing cannot execute arbitrary code)
  • Matches the expected data format (nested lists of integers representing expert indices)
  • Is consistent with the non-string code path that already uses torch.tensor() / np.array()
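A minimal sketch of the safe path, assuming the server base64-encodes a JSON document of the nested expert-index lists (consistent with the data format described above):

```python
import base64
import json

import numpy as np

# What a well-behaved server would send: JSON, base64-encoded.
payload = base64.b64encode(json.dumps([[0, 3], [1, 2]]).encode()).decode()

# The patched worker logic: JSON parse, then convert to a numpy array.
routed_experts = np.array(json.loads(base64.b64decode(payload)))

# json.loads can only ever yield dict/list/str/int/float/bool/None,
# so no attacker-chosen callable can run during parsing.
```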

Path 1 (no history, L575):

# Before
data = base64.b64decode(routed_experts)
routed_experts = ray.cloudpickle.loads(data)

# After  
data = base64.b64decode(routed_experts)
routed_experts = json.loads(data)
routed_experts = np.array(routed_experts)
routed_experts = ray.put(routed_experts)

Path 2 (with history, L586):

# Before
data = base64.b64decode(routed_experts)
routed_experts = ray.cloudpickle.loads(data)
cur_routed_experts = await routed_experts
ray._private.internal_api.free(routed_experts)

# After
data = base64.b64decode(routed_experts)
routed_experts = json.loads(data)
cur_routed_experts = np.array(routed_experts)
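With the patch in place, a pickle payload arriving in routed_experts fails loudly instead of executing. A small sketch of that behavior (the dict payload is arbitrary):

```python
import base64
import json
import pickle

# A pickle blob masquerading as routed_experts.
malicious = base64.b64encode(pickle.dumps({"x": 1}))

try:
    json.loads(base64.b64decode(malicious))
    parsed = True
except (ValueError, UnicodeDecodeError):
    # Pickle bytes are not valid UTF-8 JSON, so parsing raises
    # rather than running any deserialization hooks.
    parsed = False
```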

Additional Finding (not included in this PR)

The .github/workflows/claude-general.yml workflow has several security issues. They require the workflow scope to fix, so they are not addressed in this PR and are noted here for the maintainers' awareness:

  1. Expression injection: ${{ github.event.comment.body }} is interpolated directly into the prompt YAML field — should use environment variables instead
  2. Unrestricted access: allowed_non_write_users: "*" allows any GitHub user to trigger the workflow — should be restricted to repo collaborators
  3. Token exposure: ${{ steps.pr.outputs.token }} is interpolated directly in the prompt text visible to Claude — should be passed via environment variable only
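For the expression-injection point, the usual mitigation is to route the untrusted value through an environment variable so it is never interpolated into the script or prompt text. A sketch (the step name and handle_prompt.sh are illustrative, not taken from the actual workflow):

```yaml
- name: run-claude
  env:
    # Untrusted event data enters only as an env var...
    COMMENT_BODY: ${{ github.event.comment.body }}
  run: |
    # ...and is referenced as a shell variable, not interpolated
    # into the YAML, so it cannot inject workflow expressions.
    ./handle_prompt.sh "$COMMENT_BODY"
```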

Testing

  • The fix uses json and numpy which are already imported in the module
  • The output format (numpy arrays stored via ray.put()) matches the existing non-string code path
  • No new dependencies introduced
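The round-trip described in the bullets above can be exercised with a small test; deserialize_routed_experts is a hypothetical helper mirroring the patched logic, not a function that exists in the module:

```python
import base64
import json

import numpy as np


def deserialize_routed_experts(b64_string: str) -> np.ndarray:
    """Hypothetical helper mirroring the patched worker logic."""
    return np.array(json.loads(base64.b64decode(b64_string)))


# Encode nested expert indices the way a cooperative server would.
experts = [[0, 5, 7], [2, 3, 9]]
wire = base64.b64encode(json.dumps(experts).encode()).decode()

out = deserialize_routed_experts(wire)
```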

…llout worker

Replace ray.cloudpickle.loads() deserialization of base64-encoded HTTP
response data with json.loads() + np.array() to prevent arbitrary code
execution from a compromised inference server.

The routed_experts field in inference server responses was being decoded
from base64 and then deserialized via ray.cloudpickle.loads(), which can
execute arbitrary Python code during deserialization (CWE-502). Since the
inference server communicates over HTTP, a compromised or malicious server
endpoint could craft a pickle payload to achieve remote code execution on
the training worker.

The fix decodes the base64 data as JSON (which is safe to parse) and
converts to numpy arrays, matching the behavior of the non-string code
path already in place.
@sebastiondev
Author

Closing — this PR inadvertently included unrelated file deletions due to a working tree issue during automated submission. The underlying security finding (CWE-502: unsafe cloudpickle deserialization in rollout worker) is valid and will be resubmitted as a clean minimal fix.
