Commit 75e269a

chore: bump GLM-5.1 max_tokens to 49152
Raise max_tokens from 32768 to 49152 for zai-org/GLM-5.1 and zai-org/GLM-5.1-FP8 in prod and dev. Providers (Together, Fireworks AI, zai-org) advertise 202,752-token context on the HF router, so the larger output budget sits well within the context window. https://claude.ai/code/session_01AjLGLnaXowm91ymkX42wmN
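The arithmetic behind the commit message can be sketched as a quick sanity check. This is illustrative only; the 202,752 context length is the provider-advertised figure quoted above, and the check simply confirms the raised output budget leaves ample room for prompts.

```python
# Illustrative sanity check for the max_tokens bump.
CONTEXT_LENGTH = 202_752   # context advertised by providers on the HF router
OLD_MAX_TOKENS = 32_768
NEW_MAX_TOKENS = 49_152

# Reserving the full output budget, how many tokens remain for the prompt?
prompt_budget = CONTEXT_LENGTH - NEW_MAX_TOKENS
assert NEW_MAX_TOKENS < CONTEXT_LENGTH
print(prompt_budget)  # 153600 tokens left for input
```

Even with the full 49,152-token output reserved, roughly 153K tokens remain for input, so the larger budget is safely inside the window.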
1 parent 4e7ce75 commit 75e269a

File tree

2 files changed: 4 additions & 4 deletions


chart/env/dev.yaml

Lines changed: 2 additions & 2 deletions

@@ -79,8 +79,8 @@ envVars:
   PUBLIC_LLM_ROUTER_ALIAS_ID: "omni"
   MODELS: >
     [
-      { "id": "zai-org/GLM-5.1", "description": "Upgraded 754B MoE for agentic coding, extended reasoning, and tool use.", "parameters": { "max_tokens": 32768 } },
-      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 for efficient agentic coding and reasoning inference.", "parameters": { "max_tokens": 32768 } },
+      { "id": "zai-org/GLM-5.1", "description": "Upgraded 754B MoE for agentic coding, extended reasoning, and tool use.", "parameters": { "max_tokens": 49152 } },
+      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 for efficient agentic coding and reasoning inference.", "parameters": { "max_tokens": 49152 } },
       { "id": "google/gemma-4-31B-it", "description": "Dense multimodal Gemma with 256K context, reasoning, and function calling." },
       { "id": "google/gemma-4-26B-A4B-it", "description": "Efficient multimodal MoE Gemma with 4B active params and 256K context." },
       { "id": "Qwen/Qwen3.5-9B", "description": "Dense multimodal hybrid with 262K context excelling at reasoning on-device." },

chart/env/prod.yaml

Lines changed: 2 additions & 2 deletions

@@ -89,8 +89,8 @@ envVars:
   PUBLIC_LLM_ROUTER_ALIAS_ID: "omni"
   MODELS: >
     [
-      { "id": "zai-org/GLM-5.1", "description": "Upgraded 754B MoE for agentic coding, extended reasoning, and tool use.", "parameters": { "max_tokens": 32768 } },
-      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 for efficient agentic coding and reasoning inference.", "parameters": { "max_tokens": 32768 } },
+      { "id": "zai-org/GLM-5.1", "description": "Upgraded 754B MoE for agentic coding, extended reasoning, and tool use.", "parameters": { "max_tokens": 49152 } },
+      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 for efficient agentic coding and reasoning inference.", "parameters": { "max_tokens": 49152 } },
       { "id": "google/gemma-4-31B-it", "description": "Dense multimodal Gemma with 256K context, reasoning, and function calling." },
       { "id": "google/gemma-4-26B-A4B-it", "description": "Efficient multimodal MoE Gemma with 4B active params and 256K context." },
       { "id": "Qwen/Qwen3.5-9B", "description": "Dense multimodal hybrid with 262K context excelling at reasoning on-device." },

0 commit comments
