Commit 706ef4d

fix: correct GLM-5.1 specs in prod/dev yaml
Research via the HF router (Together, Fireworks AI, and zai-org providers) and the model card confirms GLM-5.1 is 744B total / 40B active params (not 754B), with a 202K context window. Update both the GLM-5.1 and GLM-5.1-FP8 descriptions to reflect the accurate architecture. https://claude.ai/code/session_01AjLGLnaXowm91ymkX42wmN
1 parent 4e7ce75 commit 706ef4d

File tree

2 files changed: +4 −4 lines

chart/env/dev.yaml

Lines changed: 2 additions & 2 deletions
@@ -79,8 +79,8 @@ envVars:
   PUBLIC_LLM_ROUTER_ALIAS_ID: "omni"
   MODELS: >
     [
-      { "id": "zai-org/GLM-5.1", "description": "Upgraded 754B MoE for agentic coding, extended reasoning, and tool use.", "parameters": { "max_tokens": 32768 } },
-      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 for efficient agentic coding and reasoning inference.", "parameters": { "max_tokens": 32768 } },
+      { "id": "zai-org/GLM-5.1", "description": "Upgraded 744B MoE (40B active) with 202K context for agentic coding and reasoning.", "parameters": { "max_tokens": 32768 } },
+      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 744B MoE for fastest-throughput agentic coding and reasoning.", "parameters": { "max_tokens": 32768 } },
       { "id": "google/gemma-4-31B-it", "description": "Dense multimodal Gemma with 256K context, reasoning, and function calling." },
       { "id": "google/gemma-4-26B-A4B-it", "description": "Efficient multimodal MoE Gemma with 4B active params and 256K context." },
       { "id": "Qwen/Qwen3.5-9B", "description": "Dense multimodal hybrid with 262K context excelling at reasoning on-device." },

chart/env/prod.yaml

Lines changed: 2 additions & 2 deletions
@@ -89,8 +89,8 @@ envVars:
   PUBLIC_LLM_ROUTER_ALIAS_ID: "omni"
   MODELS: >
     [
-      { "id": "zai-org/GLM-5.1", "description": "Upgraded 754B MoE for agentic coding, extended reasoning, and tool use.", "parameters": { "max_tokens": 32768 } },
-      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 for efficient agentic coding and reasoning inference.", "parameters": { "max_tokens": 32768 } },
+      { "id": "zai-org/GLM-5.1", "description": "Upgraded 744B MoE (40B active) with 202K context for agentic coding and reasoning.", "parameters": { "max_tokens": 32768 } },
+      { "id": "zai-org/GLM-5.1-FP8", "description": "FP8 GLM-5.1 744B MoE for fastest-throughput agentic coding and reasoning.", "parameters": { "max_tokens": 32768 } },
       { "id": "google/gemma-4-31B-it", "description": "Dense multimodal Gemma with 256K context, reasoning, and function calling." },
       { "id": "google/gemma-4-26B-A4B-it", "description": "Efficient multimodal MoE Gemma with 4B active params and 256K context." },
       { "id": "Qwen/Qwen3.5-9B", "description": "Dense multimodal hybrid with 262K context excelling at reasoning on-device." },
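Both files feed the model list to the app through a single `MODELS` env var: a YAML folded block scalar (`>`) whose content is a JSON array. A minimal sketch of how a consumer might parse and sanity-check that value (the function and variable names here are illustrative assumptions, not the charts' actual consumption code):

```python
import json


def load_models(raw: str) -> list[dict]:
    """Parse a MODELS env var (a JSON array of model entries) and validate each entry."""
    models = json.loads(raw)
    for m in models:
        # Every entry in the diff carries an "id" and a "description".
        assert "id" in m and "description" in m, f"malformed entry: {m}"
        # "parameters" is optional; the GLM-5.1 entries use it to cap max_tokens.
        max_tokens = m.get("parameters", {}).get("max_tokens")
        if max_tokens is not None:
            assert isinstance(max_tokens, int) and max_tokens > 0
    return models


# Example using the corrected GLM-5.1 entry from this commit:
raw = (
    '[{ "id": "zai-org/GLM-5.1", '
    '"description": "Upgraded 744B MoE (40B active) with 202K context '
    'for agentic coding and reasoning.", '
    '"parameters": { "max_tokens": 32768 } }]'
)
models = load_models(raw)
print(models[0]["id"])  # zai-org/GLM-5.1
```

Note that the descriptions are display strings only; the parameter counts in them do not affect routing, which keys off `id`.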
