
Commit 7e7dd6d

Feng-xiaosuozjks98 authored and committed
adapt to minimax_m2 (vllm-project#5624)
### What this PR does / why we need it?

This PR fixes Minimax model loading in the vLLM Ascend backend by:

- Adding a model type check for "minimax" and "minimax_m2" that replaces the "mlp" prefix with "block_sparse_moe"
- Implementing special handling for Minimax expert-layer naming conventions
- Adding a Minimax configuration to `packed_modules_model_mapping` for proper `qkv_proj` and `experts` module handling

Without these changes, Minimax models fail to load on Ascend devices due to incompatible layer naming and module packing.

### Does this PR introduce _any_ user-facing change?

Yes. Users can now load and run Minimax models on Ascend hardware with vLLM, enabling inference for this model family on Ascend devices.

### How was this patch tested?

Local testing:

- Verified model loading for minimax-xxx and minimax_m2-xxx model variants on Atlas 800I A2 hardware
- Tested inference with sample prompts using vLLM's OpenAI-compatible API server

Benchmark validation:

- Compared throughput and latency metrics against a GPU baseline
- Verified memory usage stays within expected limits for different batch sizes
- Tested multi-card inference scenarios with tensor parallelism

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@8be6432

---------

Signed-off-by: Feng-xiaosuo <[email protected]>
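As a rough sketch of the prefix handling described above (a standalone illustration, not the exact vLLM Ascend code; the sample layer names below are hypothetical), the rewrite maps a Minimax MoE layer prefix onto the naming the quantization config expects:

```python
# Standalone sketch of the Minimax prefix normalization described in this PR.
# The sample prefixes below are hypothetical; real names come from the model.

def normalize_minimax_prefix(prefix: str) -> str:
    """Rename "mlp" to "block_sparse_moe" and drop per-expert indices."""
    prefix = prefix.replace("mlp", "block_sparse_moe")
    parts = prefix.split(".")
    if "experts" in parts and len(parts) > 2:
        exp_idx = parts.index("experts")
        # Cut off a numeric expert index, e.g. "experts.3.w1" -> "experts"
        if exp_idx + 1 < len(parts) and parts[exp_idx + 1].isdigit():
            prefix = ".".join(parts[:exp_idx + 1])
    return prefix

print(normalize_minimax_prefix("model.layers.0.mlp.experts.3.w1"))
# model.layers.0.block_sparse_moe.experts
print(normalize_minimax_prefix("model.layers.0.self_attn.qkv_proj"))
# model.layers.0.self_attn.qkv_proj  (unchanged)
```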
1 parent 47c2823 commit 7e7dd6d

File tree

1 file changed: +20 −0 lines changed


vllm_ascend/quantization/quant_config.py

Lines changed: 20 additions & 0 deletions
@@ -117,6 +117,18 @@ def get_quant_method(self, layer: torch.nn.Module,
                          prefix: str) -> Optional["QuantizeMethodBase"]:
         vllm_config = get_current_vllm_config()
         model_type = vllm_config.model_config.hf_text_config.model_type
+
+        if model_type in ["minimax", "minimax_m2"]:
+            prefix = prefix.replace("mlp", "block_sparse_moe")
+
+            # To adapt to minimax, modify the prefix of the model layer name
+            parts = prefix.split('.')
+            if "experts" in parts and len(parts) > 2:
+                exp_idx = parts.index("experts")
+                if exp_idx + 1 < len(parts) and parts[exp_idx + 1].isdigit():
+                    parts = parts[:exp_idx + 1]
+                    prefix = ".".join(parts)
+
         if model_type in packed_modules_model_mapping:
             self.packed_modules_mapping = packed_modules_model_mapping[
                 model_type]
@@ -312,6 +324,14 @@ def get_scaled_act_names(self) -> List[str]:
         ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
         "fused_qkv_a_proj": ["q_a_proj", "kv_a_proj_with_mqa"]
     },
+    "minimax_m2": {
+        "qkv_proj": [
+            "q_proj",
+            "k_proj",
+            "v_proj",
+        ],
+        "experts": ["experts.0.w1", "experts.0.w2", "experts.0.w3"]
+    }
 }