[RFC] Support FP4 (E2M1) activation dispatch in MegaMoE #338

@wuudiz

Description

Currently fp8_fp4_mega_moe accepts FP8 activations only. The use_fp8_dispatch parameter is plumbed through get_symm_buffer_for_mega_moe / SymmBuffer but has no effect: buffer sizing and the kernel always assume FP8.
Switching to FP4 (E2M1) activations would halve the input footprint of the symmetric buffer and could enable the MXF4 mainloop (K=64 dense vs. K=32 padded in mxf8f6f4).
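To make the footprint claim concrete, here is a minimal pure-Python sketch of E2M1 encoding and nibble packing. It is not the mega_moe_pre_dispatch kernel from the proof of concept. The helper names, the low-nibble-first packing order, and the round-to-nearest-even tie rule are assumptions for illustration, and per-block scale factors are omitted entirely.

```python
import math

# Magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bit),
# indexed by the 3-bit exponent/mantissa code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def encode_e2m1(x):
    """Return the 4-bit E2M1 code nearest to x, saturating at +/-6."""
    sign = 0x8 if math.copysign(1.0, x) < 0 else 0x0
    mag = min(abs(x), 6.0)
    # Nearest magnitude; ties go to the code with a zero mantissa bit
    # (round-to-nearest-even, assumed here).
    code = min(range(8), key=lambda c: (abs(E2M1_MAGNITUDES[c] - mag), c & 1))
    return sign | code


def decode_e2m1(nibble):
    """Return the float value of a 4-bit E2M1 code."""
    mag = E2M1_MAGNITUDES[nibble & 0x7]
    return -mag if nibble & 0x8 else mag


def pack_e2m1(values):
    """Pack an even-length sequence of floats, two E2M1 codes per byte.

    Assumed layout: the first element of each pair goes in the low nibble.
    """
    if len(values) % 2:
        raise ValueError("need an even number of elements")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append(encode_e2m1(lo) | (encode_e2m1(hi) << 4))
    return bytes(out)


def unpack_e2m1(packed):
    """Inverse of pack_e2m1 for values that are exactly representable."""
    out = []
    for byte in packed:
        out.append(decode_e2m1(byte & 0xF))
        out.append(decode_e2m1(byte >> 4))
    return out


if __name__ == "__main__":
    hidden = 7168  # hypothetical hidden size, not taken from the issue
    fp8_bytes = hidden  # one byte per FP8 element
    fp4_bytes = len(pack_e2m1([1.0] * hidden))
    print(fp8_bytes, fp4_bytes)
```

Two 4-bit codes share one byte, so the packed activation payload is half the FP8 size for any even element count. A real layout would also carry scale factors, which this sketch does not model.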
I ran an end-to-end proof of concept (w4a4 vs. w4a8) using an external mega_moe_pre_dispatch kernel for E2M1 packing plus FP4-aware buffer allocation, with the rest of the pipeline unchanged. GSM8K (1319 samples, greedy) results:
• DeepSeek-V4-Flash: w4a8 93.86% → w4a4 94.39% (delta +0.53 pp)
• DeepSeek-V4-Pro: w4a8 91.81% → w4a4 91.13% (delta −0.68 pp)
Both are within noise: no meaningful accuracy degradation from FP4 activation packing.
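One rough way to sanity-check the "within noise" reading is a binomial standard error on 1319 samples. The sketch below treats the two runs as independent, which is a simplifying assumption; a paired test such as McNemar's would be tighter but needs per-sample results that are not given here. The function names are hypothetical.

```python
import math

N_SAMPLES = 1319  # GSM8K sample count reported above


def stderr_pp(acc_pct, n=N_SAMPLES):
    """Binomial standard error of an accuracy, in percentage points."""
    p = acc_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def delta_in_stderrs(baseline_pct, candidate_pct, n=N_SAMPLES):
    """Accuracy delta divided by the standard error of the difference.

    Assumes the two runs are independent (a simplification).
    """
    se_diff = math.hypot(stderr_pp(baseline_pct, n), stderr_pp(candidate_pct, n))
    return (candidate_pct - baseline_pct) / se_diff


if __name__ == "__main__":
    for name, w4a8, w4a4 in [
        ("DeepSeek-V4-Flash", 93.86, 94.39),
        ("DeepSeek-V4-Pro", 91.81, 91.13),
    ]:
        print(name, round(delta_in_stderrs(w4a8, w4a4), 2))
```

Under these assumptions both deltas come out at less than one standard error of the difference, which is consistent with reading them as noise.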
Would it be feasible to wire use_fp8_dispatch=False through to the buffer layout and kernel?
Happy to help test or contribute if there's a preferred direction.
