
Plans for block-wise FP8 quantization during training? #1411

Open
beccohov opened this issue Jan 15, 2025 · 3 comments

Comments

@beccohov

Hi TE team,

I'm interested in whether there are plans to implement block-wise quantization for FP8 training, similar to what's described in the DeepSeek-V3 technical report.

Block-wise quantization could provide better numerical stability and accuracy than tensor-wide quantization, especially in the presence of outlier values. This would be particularly valuable for large language models, where maintaining precision is crucial.
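To make the request concrete, here is a minimal sketch of what I mean by block-wise FP8 quantization. This is not TE's API; the helper name, the 128 × 128 block size, and the use of `torch.float8_e4m3fn` are my own assumptions for illustration. Each block of a weight tensor gets its own scale, so an outlier only reduces the dynamic range of its own block rather than the whole tensor:

```python
# Minimal sketch of block-wise FP8 fake-quantization (not TE's API; the helper
# name and 128x128 block size are assumptions for illustration only).
import torch

FP8_E4M3_MAX = 448.0  # largest magnitude representable in float8_e4m3fn

def blockwise_fp8_quant_dequant(w: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Quantize-dequantize `w` with one scale per (block x block) tile."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "pad to a multiple of `block` first"
    # View as (row_blocks, block, col_blocks, block) so each tile is indexed separately.
    wb = w.reshape(rows // block, block, cols // block, block)
    # One amax, and hence one scale, per block.
    amax = wb.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax
    q = (wb * scale).to(torch.float8_e4m3fn)            # per-block quantization
    return (q.to(w.dtype) / scale).reshape(rows, cols)  # dequantize to inspect the error

w = torch.randn(256, 256)
w[0, 0] = 100.0  # outlier: with per-tensor scaling it would shrink the range for all values
print((w - blockwise_fp8_quant_dequant(w)).abs().max())
```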

Some specific questions:

  1. Is this feature currently on your roadmap?
  2. If yes, what's the approximate timeline?
  3. If no, are there technical challenges preventing this implementation?

Thank you for your time!

@zigzagcai

zigzagcai commented Jan 16, 2025

I have the same interest in block-wise FP8.

@liangzelang

Me too.

@Monekyzoon

Monekyzoon commented Jan 17, 2025

In addition, activations use tile-wise (1 × 128) quantization in DeepSeek-V3.
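To illustrate the distinction (again just a sketch with assumed names and shapes, not TE code): for activations the granularity would be one scale per 1 × 128 tile along the hidden dimension, rather than one scale per 128 × 128 block as for weights.

```python
# Sketch of 1x128 tile-wise scaling for activations (assumed shapes and names).
import torch

FP8_E4M3_MAX = 448.0

def tilewise_act_scales(x: torch.Tensor, tile: int = 128) -> torch.Tensor:
    """One FP8 scale per contiguous 1 x `tile` slice of the last dimension."""
    *lead, hidden = x.shape
    assert hidden % tile == 0, "pad the hidden dimension to a multiple of `tile`"
    xt = x.reshape(*lead, hidden // tile, tile)
    amax = xt.abs().amax(dim=-1).clamp(min=1e-12)
    return FP8_E4M3_MAX / amax  # shape: (*lead, hidden // tile)

x = torch.randn(4, 1024)             # (tokens, hidden)
print(tilewise_act_scales(x).shape)  # torch.Size([4, 8]): 8 scales per token
```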
