
[FC] Add dequantize action to FC layer #2936

Draft · wants to merge 1 commit into base: main
Conversation

@EunjuYang (Contributor) commented Feb 11, 2025

  • This commit is a temporary one to realize an FC layer with QINT weights.
  • This should be fixed by moving the dequantize operations under the tensor op.
  • Dequantization overhead (weight dtype - tensor dtype):

|                  | FP32-FP32 | QINT8-FP32 |
| ---------------- | --------- | ---------- |
| Without SWAP     | 0.162787  | 0.166505   |
| With SWAP (LA=2) | 0.0487955 | 0.127012   |


```cpp
///@todo this quantizer should be moved to tensor, not layer!
switch (context.getWeightDataType()) {
case ml::train::TensorDim::DataType::QINT4:
```
Contributor:

How about adding a 'not supported yet' message until this is implemented?

@EunjuYang (Contributor, Author) replied:

It is intended to create a quantizer for all QINT4, QINT8, and QINT16 cases. AFAIK, QINT4, 8, and 16 all work with a quantizer. Please let me know if I missed something!
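
For illustration, a minimal self-contained sketch of that idea, assuming hypothetical stand-in names (`WeightType`, `PerTensorAffineQuantizer`, `makeWeightQuantizer` are not the actual nntrainer API): all three QINT widths can fall through to the same quantizer, so the switch only has to distinguish quantized from floating-point weight types.

```cpp
#include <memory>

// Hypothetical stand-ins for the real nntrainer types.
enum class WeightType { QINT4, QINT8, QINT16, FP16, FP32 };

struct Quantizer {
  virtual ~Quantizer() = default;
  // A real implementation would expose dequantize(weight, to_type) here.
};

struct PerTensorAffineQuantizer : Quantizer {};

std::unique_ptr<Quantizer> makeWeightQuantizer(WeightType t) {
  switch (t) {
  case WeightType::QINT4:  // fallthrough: all QINT widths share one
  case WeightType::QINT8:  // quantizer; only the bit unpacking differs
  case WeightType::QINT16:
    return std::make_unique<PerTensorAffineQuantizer>();
  default:
    return nullptr; // FP weights are used directly, no dequantize step
  }
}
```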

@baek2sm (Contributor) left a comment:

LGTM

@EunjuYang (Contributor, Author) commented Feb 12, 2025

FYI:

I tested the QINT8-FP32 case (weight dtype - tensor dtype).
Please refer to the results below:

```sh
$ ./nntrainer_simplefc 100 false 0 FP32 FP32
$ ./nntrainer_simplefc 100 false 0 QINT8 FP32
$ ./nntrainer_simplefc 100 true 2 FP32 FP32
$ ./nntrainer_simplefc 100 true 2 QINT8 FP32
```

|                  | FP32-FP32 | QINT8-FP32 |
| ---------------- | --------- | ---------- |
| Without SWAP     | 0.162787  | 0.166505   |
| With SWAP (LA=2) | 0.0487955 | 0.127012   |

The bin file sizes are 400 MB (FP32) vs. 100 MB (QINT8).
Numerical correctness of the result was not verified (no suitable initialization method for the scale factors is implemented yet).

- This commit is a temporary one to realize an FC layer with QINT weights.
- This should be fixed by moving the dequantize operations under the
  tensor op.

Signed-off-by: Eunju Yang <[email protected]>
Comment on lines +206 to +208
```cpp
///@todo This dequantization action should be moved to tensor.dot()
if (quantizer != nullptr) {
  Tensor weight_ = quantizer->dequantize(weight, input_.getDataType());
```
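
For context, a hedged reconstruction of the forwarding path this excerpt presumably sits in; the `dot` call and the `else` branch below are assumptions based on typical FC forwarding, not verbatim code from this PR.

```cpp
// Assumed surrounding logic, not verbatim from the PR:
// dequantize once per forward pass, then run the usual FP GEMM.
if (quantizer != nullptr) {
  Tensor weight_ = quantizer->dequantize(weight, input_.getDataType());
  input_.dot(weight_, hidden_); // hidden_ = input_ * dequantized weight
} else {
  input_.dot(weight, hidden_);  // FP weights: no dequantize step
}
```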
@skykongkong8 (Member) commented Feb 17, 2025:

Just curious: I wonder how this would be expected to work in Tensor::dot().
Maybe something like:

```cpp
Tensor::dot() {
  ...
  const int8_t *data = getData<int8_t>();
  const int8_t *input_data = input.getData<int8_t>();
  Tensor qinput = quantizer->dequantize(...);
  GEMM(data, qinput.getData(), output);
  ...
}
```

?
