Commit e5cf6e5

Add the min, max to embedding 4bit
Please refer to #1506 for more context.

Case: Exporting the 4-bit embedding model failed with the following command:

    python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me4.pte

Solution: Pass the min and max quantization bounds to the 4-bit embedding op.

Test: Both the 8-bit and 4-bit embedding models now export successfully with the commands below.

Change torchchat/quant_config/mobile.json to:

    {
        "embedding": {"bitwidth": 8, "groupsize": 32},
        "linear:a8w4dq": {"groupsize": 256}
    }

    python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me8.pte

Then change it to:

    {
        "embedding": {"bitwidth": 4, "groupsize": 32},
        "linear:a8w4dq": {"groupsize": 256}
    }

    python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me4.pte

Signed-off-by: jijie <[email protected]>
Co-authored-by: jijie <[email protected]>
1 parent 4251a54 commit e5cf6e5
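
For readers reproducing the test, the 4-bit configuration quoted in the commit message can be written to torchchat/quant_config/mobile.json with a short snippet before running the export command. This is an illustrative sketch only; the configuration contents and file path come from the commit message above, the rest is plain standard-library boilerplate.

    # Illustrative sketch (not part of the commit): write the 4-bit embedding
    # test configuration from the commit message to mobile.json, then run the
    # export command shown above.
    import json

    config_4bit = {
        "embedding": {"bitwidth": 4, "groupsize": 32},
        "linear:a8w4dq": {"groupsize": 256},
    }

    with open("torchchat/quant_config/mobile.json", "w") as f:
        json.dump(config_4bit, f, indent=2)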

1 file changed (+1, -1 lines)

Diff for: torchchat/utils/quantize.py

@@ -747,7 +747,7 @@ def et_forward(self, indices: torch.Tensor) -> torch.Tensor:
             )
         else:
             return torch.ops.quantized_decomposed.embedding_4bit.dtype(
-                self.weight, self.scales, None, 0, 0, indices, dtype=self.dtype
+                self.weight, self.scales, None, -8, 7, indices, dtype=self.dtype
             )
 
     @torch.no_grad()
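
The fix replaces the placeholder (0, 0) bounds with the full signed 4-bit range [-8, 7]. As a minimal sketch (illustrative only, not code from the repository), those bounds follow directly from the width of a signed 4-bit integer:

    # Illustrative sketch: the quant_min/quant_max bounds for a signed 4-bit
    # embedding are the limits of a 4-bit two's-complement integer.
    bitwidth = 4
    quant_min = -(2 ** (bitwidth - 1))   # -8
    quant_max = 2 ** (bitwidth - 1) - 1  #  7
    assert (quant_min, quant_max) == (-8, 7)

    # The corrected call in et_forward (quoted from the diff above) passes
    # these bounds explicitly:
    # torch.ops.quantized_decomposed.embedding_4bit.dtype(
    #     self.weight, self.scales, None, quant_min, quant_max, indices, dtype=self.dtype
    # )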
