Commit e5cf6e5

Add the min, max to embedding 4bit
Please refer to #1506 for more context.

Case: Exporting the 4-bit embedding model failed with the following command:

    python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me4.pte

Solution: Pass the min and max quantization bounds to the 4-bit embedding op.

Test: Both the 8-bit and 4-bit embedding models now export successfully with the commands below.

Change torchchat/quant_config/mobile.json to:

    {
        "embedding": {"bitwidth": 8, "groupsize": 32},
        "linear:a8w4dq": {"groupsize": 256}
    }

    python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me8.pte

Then change it to:

    {
        "embedding": {"bitwidth": 4, "groupsize": 32},
        "linear:a8w4dq": {"groupsize": 256}
    }

    python torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path stories110me4.pte

Signed-off-by: jijie <[email protected]>
Co-authored-by: jijie <[email protected]>
1 parent 4251a54 commit e5cf6e5
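
For readers reproducing the test, the 4-bit configuration quoted in the commit message can be written to torchchat/quant_config/mobile.json with a short snippet before running the export command. This is an illustrative sketch only; the configuration contents and file path come from the commit message above, the rest is plain standard-library boilerplate.

    # Illustrative sketch (not part of the commit): write the 4-bit embedding
    # test configuration from the commit message to mobile.json, then run the
    # export command shown above.
    import json

    config_4bit = {
        "embedding": {"bitwidth": 4, "groupsize": 32},
        "linear:a8w4dq": {"groupsize": 256},
    }

    with open("torchchat/quant_config/mobile.json", "w") as f:
        json.dump(config_4bit, f, indent=2)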

1 file changed (+1, -1 lines)

Diff for: torchchat/utils/quantize.py

@@ -747,7 +747,7 @@ def et_forward(self, indices: torch.Tensor) -> torch.Tensor:
             )
         else:
             return torch.ops.quantized_decomposed.embedding_4bit.dtype(
-                self.weight, self.scales, None, 0, 0, indices, dtype=self.dtype
+                self.weight, self.scales, None, -8, 7, indices, dtype=self.dtype
             )
 
     @torch.no_grad()
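
The fix replaces the placeholder (0, 0) bounds with the full signed 4-bit range [-8, 7]. As a minimal sketch (illustrative only, not code from the repository), those bounds follow directly from the width of a signed 4-bit integer:

    # Illustrative sketch: the quant_min/quant_max bounds for a signed 4-bit
    # embedding are the limits of a 4-bit two's-complement integer.
    bitwidth = 4
    quant_min = -(2 ** (bitwidth - 1))   # -8
    quant_max = 2 ** (bitwidth - 1) - 1  #  7
    assert (quant_min, quant_max) == (-8, 7)

    # The corrected call in et_forward (quoted from the diff above) passes
    # these bounds explicitly:
    # torch.ops.quantized_decomposed.embedding_4bit.dtype(
    #     self.weight, self.scales, None, quant_min, quant_max, indices, dtype=self.dtype
    # )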
