[BUG] Mistral-Small-24B-Instruct-2501 - Tensor Parallel outputs garbled text. #728
Labels: bug
OS: Linux
GPU Library: CUDA 12.x
Python version: 3.10
Pytorch version: 2.5.1
Model: No response
Describe the bug
Running inference with tensor parallel produces garbled output.
Tried with two different quantized models:
matatonic/Mistral-Small-24B-Instruct-2501-6.5bpw-h8-exl2
MikeRoz/mistralai_Mistral-Small-24B-Instruct-2501-8.0bpw-h8-exl2
Reproduction steps
Run the bundled example scripts against either model above:
examples/inference.py: output is correct.
examples/inference_tp.py: output is garbled.
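For context, the tensor-parallel path differs from the regular one mainly in how the model and cache are loaded. Below is a minimal sketch of that path, modeled on ExLlamaV2's `examples/inference_tp.py`; the model directory is a placeholder, and the exact API surface (`load_tp`, `ExLlamaV2Cache_TP`, `ExLlamaV2DynamicGenerator`) is assumed from the version of the library I tested and may differ in other releases. It requires multiple CUDA GPUs and a local EXL2 quant, so it is a reproduction sketch rather than a verified snippet.

```python
# Sketch of tensor-parallel inference with ExLlamaV2, following
# examples/inference_tp.py. Requires >= 2 CUDA GPUs and a local
# EXL2-quantized model directory (placeholder path below).

from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_TP,   # TP-aware cache, assumed name
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/path/to/Mistral-Small-24B-Instruct-2501-exl2"  # placeholder

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)

# Tensor-parallel load: splits weights across all visible GPUs
# (instead of model.load() / load_autosplit() used by inference.py).
model.load_tp(progress=True)

cache = ExLlamaV2Cache_TP(model)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(
    model=model, cache=cache, tokenizer=tokenizer
)

# With this model, the output here comes back as garbled text,
# while the same prompt through the non-TP path is fine.
output = generator.generate(
    prompt="Once upon a time,",
    max_new_tokens=100,
)
print(output)
```

Swapping `load_tp()`/`ExLlamaV2Cache_TP` for `load_autosplit()`/`ExLlamaV2Cache` in the same script is the only change between the working and the garbled case, which is what points at the tensor-parallel path specifically.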
Expected behavior
Tensor parallel should work for this model as it does for other models. Although the model itself is fairly small, the 8bpw h8 quant with full context does not fit in 24GB of VRAM, so tensor parallel across multiple GPUs is needed.
Logs
Installed packages
pip list
Additional context
Cheers! :)
Acknowledgements