This is kind of strange: I can load and run 4-bit or original models on either CPU or GPU. Things work fine on the GPU (a 4090), and I have CUDA installed. However, when I run on the CPU with this command line,
python server.py --model llama-65b-4bit-128g --wbits 4 --groupsize 128 --cpu
the model loads and I can attempt to run it. However, I get no output (it just spits back the prompt) and instead see this:
RuntimeError: t == DeviceType::CUDA INTERNAL ASSERT FAILED at "C:\Users\XXX\miniconda3\envs\textgen\lib\site-packages\torch\include\c10/cuda/impl/CUDAGuardImpl.h":25, please report a bug to PyTorch.
I get output as expected when doing the same thing without `--cpu` (for models that fit into VRAM).
I have 100 GB of RAM and am not running out of memory, and the process does not die (it just doesn't generate any text).
Anyone have a clue? Thanks!
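For what it's worth, that `CUDAGuardImpl` assert typically fires when some tensor or module is still on a CUDA device while the rest of the run is on CPU. A quick hedged sketch (not part of `server.py`; the helper name is my own) to scan a loaded model for stray CUDA tensors after a `--cpu` load:

```python
import torch

def find_cuda_tensors(model: torch.nn.Module) -> list[str]:
    """Return names of parameters/buffers still on a CUDA device.

    A single stray CUDA tensor mixed into a CPU run can trip the
    CUDAGuardImpl internal assert during generation, so an empty
    list is what you want to see when running with --cpu.
    """
    stray = []
    for name, param in model.named_parameters():
        if param.device.type == "cuda":
            stray.append(name)
    for name, buf in model.named_buffers():
        if buf.device.type == "cuda":
            stray.append(name)
    return stray

# Example: a freshly constructed module lives on CPU, so nothing is flagged.
model = torch.nn.Linear(4, 4)
print(find_cuda_tensors(model))  # []
```

If this turned up any names, something in the 4-bit loading path (e.g. the quantized matmul kernels) may be placing tensors on the GPU regardless of the `--cpu` flag, which would match the symptom of GPU runs working while CPU runs assert.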