Description
Hello, I am trying to use whisper-cli with a GeForce RTX 2070 Super.
The version without GPU support works perfectly, but I have since tried both whisper-cublas-11.8 and 12.4 and I cannot use my GPU.
The application recognizes the device, but the moment it should start I get the error current device: 0, in function ggml_backend_cuda_synchronize at D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2630
cudaStreamSynchronize(cuda_ctx->stream()) followed by \whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:88: CUDA error
I have tried installing various versions of the NVIDIA driver, from 576.52 to the latest, and I always get this error.
I have also tried manually installing CUDA Toolkit 12.4 and 13.0; no change.
The executable was downloaded from the GitHub releases page.
Here is the full log when I run whisper-bench.exe:
c:\whisper-cublas-12.4.0-bin-x64\Release>whisper-bench.exe -m LargeV3Turbo.bin -nfa
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070 SUPER, compute capability 7.5, VMM: yes
system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CUDA : ARCHS = 520 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | REPACK = 1 |
whisper_init_from_file_with_params_no_state: loading model from 'LargeV3Turbo.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: CUDA0 total size = 1623.92 MB
whisper_model_load: model size = 1623.92 MB
whisper_backend_init_gpu: using CUDA0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.69 MB
whisper_init_state: compute buffer (encode) = 212.31 MB
whisper_init_state: compute buffer (cross) = 9.27 MB
whisper_init_state: compute buffer (decode) = 100.04 MB
CUDA error: an illegal instruction was encountered
current device: 0, in function ggml_backend_cuda_synchronize at D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2630
cudaStreamSynchronize(cuda_ctx->stream())
D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:88: CUDA error
c:\whisper-cublas-12.4.0-bin-x64\Release>
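For anyone triaging: the failing call is the standard stream-synchronize error check, which is where errors from an earlier kernel launch surface. One common cause of "illegal instruction" on an otherwise working GPU is a binary built without device code for that GPU's architecture (the RTX 2070 Super is compute capability 7.5). A minimal standalone sketch, not whisper.cpp code, that exercises the same pattern on a given machine:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: if the binary contains no SASS or PTX usable by
// this GPU's architecture, the launch fails and the error is
// reported at the next synchronize, much like in the log above.
__global__ void noop() {}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    noop<<<1, 1>>>();
    cudaError_t err = cudaStreamSynchronize(0);  // default stream
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("synchronize OK\n");
    return 0;
}
```

Compiled with, e.g., `nvcc -arch=sm_75 check.cu`, this should succeed on a 2070 Super; if the prebuilt release binary fails at the same synchronize point while a locally built one works, that would suggest the release was compiled without support for compute 7.5 (note the `CUDA : ARCHS = 520` entry in the system_info line above). This is a diagnostic sketch under that assumption, not a confirmed root cause.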