Description
Hello, I am trying to use whisper-cli with a GeForce RTX 2070 Super.
The version without GPU support works perfectly, but I have since tried both whisper-cublas-11.8 and 12.4 and I cannot use my GPU.
The application recognizes the device, but the moment it should start I get the error current device: 0, in function ggml_backend_cuda_synchronize at D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2630
cudaStreamSynchronize(cuda_ctx->stream()) followed by \whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:88: CUDA error
I have tried installing various versions of the NVIDIA driver, from 576.52 to the latest, and I always get this error.
I have also tried manually installing CUDA Toolkit 12.4 and 13.0; no change.
The executable was downloaded from the GitHub releases page.
Here is the full log when I run whisper-bench.exe:
c:\whisper-cublas-12.4.0-bin-x64\Release>whisper-bench.exe -m LargeV3Turbo.bin -nfa
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070 SUPER, compute capability 7.5, VMM: yes
system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CUDA : ARCHS = 520 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | REPACK = 1 |
whisper_init_from_file_with_params_no_state: loading model from 'LargeV3Turbo.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: CUDA0 total size = 1623.92 MB
whisper_model_load: model size = 1623.92 MB
whisper_backend_init_gpu: using CUDA0 backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.69 MB
whisper_init_state: compute buffer (encode) = 212.31 MB
whisper_init_state: compute buffer (cross) = 9.27 MB
whisper_init_state: compute buffer (decode) = 100.04 MB
CUDA error: an illegal instruction was encountered
current device: 0, in function ggml_backend_cuda_synchronize at D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2630
cudaStreamSynchronize(cuda_ctx->stream())
D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:88: CUDA error
c:\whisper-cublas-12.4.0-bin-x64\Release>
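For anyone triaging: the failing call is the standard stream-synchronize error check, which is where errors from an earlier kernel launch surface. One common cause of "illegal instruction" on an otherwise working GPU is a binary built without device code for that GPU's architecture (the RTX 2070 Super is compute capability 7.5). A minimal standalone sketch, not whisper.cpp code, that exercises the same pattern on a given machine:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: if the binary contains no SASS or PTX usable by
// this GPU's architecture, the launch fails and the error is
// reported at the next synchronize, much like in the log above.
__global__ void noop() {}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    noop<<<1, 1>>>();
    cudaError_t err = cudaStreamSynchronize(0);  // default stream
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("synchronize OK\n");
    return 0;
}
```

Compiled with, e.g., `nvcc -arch=sm_75 check.cu`, this should succeed on a 2070 Super; if the prebuilt release binary fails at the same synchronize point while a locally built one works, that would suggest the release was compiled without support for compute 7.5 (note the `CUDA : ARCHS = 520` entry in the system_info line above). This is a diagnostic sketch under that assumption, not a confirmed root cause.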