whisper-cublas-12.4.0-bin-x64 : CUDA error with Geforce RTX 2070 Super #3525

@Sw1ss4lps

Description

Hello, I am trying to use whisper-cli with a GeForce RTX 2070 Super.
The version without GPU works perfectly, but I then tried both whisper-cublas-11.8 and 12.4 and I can't use my GPU.

The application recognizes the device, but at the moment it should start, I get the error current device: 0, in function ggml_backend_cuda_synchronize at D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2630
cudaStreamSynchronize(cuda_ctx->stream()) followed by D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:88: CUDA error

I have tried installing various versions of the NVIDIA driver, from 576.52 to the latest, and I always get this error.
I have also tried manually installing CUDA Toolkit 12.4 and 13.0, with no change.

The executable was downloaded from github releases.

Here's the full log when I run whisper-bench.exe:

c:\whisper-cublas-12.4.0-bin-x64\Release>whisper-bench.exe -m LargeV3Turbo.bin -nfa
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2070 SUPER, compute capability 7.5, VMM: yes

system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CUDA : ARCHS = 520 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | OPENMP = 1 | REPACK = 1 |
whisper_init_from_file_with_params_no_state: loading model from 'LargeV3Turbo.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:        CUDA0 total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init_gpu: using CUDA0 backend
whisper_init_state: kv self size  =   10.49 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   37.69 MB
whisper_init_state: compute buffer (encode) =  212.31 MB
whisper_init_state: compute buffer (cross)  =    9.27 MB
whisper_init_state: compute buffer (decode) =  100.04 MB
CUDA error: an illegal instruction was encountered
  current device: 0, in function ggml_backend_cuda_synchronize at D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2630
  cudaStreamSynchronize(cuda_ctx->stream())
D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:88: CUDA error

c:\whisper-cublas-12.4.0-bin-x64\Release>
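In case it helps narrow this down, one thing that might rule out an architecture mismatch in the prebuilt binary is building whisper.cpp from source targeted at the 2070 Super's compute capability (7.5). This is only a sketch, assuming whisper.cpp's documented CMake CUDA build flags and a local CUDA Toolkit plus CMake on PATH:

```shell
# Build whisper.cpp with CUDA enabled, compiling only for
# compute capability 7.5 (RTX 2070 SUPER).
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="75"
cmake --build build --config Release -j

# Then re-run the same benchmark with the locally built binary:
build\bin\Release\whisper-bench.exe -m LargeV3Turbo.bin
```

If the locally built binary runs cleanly, the issue is likely with the architectures baked into the release binaries rather than the driver or toolkit.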
