Passing GPU to a container totally broken after upgrading to Debian 13 #1324

@hardwareadictos

Description

Hi! Good morning.

I just proceeded to upgrade my Debian VMs to version 13, and I realized that in my test environment the containers that use my NVIDIA GPU are broken.

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown

This is what I see in dmesg:

[ 3385.030120] nvidia-uvm: Loaded the UVM driver, major device number 239.
[ 3385.084544] __vm_enough_memory: pid: 5206, comm: nvc:[driver], bytes: 8589934592 not enough memory for the allocation
[ 3385.084558] __vm_enough_memory: pid: 5206, comm: nvc:[driver], bytes: 8589946880 not enough memory for the allocation
[ 3385.084569] __vm_enough_memory: pid: 5206, comm: nvc:[driver], bytes: 8590069760 not enough memory for the allocation
[ 3385.084576] show_signal_msg: 101 callbacks suppressed
[ 3385.084578] nvc:[driver][5206]: segfault at 28 ip 00007f218a762225 sp 00007ffd531ad9f0 error 4 in libc.so.6[155225,7f218a635000+165000] likely on CPU 1 (core 1, socket 0)
[ 3385.084593] Code: 41 5d c3 66 90 41 54 55 53 8b 2f 48 89 fb e8 22 63 ff ff 39 e8 7e 18 e8 89 fc ff ff 4c 63 e5 48 8b 80 e0 00 00 00 4a 8d 04 e0 <48> 39 18 74 06 5b 5d 41 5c c3 90 48 c7 00 00 00 00 00 81 fd ff 03
[ 3422.810309] __vm_enough_memory: pid: 5453, comm: nvc:[driver], bytes: 8589934592 not enough memory for the allocation
[ 3422.810322] __vm_enough_memory: pid: 5453, comm: nvc:[driver], bytes: 8589946880 not enough memory for the allocation
[ 3422.810332] __vm_enough_memory: pid: 5453, comm: nvc:[driver], bytes: 8590069760 not enough memory for the allocation
[ 3422.810340] nvc:[driver][5453]: segfault at 28 ip 00007f0550538225 sp 00007ffdc4f29700 error 4 in libc.so.6[155225,7f055040b000+165000] likely on CPU 0 (core 0, socket 0)
[ 3422.810405] Code: 41 5d c3 66 90 41 54 55 53 8b 2f 48 89 fb e8 22 63 ff ff 39 e8 7e 18 e8 89 fc ff ff 4c 63 e5 48 8b 80 e0 00 00 00 4a 8d 04 e0 <48> 39 18 74 06 5b 5d 41 5c c3 90 48 c7 00 00 00 00 00 81 fd ff 03
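The `__vm_enough_memory` lines suggest the kernel's overcommit accounting is refusing an ~8 GiB anonymous mapping made by `nvc` (8589934592 bytes is exactly 8 GiB), after which the process presumably dereferences the failed mapping and segfaults. As a data point, this sketch shows which overcommit policy the VM is running under (these are standard Linux procfs entries, nothing NVIDIA-specific):

```shell
# __vm_enough_memory only rejects a mapping when overcommit accounting
# says no.  Show the policy in effect:
# 0 = heuristic, 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio
# Commit limit vs. memory already committed under strict accounting:
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```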
nvidia-smi output from inside the VM:

Mon Sep 29 12:08:06 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID P4-2Q                     On  | 00000000:01:00.0 Off |                    0 |
| N/A   N/A    P0              N/A /  N/A |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

I've tried reinstalling the toolkit following https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html, as suggested in #1051, but with the same result. I'm on toolkit version 1.17.8-1 with a Tesla P4 (see the nvidia-smi output above for more info).

My Docker daemon.json:

{
    "data-root": "/mnt/docker-lib",
    "log-driver": "json-file",
    "log-opts": {
        "max-file": "3",
        "max-size": "30m"
    },
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
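(For what it's worth, a quick way to rule out daemon.json syntax errors is to run the file through a JSON parser; here is a self-contained check of the config above, with python3 used only as a validator:)

```shell
# Validate the daemon.json contents; python3 -m json.tool exits non-zero
# (and reports the position of the error) on malformed JSON.
python3 -m json.tool >/dev/null <<'EOF' && echo "daemon.json parses cleanly"
{
    "data-root": "/mnt/docker-lib",
    "log-driver": "json-file",
    "log-opts": {
        "max-file": "3",
        "max-size": "30m"
    },
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
EOF
```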
The only way to make my containers work again is to roll back to my Debian 12 VM backup.

I'm using KVM (Proxmox VE).

Please, feel free to ask for any information you need to debug it more.
