initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown #1312

@khteh

Ubuntu 25.04
Dockerfile:

# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
RUN apt install -y software-properties-common apt-transport-https curl sudo gnupg pipenv unzip dnsutils wget python3 python3-pip
RUN curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
RUN curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
RUN apt update -y
ENV NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
RUN apt install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
RUN nvidia-ctk runtime configure --runtime=containerd

k8s:

clientVersion:
  buildDate: "2024-10-16T15:15:29Z"
  compiler: gc
  gitCommit: cbb86e0d7f4a049666fac0551e8b02ef3d6c3d9a
  gitTreeState: clean
  gitVersion: v1.27.16
  goVersion: go1.22.5
  major: "1"
  minor: "27"
  platform: linux/amd64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2024-10-16T15:16:32Z"
  compiler: gc
  gitCommit: cbb86e0d7f4a049666fac0551e8b02ef3d6c3d9a
  gitTreeState: clean
  gitVersion: v1.27.16
  goVersion: go1.22.5
  major: "1"
  minor: "27"
  platform: linux/amd64

Error:

  Warning  Failed     5s (x2 over 11s)  kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli.real: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
  Warning  BackOff  2s (x4 over 5s)  kubelet  Back-off restarting failed container ollama in pod ollama-0_default(ada1f8c4-efb0-4eb4-9ae1-dd44e376fe78)
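
A note for triage: the hook output above says `nvidia-container-cli` could not load `libnvidia-ml.so.1`. As far as I know, that library (NVML) ships with the NVIDIA driver rather than with the `nvidia-container-toolkit` packages installed in the Dockerfile, so a quick check is whether the dynamic loader can find it in the environment where the hook runs. A minimal sketch, assuming Python 3 is available there; `can_load` is a hypothetical helper name, not part of any NVIDIA tool:

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)  # raises OSError when the library cannot be opened
        return True
    except OSError:
        return False


if __name__ == "__main__":
    lib = "libnvidia-ml.so.1"  # library named in the error message
    if can_load(lib):
        print(f"{lib}: loadable")
    else:
        print(f"{lib}: NOT loadable (same condition the hook reports)")
```

This is probably most informative when run on the Kubernetes node (or inside the node image) rather than in the workload container, since the failing hook runs before the container process starts.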
