
Volume attachment limits for p4d.24xlarge are too low? #2301

Open · j-vizcaino opened this issue Jan 22, 2025 · 4 comments
Labels: kind/bug, priority/important-soon

j-vizcaino commented Jan 22, 2025

/kind bug

What happened?

The csinode for p4d.24xlarge reports 6 allocatable EBS volumes, but the instance can support more.

According to the AWS docs, these instance types should support up to 11 EBS volumes.
Since our p4d.24xlarge instances include 4 EFA/ENI devices, that brings the number down to 7. Subtracting the root EBS volume leaves 6.

However, those instances, even with EFA enabled, support at least 8 EBS volumes (+1 for root); see below.

How to reproduce it (as minimally and precisely as possible)?

  • create a p4d.24xlarge instance with EFA enabled
  • describe the associated csinode resource (or look for the ebs-csi-node pod log line): the allocatable volume count is 6
  • attach 6 EBS volumes, using pods
  • create a 7th pod: the pod stays in Pending due to insufficient allocatable EBS capacity (expected behaviour)
  • update the ebs-csi-node DaemonSet and force the number of allocatable EBS volumes to 11 by adding --volume-attach-limit=11
  • watch the 7th pod start, with an additional EBS volume attached
  • (bonus) adding an 8th pod works as well (a kubectl sketch of these steps follows the list)
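A minimal kubectl sketch of the steps above; the node name and the container index are placeholders, and it assumes the driver runs as the ebs-csi-node DaemonSet in kube-system:

```sh
# Placeholder: the p4d.24xlarge node to inspect.
NODE=<node-name>

# 1. Check the allocatable volume count the driver registered (reports 6 here).
kubectl get csinode "$NODE" \
  -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'

# 2. Override the limit by appending the flag to the node plugin's args
#    (assumes the ebs-plugin container is the first container in the pod spec).
kubectl -n kube-system patch daemonset ebs-csi-node --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--volume-attach-limit=11"}]'

# 3. After the node pods restart, the csinode should report 11 and the
#    pending 7th (and 8th) pods should schedule.
kubectl get csinode "$NODE" \
  -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'
```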

Anything else we need to know?:

It's unclear whether the issue lies in how the ebs-csi code computes the number of available EBS volumes, or on the AWS side, with the EC2 metadata endpoint reporting incorrect numbers. Either way, it's clear that these instances support more EBS attachments than the driver reports.

Environment

  • Kubernetes version (use kubectl version): v1.29.12-eks-2d5f260
  • Driver version: v1.38.1-eksbuild.2 (EKS addon)
k8s-ci-robot added the kind/bug label Jan 22, 2025
j-vizcaino (Author) commented:

Output of lspci on a p4d.24xlarge node with both EFA and 9 EBS attachments (including the root volume), in case this helps:

# lspci -tv
-+-[0000:a0]-+-01.0  Amazon.com, Inc. Elastic Network Adapter (ENA)
 |           +-1b.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)
 |           +-1c.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1d.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1e.0  Amazon.com, Inc. NVMe SSD Controller
 |           \-1f.0  Amazon.com, Inc. NVMe SSD Controller
 +-[0000:90]-+-01.0  Amazon.com, Inc. Elastic Network Adapter (ENA)
 |           +-1b.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)
 |           +-1c.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1d.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1e.0  Amazon.com, Inc. NVMe SSD Controller
 |           \-1f.0  Amazon.com, Inc. NVMe SSD Controller
 +-[0000:80]-+-1a.0  NVIDIA Corporation GA100 [A100 NVSwitch]
 |           +-1b.0  NVIDIA Corporation GA100 [A100 NVSwitch]
 |           +-1c.0  NVIDIA Corporation GA100 [A100 NVSwitch]
 |           +-1d.0  NVIDIA Corporation GA100 [A100 NVSwitch]
 |           +-1e.0  NVIDIA Corporation GA100 [A100 NVSwitch]
 |           \-1f.0  NVIDIA Corporation GA100 [A100 NVSwitch]
 +-[0000:20]-+-01.0  Amazon.com, Inc. Elastic Network Adapter (ENA)
 |           +-1b.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)
 |           +-1c.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1d.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1e.0  Amazon.com, Inc. NVMe SSD Controller
 |           \-1f.0  Amazon.com, Inc. NVMe SSD Controller
 +-[0000:10]-+-00.0  Amazon.com, Inc. Elastic Network Adapter (ENA)
 |           +-02.0  Amazon.com, Inc. Elastic Network Adapter (ENA)
 |           +-1b.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)
 |           +-1c.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1d.0  NVIDIA Corporation GA100 [A100 SXM4 40GB]
 |           +-1e.0  Amazon.com, Inc. NVMe SSD Controller
 |           \-1f.0  Amazon.com, Inc. NVMe SSD Controller
 \-[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]
             +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
             +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
             +-03.0  Amazon.com, Inc. Device 1111
             +-04.0  Amazon.com, Inc. NVMe EBS Controller
             +-17.0  Amazon.com, Inc. NVMe EBS Controller
             +-18.0  Amazon.com, Inc. NVMe EBS Controller
             +-19.0  Amazon.com, Inc. NVMe EBS Controller
             +-1a.0  Amazon.com, Inc. NVMe EBS Controller
             +-1c.0  Amazon.com, Inc. NVMe EBS Controller
             +-1d.0  Amazon.com, Inc. NVMe EBS Controller
             +-1e.0  Amazon.com, Inc. NVMe EBS Controller
             \-1f.0  Amazon.com, Inc. NVMe EBS Controller

AndrewSirenko (Contributor) commented Jan 22, 2025

Hi @j-vizcaino, thank you for opening this issue and providing great reproduction steps!

Let me look into this. We will prioritize a fix in the driver or correct the docs.

In the meantime, you can rely on our Additional Node DaemonSets feature to automate overriding the volume attachment limit for p4d.24xlarge nodes.
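For example, a sketch of such an override, assuming the driver is installed via the Helm chart; the values below are illustrative, so check the chart's additional-daemonsets documentation for the exact schema, and make sure the selector does not overlap with the primary node DaemonSet:

```sh
# Illustrative values file for the chart's Additional Node DaemonSets feature;
# the field names follow the documented pattern but should be verified against
# the chart version in use.
cat <<'EOF' > p4d-values.yaml
additionalDaemonSets:
  p4d:
    nodeSelector:
      node.kubernetes.io/instance-type: p4d.24xlarge
    volumeAttachLimit: 11
EOF

helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system --values p4d-values.yaml
```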


Pasting the relevant AWS docs wording below for posterity:

For accelerated computing instances other than VT1 instances, each accelerator counts as an attachment. For example, p4d.24xlarge instances have a shared volume limit of 28, 8 GPUs, and 8 NVMe instance store volumes. This means that you can attach up to 11 EBS volumes (28 volumes - 1 network interface - 8 GPUs - 8 NVMe instance store volumes).
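For reference, the same formula applied with the 4 EFA/ENI interfaces reported on these nodes (an illustrative calculation only, not the driver's actual logic):

```sh
# Illustrative only: the docs formula with 4 network interfaces instead of
# the single ENI the docs example assumes.
#   28 (shared limit) - 4 (network interfaces) - 8 (GPUs) - 8 (instance store) = 8
# That yields 8 attachable EBS volumes, yet the lspci output above shows
# 9 NVMe EBS controllers attached, so the accounting still does not add up.
```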

AndrewSirenko (Contributor) commented:

/priority important-soon

k8s-ci-robot added the priority/important-soon label Jan 22, 2025
torredil (Member) commented:

/assign
