Conversation

@amrmahdi amrmahdi commented Dec 29, 2025

Purpose

Add docker/docker-bake.hcl as the base build configuration for docker buildx bake. This is part of the effort to bake Docker build cache into CPU AMIs for faster CI builds. See vllm-project/ci-infra#256

Context:
CI builds need intermediate stages cached, not just base images. To make this work, we need the exact same build configuration to run in both:

  1. Regular CI image builds
  2. AMI cache warm-up during AMI creation

Why docker buildx bake:
Docker buildx bake provides a declarative, composable build configuration using HCL. CI extends this base config with a ci.hcl overlay that adds caching, registry settings, and commit-based cache keys. Both contexts use the same build definition, ensuring consistency and cache key parity for maximum layer reuse.

This file provides:

  • Build variables (CUDA arch lists, parallelism settings)
  • Common build args shared across targets
  • OCI labels and annotations for image metadata (most importantly the git commit SHA)
  • Base targets (test, openai) that CI extends
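
A minimal sketch of what such a bake file could look like — all variable names, default values, and target details here are illustrative, not the actual contents of docker/docker-bake.hcl:

```hcl
# Illustrative sketch of a docker-bake.hcl base config (names/values are examples)
variable "CUDA_VERSION" {
  default = "12.8.1"
}

variable "torch_cuda_arch_list" {
  default = "8.0;8.9;9.0"
}

variable "GIT_COMMIT_SHA" {
  default = ""
}

target "test" {
  context    = "."
  dockerfile = "docker/Dockerfile"
  target     = "test"
  args = {
    CUDA_VERSION         = CUDA_VERSION
    torch_cuda_arch_list = torch_cuda_arch_list
  }
  labels = {
    "org.opencontainers.image.revision" = GIT_COMMIT_SHA
  }
}

target "openai" {
  inherits = ["test"]
  target   = "vllm-openai"
}
```

A CI overlay then extends these targets (e.g. a `test-ci` target with `inherits = ["test"]`) to add registry and cache settings without duplicating the build definition.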

Usage example:
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci

CI will download ci.hcl from ci-infra and combine it with this file.

Test Result

After completing the AMI cache baking, in addition to the changes in the ci-infra repo, we expect image builds that only change leaf Python stages to go from ~17 minutes to ~7 minutes (~2.4× faster); a detailed breakdown can be found in the linked ci-infra PR.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a docker-bake.hcl file to standardize Docker builds, which is a great step towards faster and more consistent CI. The configuration looks solid, but I've found a few issues that should be addressed. There's a discrepancy in the usage comments, a potential improvement for commit tracking, and a critical error in the build arguments being passed to the Dockerfile. My review includes suggestions to fix these points to ensure the bake configuration works as intended and is easy to use.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


amrmahdi added a commit to vllm-project/ci-infra that referenced this pull request Dec 29, 2025
Reduced CI Docker build time from ~17 min → ~7 min (~2.4× faster) through cache
optimization, BuildKit tuning, and network/EBS configuration.

## Problem

Without a warm local cache, builds download and extract ~18GB of layers from ECR:

| Layer        | Size    | Download | Extract | Total |
|--------------|---------|----------|---------|-------|
| Extensions   | 4.38 GB | 75s      | 14s     | 89s   |
| CUDA deps    | 3.21 GB | 51s      | 32s     | 83s   |
| PyTorch      | 5.15 GB | 97s      | -       | 97s   |
| Build tools  | 2.30 GB | 38s      | 18s     | 56s   |
| Other layers | ~5 GB   | ~70s     | ~6s     | 76s   |
| **Total**    | ~18 GB  | ~330s    | ~70s    | ~400s |

This ~6.5 minute overhead occurs on every cold-cache build for layers that
rarely change.

## Solution

### 1. Warm Local BuildKit Cache

- Run buildkitd as persistent systemd service with remote driver
- Bake cache into custom AMIs, rebuilt daily from latest postmerge
- Layers remain on disk between builds → skip download + extraction entirely
- **Saves: ~400s per cold-cache build**

### 2. Docker Buildx Bake

We need the exact same build to run in CI and during AMI cache warming,
otherwise cache keys won't match. Docker buildx bake provides declarative HCL
configuration - CI extends vLLM's docker-bake.hcl (vllm-project/vllm#31477)
with a ci.hcl overlay for registry settings and commit-based cache keys.

### 3. Commit-Based Cache Keys

- Cache FROM and TO specific git commit SHAs
- AMI warm-up caches from latest postmerge commit
- PR builds also cache from merge-base commit (high hit rate for PRs)

Note: Some leaf layer invalidation still occurs due to .git directory in
build context - to be addressed separately.
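
As a hedged sketch, the commit-based cache keys described above might be expressed in the ci.hcl overlay roughly like this — the registry ref, variable names, and tag scheme are illustrative, not taken from the actual overlay:

```hcl
# Illustrative ci.hcl overlay sketch (registry/variable names are examples)
variable "COMMIT_SHA" {
  default = ""
}

variable "MERGE_BASE_SHA" {
  default = ""
}

target "test-ci" {
  inherits = ["test"]
  cache-from = [
    "type=registry,ref=ECR_REGISTRY/vllm-ci:cache-${MERGE_BASE_SHA}",
    "type=registry,ref=ECR_REGISTRY/vllm-ci:cache-postmerge",
  ]
  cache-to = [
    "type=registry,ref=ECR_REGISTRY/vllm-ci:cache-${COMMIT_SHA},mode=max",
  ]
}
```

`mode=max` exports intermediate stages as well, which is what makes the AMI warm-up cache useful beyond base images.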

### 4. EBS Throughput

Default gp3 throughput (125 MB/s) bottlenecked BuildKit's sequential blob I/O
during cache import/export (ada24ca).

- Increased to 1000 MB/s + 16000 IOPS
- 2000 MB/s showed diminishing returns (bottleneck shifts to ECR ~60-100 MB/s)
- **Saves: ~45s on cache export (162.7s → 118.0s)**

### 5. Other Optimizations

- Standalone buildkitd with max-parallelism=32 (for 64 vCPU r6in.16xlarge)
- Network tuning: BBR congestion control, 16MB TCP buffers
- Docker daemon: max-concurrent-downloads/uploads=16
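
The tuning in the bullets above could be expressed roughly as the following config fragments; values come from the bullets, but the file paths are conventional defaults and not confirmed by this PR:

```
# /etc/buildkit/buildkitd.toml (illustrative path)
[worker.oci]
  max-parallelism = 32

# /etc/sysctl.d/99-buildkit-net.conf (illustrative path)
net.ipv4.tcp_congestion_control = bbr
net.core.rmem_max = 16777216   # 16MB receive buffer
net.core.wmem_max = 16777216   # 16MB send buffer

# /etc/docker/daemon.json
{
  "max-concurrent-downloads": 16,
  "max-concurrent-uploads": 16
}
```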

Signed-off-by: Amr Mahdi <[email protected]>
amrmahdi added a commit to vllm-project/ci-infra that referenced this pull request Dec 29, 2025
Depends on vllm-project/vllm#31477

(Revised commit message with the same problem/solution breakdown as the first commit above, plus these additions:)

- Daily AMI rebuilds keep the baked cache fresh, maximizing cache hits throughout the day
- Network tuning following https://docs.aws.amazon.com/datatransferterminal/latest/userguide/tech-requirements.html
- Added debug-machine-config.sh to print instance info, network settings, and BuildKit config, helping verify AMI optimizations are active in CI builds

Signed-off-by: Amr Mahdi <[email protected]>
@amrmahdi amrmahdi force-pushed the amrmahdi/docker-bake branch from 4a5c62a to 251a50b Compare December 29, 2025 08:07
@amrmahdi amrmahdi force-pushed the amrmahdi/docker-bake branch from 0b5a0ce to 0bfd748 Compare December 29, 2025 08:09
mritunjaysharma394 added a commit to mritunjaysharma394/vllm that referenced this pull request Dec 30, 2025
Introduces docker/versions.json as a machine-readable version manifest
for all pinned dependencies used in Docker builds.

This complements the docker buildx bake work in PR vllm-project#31477:
- versions.json provides the version data (what versions to use)
- docker-bake.hcl provides build configuration (how to build)

Benefits:
- Single source of truth for all pinned versions
- Machine-readable format for CI and external tooling (jq-parseable)
- Cleaner diffs for version bumps
- Easy release comparison: git diff v0.13..v0.14 -- docker/versions.json

Files:
- docker/versions.json: Version manifest with CUDA, Python, FlashInfer,
  bitsandbytes, torch arch list, and extension commit refs
- docker/Dockerfile: Added comments pointing to versions.json

Signed-off-by: Mritunjay Sharma <[email protected]>
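
Based on that description, the manifest might look like the following — every key name and version value here is illustrative, not the actual file contents:

```json
{
  "cuda_version": "12.8.1",
  "python_version": "3.12",
  "flashinfer_ref": "v0.2.0",
  "bitsandbytes_version": "0.45.0",
  "torch_cuda_arch_list": "8.0;8.9;9.0"
}
```

CI or external tooling could then read individual values with jq, e.g. `jq -r .cuda_version docker/versions.json`.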
@khluu khluu enabled auto-merge (squash) December 31, 2025 00:35
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 31, 2025
@khluu khluu merged commit e1ee11b into vllm-project:main Dec 31, 2025
17 checks passed