Conversation

@amrmahdi amrmahdi commented Dec 29, 2025

Purpose

Add docker/docker-bake.hcl as the base build configuration for docker buildx bake. This is part of the effort to bake Docker build cache into CPU AMIs for faster CI builds. See vllm-project/ci-infra#256

Context:
CI builds need intermediate stages cached, not just base images. To make this work, we need the exact same build configuration to run in both:

  1. Regular CI image builds
  2. AMI cache warm-up during AMI creation

Why docker buildx bake:
Docker buildx bake provides a declarative, composable build configuration using HCL. CI extends this base config with a ci.hcl overlay that adds caching, registry settings, and commit-based cache keys. Both contexts use the same build definition, ensuring consistency and cache key parity for maximum layer reuse.

This file provides:

  • Build variables (CUDA arch lists, parallelism settings)
  • Common build args shared across targets
  • OCI labels and annotations for image metadata (most importantly the git commit SHA)
  • Base targets (test, openai) that CI extends
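
A minimal sketch of what such a bake file could look like — all variable names, default values, and target details here are illustrative, not the actual contents of docker/docker-bake.hcl:

```hcl
# Illustrative sketch of a docker-bake.hcl base config (names/values are examples)
variable "CUDA_VERSION" {
  default = "12.8.1"
}

variable "torch_cuda_arch_list" {
  default = "8.0;8.9;9.0"
}

variable "GIT_COMMIT_SHA" {
  default = ""
}

target "test" {
  context    = "."
  dockerfile = "docker/Dockerfile"
  target     = "test"
  args = {
    CUDA_VERSION         = CUDA_VERSION
    torch_cuda_arch_list = torch_cuda_arch_list
  }
  labels = {
    "org.opencontainers.image.revision" = GIT_COMMIT_SHA
  }
}

target "openai" {
  inherits = ["test"]
  target   = "vllm-openai"
}
```

A CI overlay then extends these targets (e.g. a `test-ci` target with `inherits = ["test"]`) to add registry and cache settings without duplicating the build definition.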

Usage example:
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci

CI will download ci.hcl from ci-infra and combine it with this file.

Test Result

After completing the AMI cache baking, in addition to the changes in the ci-infra repo, we expect image builds that only change leaf Python stages to go from ~17 minutes to ~7 minutes (~2.4× faster); a detailed breakdown can be found in the linked ci-infra PR.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a docker-bake.hcl file to standardize Docker builds, which is a great step towards faster and more consistent CI. The configuration looks solid, but I've found a few issues that should be addressed. There's a discrepancy in the usage comments, a potential improvement for commit tracking, and a critical error in the build arguments being passed to the Dockerfile. My review includes suggestions to fix these points to ensure the bake configuration works as intended and is easy to use.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


amrmahdi added a commit to vllm-project/ci-infra that referenced this pull request Dec 29, 2025
Reduced CI Docker build time from ~17 min → ~7 min (~2.4× faster) through cache
optimization, BuildKit tuning, and network/EBS configuration.

## Problem

Without a warm local cache, builds download and extract ~18GB of layers from ECR:

| Layer        | Size    | Download | Extract | Total |
|--------------|---------|----------|---------|-------|
| Extensions   | 4.38 GB | 75s      | 14s     | 89s   |
| CUDA deps    | 3.21 GB | 51s      | 32s     | 83s   |
| PyTorch      | 5.15 GB | 97s      | -       | 97s   |
| Build tools  | 2.30 GB | 38s      | 18s     | 56s   |
| Other layers | ~5 GB   | ~70s     | ~6s     | 76s   |
| **Total**    | ~18 GB  | ~330s    | ~70s    | ~400s |

This ~6.5 minute overhead occurs on every cold-cache build for layers that
rarely change.

## Solution

### 1. Warm Local BuildKit Cache

- Run buildkitd as persistent systemd service with remote driver
- Bake cache into custom AMIs, rebuilt daily from latest postmerge
- Layers remain on disk between builds → skip download + extraction entirely
- **Saves: ~400s per cold-cache build**

### 2. Docker Buildx Bake

We need the exact same build to run in CI and during AMI cache warming,
otherwise cache keys won't match. Docker buildx bake provides declarative HCL
configuration - CI extends vLLM's docker-bake.hcl (vllm-project/vllm#31477)
with a ci.hcl overlay for registry settings and commit-based cache keys.

### 3. Commit-Based Cache Keys

- Cache FROM and TO specific git commit SHAs
- AMI warm-up caches from latest postmerge commit
- PR builds also cache from merge-base commit (high hit rate for PRs)

Note: Some leaf layer invalidation still occurs due to .git directory in
build context - to be addressed separately.
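
As a hedged sketch, the commit-based cache keys described above might be expressed in the ci.hcl overlay roughly like this — the registry ref, variable names, and tag scheme are illustrative, not taken from the actual overlay:

```hcl
# Illustrative ci.hcl overlay sketch (registry/variable names are examples)
variable "COMMIT_SHA" {
  default = ""
}

variable "MERGE_BASE_SHA" {
  default = ""
}

target "test-ci" {
  inherits = ["test"]
  cache-from = [
    "type=registry,ref=ECR_REGISTRY/vllm-ci:cache-${MERGE_BASE_SHA}",
    "type=registry,ref=ECR_REGISTRY/vllm-ci:cache-postmerge",
  ]
  cache-to = [
    "type=registry,ref=ECR_REGISTRY/vllm-ci:cache-${COMMIT_SHA},mode=max",
  ]
}
```

`mode=max` exports intermediate stages as well, which is what makes the AMI warm-up cache useful beyond base images.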

### 4. EBS Throughput

Default gp3 throughput (125 MB/s) bottlenecked BuildKit's sequential blob I/O
during cache import/export (ada24ca).

- Increased to 1000 MB/s + 16000 IOPS
- 2000 MB/s showed diminishing returns (bottleneck shifts to ECR ~60-100 MB/s)
- **Saves: ~45s on cache export (162.7s → 118.0s)**

### 5. Other Optimizations

- Standalone buildkitd with max-parallelism=32 (for 64 vCPU r6in.16xlarge)
- Network tuning: BBR congestion control, 16MB TCP buffers
- Docker daemon: max-concurrent-downloads/uploads=16
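
The tuning in the bullets above could be expressed roughly as the following config fragments; values come from the bullets, but the file paths are conventional defaults and not confirmed by this PR:

```
# /etc/buildkit/buildkitd.toml (illustrative path)
[worker.oci]
  max-parallelism = 32

# /etc/sysctl.d/99-buildkit-net.conf (illustrative path)
net.ipv4.tcp_congestion_control = bbr
net.core.rmem_max = 16777216   # 16MB receive buffer
net.core.wmem_max = 16777216   # 16MB send buffer

# /etc/docker/daemon.json
{
  "max-concurrent-downloads": 16,
  "max-concurrent-uploads": 16
}
```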

Signed-off-by: Amr Mahdi <[email protected]>
amrmahdi added a commit to vllm-project/ci-infra that referenced this pull request Dec 29, 2025
Depends on vllm-project/vllm#31477

(Revised commit message with the same problem/solution breakdown as the first commit above, plus these additions:)

- Daily AMI rebuilds keep the baked cache fresh, maximizing cache hits throughout the day
- Network tuning following https://docs.aws.amazon.com/datatransferterminal/latest/userguide/tech-requirements.html
- Added debug-machine-config.sh to print instance info, network settings, and BuildKit config, helping verify AMI optimizations are active in CI builds

Signed-off-by: Amr Mahdi <[email protected]>
@amrmahdi amrmahdi force-pushed the amrmahdi/docker-bake branch from 4a5c62a to 251a50b Compare December 29, 2025 08:07
@amrmahdi amrmahdi force-pushed the amrmahdi/docker-bake branch from 0b5a0ce to 0bfd748 Compare December 29, 2025 08:09
mritunjaysharma394 added a commit to mritunjaysharma394/vllm that referenced this pull request Dec 30, 2025
Introduces docker/versions.json as a machine-readable version manifest
for all pinned dependencies used in Docker builds.

This complements the docker buildx bake work in PR vllm-project#31477:
- versions.json provides the version data (what versions to use)
- docker-bake.hcl provides build configuration (how to build)

Benefits:
- Single source of truth for all pinned versions
- Machine-readable format for CI and external tooling (jq-parseable)
- Cleaner diffs for version bumps
- Easy release comparison: git diff v0.13..v0.14 -- docker/versions.json

Files:
- docker/versions.json: Version manifest with CUDA, Python, FlashInfer,
  bitsandbytes, torch arch list, and extension commit refs
- docker/Dockerfile: Added comments pointing to versions.json

Signed-off-by: Mritunjay Sharma <[email protected]>
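
Based on that description, the manifest might look like the following — every key name and version value here is illustrative, not the actual file contents:

```json
{
  "cuda_version": "12.8.1",
  "python_version": "3.12",
  "flashinfer_ref": "v0.2.0",
  "bitsandbytes_version": "0.45.0",
  "torch_cuda_arch_list": "8.0;8.9;9.0"
}
```

CI or external tooling could then read individual values with jq, e.g. `jq -r .cuda_version docker/versions.json`.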
@khluu khluu enabled auto-merge (squash) December 31, 2025 00:35
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 31, 2025
@khluu khluu merged commit e1ee11b into vllm-project:main Dec 31, 2025
17 checks passed