Conversation

@amrmahdi
Collaborator

Depends on vllm-project/vllm#31477

## Context

Without a warm local cache, builds download and extract ~18 GB of layers from ECR:

| Layer        | Size    | Download | Extract | Total |
|--------------|---------|----------|---------|-------|
| Extensions   | 4.38 GB | 75s      | 14s     | 89s   |
| CUDA deps    | 3.21 GB | 51s      | 32s     | 83s   |
| PyTorch      | 5.15 GB | 97s      | -       | 97s   |
| Build tools  | 2.30 GB | 38s      | 18s     | 56s   |
| Other layers | ~5 GB   | ~70s     | ~6s     | ~76s  |
| Total        | ~18 GB  | ~330s    | ~70s    | ~400s |

This ~6.5-minute overhead is paid on every cold-cache build, for layers that rarely change.

## Solution

### 1. Warm Local BuildKit Cache

- Run buildkitd as a persistent service with the buildx remote driver (sketched after this list)
- Bake cache into custom AMIs, rebuilt daily from latest postmerge
- Daily rebuilds ensure AMI cache stays fresh, maximizing cache hits throughout the day
- Layers remain on disk between builds → skip download + extraction entirely
- Saves: ~400s per cold-cache build
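
A minimal sketch of that setup, assuming buildkitd listens on its default unix socket; the actual systemd wiring baked into the AMI is not shown in this PR, and the builder name is illustrative:

```bash
# Start buildkitd as a long-lived daemon (in the AMI this would be a
# systemd service; backgrounding it here is only for illustration).
sudo buildkitd --addr unix:///run/buildkit/buildkitd.sock &

# Attach buildx to the already-running daemon via the remote driver, so
# its local layer cache survives across CI builds instead of being torn
# down with each ephemeral builder container.
docker buildx create --name ci-builder --driver remote \
  unix:///run/buildkit/buildkitd.sock
docker buildx use ci-builder
```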

### 2. Docker Buildx Bake

- CI extends vLLM's docker-bake.hcl (vllm-project/vllm#31477) with ci.hcl overlay
- Same build config runs in CI and AMI warm-up → cache key parity (see the sketch below)
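
A hypothetical sketch of the overlay pattern: later `-f` files extend earlier ones, so CI-only settings stay out of the upstream bake file. The `test` base target and the `test-ci` name come from the linked vLLM change; the variable and cache-ref names are assumptions:

```bash
# ci.hcl extends the upstream "test" target with CI cache settings.
# CACHE_REF would be supplied from the CI environment; its exact form
# is not shown in this PR.
cat > ci.hcl <<'EOF'
variable "CACHE_REF" { default = "" }

target "test-ci" {
  inherits   = ["test"]
  cache-from = ["type=registry,ref=${CACHE_REF}"]
  cache-to   = ["type=registry,ref=${CACHE_REF},mode=max"]
}
EOF

# Identical invocation in CI builds and AMI warm-up → identical cache keys.
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci
```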

### 3. Cache-From Configuration

- Added the merge-base commit to the cache-from sources (example below)
- Ensures cache hits for unchanged intermediate stages
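
A hedged example of threading the merge-base commit into cache-from; the registry ref scheme, target branch, and `ECR_CACHE_REPO` variable are assumptions, not the exact CI wiring:

```bash
# Unchanged intermediate stages hit the cache pushed for the merge-base
# commit even when the branch head has no pushed cache of its own yet.
MERGE_BASE=$(git merge-base HEAD origin/main)
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci \
  --set "*.cache-from=type=registry,ref=${ECR_CACHE_REPO}:${MERGE_BASE}"
```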

### 4. EBS Throughput

- Increased gp3 volume throughput from 125 MB/s → 1000 MB/s
- Speeds up cache import/export and registry pushes, cutting at least a minute per build (example below)
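
For reference, gp3 throughput can be raised independently of volume size; a sketch with a placeholder volume ID (the PR presumably applies this in the instance/AMI launch configuration rather than by hand):

```bash
# Raise gp3 throughput from the 125 MB/s baseline to 1000 MB/s.
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --throughput 1000
```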

### 5. Network & Concurrency Tuning

- Increased BuildKit/Docker parallelism
- Network tuning following https://docs.aws.amazon.com/datatransferterminal/latest/userguide/tech-requirements.html (sketched below)
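
An illustrative sketch of both knobs; the concrete values live in the AMI configuration, not in this description:

```bash
# Network: larger TCP buffers in the spirit of the linked AWS guidance
# (these values are assumptions, not the exact ones CI applies).
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728

# Concurrency: BuildKit's per-worker parallelism setting (value illustrative).
cat <<'EOF' | sudo tee /etc/buildkit/buildkitd.toml
[worker.oci]
  max-parallelism = 8
EOF
```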

### 6. Debug Script

- Added debug-machine-config.sh to print instance info, network settings, and BuildKit config (sketched below)
- Helps verify AMI optimizations are active in CI builds
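
A hypothetical sketch of the kind of checks such a script performs; the actual debug-machine-config.sh in this PR may differ:

```bash
#!/bin/bash
# Instance info via IMDSv2.
TOKEN=$(curl -sX PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-type; echo

# Network settings applied by the AMI.
sysctl net.core.rmem_max net.core.wmem_max

# BuildKit/buildx configuration in use.
docker buildx inspect
```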

Signed-off-by: Amr Mahdi <[email protected]>
amrmahdi requested review from junpuf and khluu on December 29, 2025 at 07:56.
amrmahdi added a commit to amrmahdi/vllm that referenced this pull request on Dec 29, 2025:
Add docker/docker-bake.hcl as the base build configuration for docker
buildx bake. This is part of the effort to bake Docker build cache into
CPU AMIs for faster CI builds. See vllm-project/ci-infra#256

Context:
CI builds need intermediate stages cached, not just base images. To make
this work, we need the exact same build configuration to run in both:
1. Regular CI image builds
2. AMI cache warm-up during AMI creation

Why docker buildx bake:
Docker buildx bake provides a declarative, composable build configuration
using HCL. CI extends this base config with a ci.hcl overlay that adds
caching, registry settings, and commit-based cache keys. Both contexts
use the same build definition, ensuring consistency and cache key parity
for maximum layer reuse.

This file provides:
- Build variables (CUDA arch lists, parallelism settings)
- Common build args shared across targets
- OCI labels and annotations for image metadata (most importantly the git
  commit SHA)
- Base targets (test, openai) that CI extends

Usage example:

```bash
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci
```

CI will download ci.hcl from ci-infra and combine it with this file.

After completing the AMI cache baking, together with the changes in the ci-infra repo, we expect image builds that only change leaf Python stages to go from ~17 minutes to ~7 minutes (~2.4x faster); a breakdown can be found in the linked ci-infra PR.

Signed-off-by: Amr Mahdi <[email protected]>
