Conversation

@amrmahdi
Collaborator

Depends on vllm-project/vllm#31477

## Context

Without a warm local cache, builds download and extract ~18 GB of layers from ECR:

| Layer        | Size    | Download | Extract | Total |
|--------------|---------|----------|---------|-------|
| Extensions   | 4.38 GB | 75s      | 14s     | 89s   |
| CUDA deps    | 3.21 GB | 51s      | 32s     | 83s   |
| PyTorch      | 5.15 GB | 97s      | -       | 97s   |
| Build tools  | 2.30 GB | 38s      | 18s     | 56s   |
| Other layers | ~5 GB   | ~70s     | ~6s     | ~76s  |
| Total        | ~18 GB  | ~330s    | ~70s    | ~400s |

This ~6.5-minute overhead is paid on every cold-cache build, for layers that rarely change.

## Solution

### 1. Warm Local BuildKit Cache

- Run buildkitd as a persistent service with the buildx remote driver (sketched after this list)
- Bake cache into custom AMIs, rebuilt daily from latest postmerge
- Daily rebuilds ensure AMI cache stays fresh, maximizing cache hits throughout the day
- Layers remain on disk between builds → skip download + extraction entirely
- Saves: ~400s per cold-cache build
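
A minimal sketch of that setup, assuming buildkitd listens on its default unix socket; the actual systemd wiring baked into the AMI is not shown in this PR, and the builder name is illustrative:

```bash
# Start buildkitd as a long-lived daemon (in the AMI this would be a
# systemd service; backgrounding it here is only for illustration).
sudo buildkitd --addr unix:///run/buildkit/buildkitd.sock &

# Attach buildx to the already-running daemon via the remote driver, so
# its local layer cache survives across CI builds instead of being torn
# down with each ephemeral builder container.
docker buildx create --name ci-builder --driver remote \
  unix:///run/buildkit/buildkitd.sock
docker buildx use ci-builder
```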

### 2. Docker Buildx Bake

- CI extends vLLM's docker-bake.hcl (vllm-project/vllm#31477) with ci.hcl overlay
- Same build config runs in CI and AMI warm-up → cache key parity (see the sketch below)
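
A hypothetical sketch of the overlay pattern: later `-f` files extend earlier ones, so CI-only settings stay out of the upstream bake file. The `test` base target and the `test-ci` name come from the linked vLLM change; the variable and cache-ref names are assumptions:

```bash
# ci.hcl extends the upstream "test" target with CI cache settings.
# CACHE_REF would be supplied from the CI environment; its exact form
# is not shown in this PR.
cat > ci.hcl <<'EOF'
variable "CACHE_REF" { default = "" }

target "test-ci" {
  inherits   = ["test"]
  cache-from = ["type=registry,ref=${CACHE_REF}"]
  cache-to   = ["type=registry,ref=${CACHE_REF},mode=max"]
}
EOF

# Identical invocation in CI builds and AMI warm-up → identical cache keys.
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci
```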

### 3. Cache-From Configuration

- Added the merge-base commit to the cache-from sources (example below)
- Ensures cache hits for unchanged intermediate stages
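
A hedged example of threading the merge-base commit into cache-from; the registry ref scheme, target branch, and `ECR_CACHE_REPO` variable are assumptions, not the exact CI wiring:

```bash
# Unchanged intermediate stages hit the cache pushed for the merge-base
# commit even when the branch head has no pushed cache of its own yet.
MERGE_BASE=$(git merge-base HEAD origin/main)
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci \
  --set "*.cache-from=type=registry,ref=${ECR_CACHE_REPO}:${MERGE_BASE}"
```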

### 4. EBS Throughput

- Increased gp3 volume throughput from 125 MB/s → 1000 MB/s
- Speeds up cache import/export and registry pushes, cutting at least a minute per build (example below)
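
For reference, gp3 throughput can be raised independently of volume size; a sketch with a placeholder volume ID (the PR presumably applies this in the instance/AMI launch configuration rather than by hand):

```bash
# Raise gp3 throughput from the 125 MB/s baseline to 1000 MB/s.
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --throughput 1000
```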

### 5. Network & Concurrency Tuning

- Increased BuildKit/Docker parallelism
- Network tuning following https://docs.aws.amazon.com/datatransferterminal/latest/userguide/tech-requirements.html (sketched below)
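
An illustrative sketch of both knobs; the concrete values live in the AMI configuration, not in this description:

```bash
# Network: larger TCP buffers in the spirit of the linked AWS guidance
# (these values are assumptions, not the exact ones CI applies).
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728

# Concurrency: BuildKit's per-worker parallelism setting (value illustrative).
cat <<'EOF' | sudo tee /etc/buildkit/buildkitd.toml
[worker.oci]
  max-parallelism = 8
EOF
```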

### 6. Debug Script

- Added debug-machine-config.sh to print instance info, network settings, and BuildKit config (sketched below)
- Helps verify AMI optimizations are active in CI builds
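
A hypothetical sketch of the kind of checks such a script performs; the actual debug-machine-config.sh in this PR may differ:

```bash
#!/bin/bash
# Instance info via IMDSv2.
TOKEN=$(curl -sX PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-type; echo

# Network settings applied by the AMI.
sysctl net.core.rmem_max net.core.wmem_max

# BuildKit/buildx configuration in use.
docker buildx inspect
```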

Signed-off-by: Amr Mahdi <[email protected]>
amrmahdi requested review from junpuf and khluu on December 29, 2025 at 07:56.
amrmahdi added a commit to amrmahdi/vllm that referenced this pull request on Dec 29, 2025:
Add docker/docker-bake.hcl as the base build configuration for docker
buildx bake. This is part of the effort to bake Docker build cache into
CPU AMIs for faster CI builds. See vllm-project/ci-infra#256

Context:
CI builds need intermediate stages cached, not just base images. To make
this work, we need the exact same build configuration to run in both:
1. Regular CI image builds
2. AMI cache warm-up during AMI creation

Why docker buildx bake:
Docker buildx bake provides a declarative, composable build configuration
using HCL. CI extends this base config with a ci.hcl overlay that adds
caching, registry settings, and commit-based cache keys. Both contexts
use the same build definition, ensuring consistency and cache key parity
for maximum layer reuse.

This file provides:
- Build variables (CUDA arch lists, parallelism settings)
- Common build args shared across targets
- OCI labels and annotations for image metadata (most importantly the git
  commit SHA)
- Base targets (test, openai) that CI extends

Usage example:

```bash
docker buildx bake -f docker/docker-bake.hcl -f ci.hcl test-ci
```

CI will download ci.hcl from ci-infra and combine it with this file.

After completing the AMI cache baking, together with the changes in the ci-infra repo, we expect image builds that only change leaf Python stages to go from ~17 minutes to ~7 minutes (~2.4x faster); a breakdown can be found in the linked ci-infra PR.

Signed-off-by: Amr Mahdi <[email protected]>
