fix: total deploy time is not well-accounted for #1596

Merged: aws-cdk-automation merged 6 commits into main from huijbers/measure-assets on Jun 5, 2026
@rix0rrr rix0rrr commented Jun 4, 2026

Contributor

We used to have the following spans:

  • COMMAND/INVOKE: a single span for the entire CLI invocation (synth + the entire deploy phase).
  • DEPLOY: a span per stack that gets deployed.

These spans make it hard to account for the total deploy time of all stacks (which may or may not be deployed in parallel), and they cannot attribute time to assets, which are built and deployed separately from stacks.

This PR makes the following changes:

  • First, as a refactor, extract the anonymous functions that do the actual work out of the huge deploy() function in the CLI into a separate class. Unfortunately, this means copying a whole bunch of property values around, but the code becomes clearer.
    • Also unfortunately, this code had been copy/pasted to toolkit-lib, and I haven't carried out the same refactoring there (there is no need for telemetry there), so these code paths now diverge more.
  • The refactor made it clear that we were misprinting the "total time taken" statement: it was printed per stack instead of once for the entire operation.
  • Introduce the notion of "marker nodes" in the work graph. They are added for each asset: one before we start building the asset and one after we have published it everywhere.
  • Marker nodes are linked to the starting and ending of spans.
  • Spans are linked to IoHost messages, which link to telemetry.
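
As a rough illustration of the marker-node idea, here is a minimal sketch. It is not the CLI's actual work graph: `WorkNode`, `Span`, `FakeClock`, `addAssetWithMarkers`, and `runGraph` are all hypothetical names, the graph runs sequentially, and every build or publish step is assumed to take 10 ms on a fake clock.

```typescript
// Hypothetical, simplified model. None of these names come from the CDK CLI.
type NodeKind = 'asset-start' | 'asset-build' | 'asset-publish' | 'asset-end';

interface WorkNode {
  id: string;
  assetId: string;
  kind: NodeKind;
  deps: string[]; // ids of nodes that must finish before this one runs
}

interface Span {
  name: string;
  startMs: number;
  endMs?: number;
}

// Deterministic clock so the example does not depend on real time.
class FakeClock {
  private t = 0;
  now(): number { return this.t; }
  advance(ms: number): void { this.t += ms; }
}

// Wrap an asset's build and publish nodes between a start and an end marker.
function addAssetWithMarkers(graph: Map<string, WorkNode>, assetId: string, targets: string[]): void {
  const start = `${assetId}:start`;
  const build = `${assetId}:build`;
  const end = `${assetId}:end`;
  const publishes = targets.map((t) => `${assetId}:publish:${t}`);
  graph.set(start, { id: start, assetId, kind: 'asset-start', deps: [] });
  graph.set(build, { id: build, assetId, kind: 'asset-build', deps: [start] });
  for (const id of publishes) {
    graph.set(id, { id, assetId, kind: 'asset-publish', deps: [build] });
  }
  // The end marker depends on every publish, so it runs after the last one.
  graph.set(end, { id: end, assetId, kind: 'asset-end', deps: publishes });
}

// Run nodes in dependency order; markers open and close the asset's span.
function runGraph(graph: Map<string, WorkNode>, clock: FakeClock): Span[] {
  const spans = new Map<string, Span>();
  const done = new Set<string>();
  const pending = Array.from(graph.values());
  while (pending.length > 0) {
    const node = pending.find((n) => n.deps.every((d) => done.has(d)));
    if (!node) throw new Error('Cycle or missing dependency in work graph');
    pending.splice(pending.indexOf(node), 1);
    if (node.kind === 'asset-start') {
      spans.set(node.assetId, { name: `ASSET ${node.assetId}`, startMs: clock.now() });
    } else if (node.kind === 'asset-end') {
      const span = spans.get(node.assetId);
      if (span) span.endMs = clock.now();
    } else {
      clock.advance(10); // assumption: each build/publish step takes 10 ms
    }
    done.add(node.id);
  }
  return Array.from(spans.values());
}

const graph = new Map<string, WorkNode>();
addAssetWithMarkers(graph, 'asset1', ['us-east-1', 'eu-west-1']);
const demoSpans = runGraph(graph, new FakeClock());
console.log(demoSpans);
```

Because the end marker depends on all publish nodes, the span covers the asset from the start of its build until its last publish finishes, without the build or publish nodes needing to know about telemetry.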

We record a bunch of additional counters and timers on the global INVOKE span as well, just because they will be easier to extract and graph that way when we look at total impact.

Adding to COMMAND span:

  • load: time spent loading CLI library, before calling first function
  • init: time spent running CLI initialization code, before starting user operation
  • totalDeployTime: wall clock time spent waiting for assets plus deploys (everything after synth).
  • fileAsset/dockerAsset: number of assets encountered
  • buildAssetST/publishAssetST: time spent on asset building/publishing. Because of parallelism, this can end up higher than the total CLI duration, which is what the ST suffix indicates.
  • totalDeployedStacks: how many stacks deployed in this invocation
  • totalDeployedResources: how many resources deployed in this invocation (mind parallelism).
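
To see why a summed timer such as buildAssetST can exceed the wall-clock duration, here is a small worked example. The `Interval` type, both helper functions, and the sample numbers are invented for illustration and are not taken from the CLI.

```typescript
// Hypothetical helper types and data, for illustration only.
interface Interval {
  startMs: number;
  endMs: number;
}

// Sum of the individual durations. Overlapping intervals are counted in full,
// so this can exceed the wall-clock time when work runs in parallel.
function summedDuration(intervals: Interval[]): number {
  return intervals.reduce((total, i) => total + (i.endMs - i.startMs), 0);
}

// Wall-clock time from the earliest start to the latest end.
function wallClockDuration(intervals: Interval[]): number {
  if (intervals.length === 0) return 0;
  const start = Math.min(...intervals.map((i) => i.startMs));
  const end = Math.max(...intervals.map((i) => i.endMs));
  return end - start;
}

// Three asset builds of 4 seconds each, started 1 second apart.
const assetBuilds: Interval[] = [
  { startMs: 0, endMs: 4000 },
  { startMs: 1000, endMs: 5000 },
  { startMs: 2000, endMs: 6000 },
];

const summed = summedDuration(assetBuilds); // 4000 + 4000 + 4000 = 12000
const wall = wallClockDuration(assetBuilds); // 6000 - 0 = 6000
console.log({ summed, wall });
```

In this example the summed build time is twice the wall-clock time, so a summed counter should be read as total work done rather than as time the user spent waiting.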

New ASSET span:

  • duration: time from the start of the asset build until its last publish finishes.
  • fileAsset/dockerAsset: exactly one of the two is set to 1, to indicate the asset type.

ALSO in this PR:

  • Make timers Disposable so we can use them with `using` declarations.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@rix0rrr rix0rrr requested a review from a team June 4, 2026 12:05
@github-actions github-actions Bot added the p2 label Jun 4, 2026
github-actions Bot commented Jun 4, 2026

Contributor

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

codecov-commenter commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 86.51452% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.25%. Comparing base (3e1f708) to head (2b77638).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/aws-cdk/lib/cli/cdk-toolkit.ts | 85.27% | 61 Missing and 1 partial ⚠️ |
| packages/aws-cdk/lib/cli/io-host/cli-io-host.ts | 0.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1596      +/-   ##
==========================================
+ Coverage   88.19%   88.25%   +0.06%     
==========================================
  Files          76       76              
  Lines       10864    11038     +174     
  Branches     1503     1528      +25     
==========================================
+ Hits         9581     9742     +161     
- Misses       1255     1267      +12     
- Partials       28       29       +1     

| Flag | Coverage Δ |
|---|---|
| suite.unit | 88.25% <86.51%> (+0.06%) ⬆️ |


4 participants