Skip to content

Parallelize PubGrub resolver's processInputs phases#10020

Open
pepicrft wants to merge 6 commits into
swiftlang:mainfrom
tuist:parallelize-pubgrub-process-inputs
Open

Parallelize PubGrub resolver's processInputs phases#10020
pepicrft wants to merge 6 commits into
swiftlang:mainfrom
tuist:parallelize-pubgrub-process-inputs

Conversation

@pepicrft
Copy link
Copy Markdown

@pepicrft pepicrft commented May 3, 2026

Parallelize PubGrub resolver's processInputs phases

Motivation:

PubGrubDependencyResolver.processInputs walks unversioned and revision-based top-level constraints sequentially, fetching each one's container and reading its dependencies before moving to the next. In production, each step is dominated by I/O (git fetch + manifest evaluation), so packages with several top-level dependencies pay an avoidable serial cost during resolution.

Modifications:

Wave-based parallelism in both phases of processInputs: each iteration drains all currently-known unversioned (or revision-based) constraints into a wave, fetches their containers and dependencies in parallel via withThrowingTaskGroup, and merges discoveries serially in wave order so that insertion into versionBasedDependencies and rootIncompatibilities matches the sequential implementation byte-for-byte. The override checks in the revision phase remain sequential because each one must observe the writes of its predecessors before deciding whether to skip or throw.

A new testResolutionIsDeterministicAcrossRuns resolves a mixed-phase graph 8 times and asserts identical bindings, catching race-induced ordering or skipped-constraint regressions.

Result:

Wide-fanout and mixed graphs (the common production shape) resolve 1.4×–1.9× faster, with the speedup widening as per-fetch latency grows. Deep-chain graphs (no parallelism available) are unchanged at realistic latencies.

Numbers were captured locally with a small benchmark harness (not part of this PR) that resolves synthetic graphs whose container methods sleep for a configurable per-call latency, modeling the git-fetch + manifest-evaluation cost the resolver pays in production. Driven by hyperfine (2 warmup + 5 measured runs) on arm64-apple-darwin25.4.0.

topology size latency (ms) before (ms) after (ms) speedup
wide-unversioned 10 5 266.0 155.3 1.71×
wide-unversioned 10 30 1371.1 779.4 1.76×
wide-unversioned 30 5 753.4 416.8 1.81×
wide-unversioned 30 30 4004.8 2104.5 1.90×
wide-revision 10 5 266.6 159.7 1.67×
wide-revision 10 30 1383.4 781.2 1.77×
wide-revision 30 5 752.4 416.9 1.80×
wide-revision 30 30 4030.8 2131.9 1.89×
mixed 10 5 253.9 177.8 1.43×
mixed 10 30 1363.5 860.5 1.58×
mixed 30 5 750.1 438.9 1.71×
mixed 30 30 4018.4 2186.0 1.84×
deep-unversioned 10 5 148.3 158.9 0.93×
deep-unversioned 10 30 778.1 771.0 1.01×
deep-unversioned 30 5 389.2 400.0 0.97×
deep-unversioned 30 30 2079.9 2100.1 0.99×
deep-revision 10 5 147.5 163.3 0.90×
deep-revision 10 30 765.2 768.9 1.00×
deep-revision 30 5 393.4 393.3 1.00×
deep-revision 30 30 2082.9 2070.9 1.01×

The 7-10% regressions on small deep graphs at low latency reflect withThrowingTaskGroup setup cost when each wave is a single constraint, and disappear at realistic latencies.

All existing PubGrub and PackageGraph tests pass.

pepicrft added 4 commits May 3, 2026 10:08
Wave-based parallelism in `PubGrubDependencyResolver.processInputs`:
each iteration drains all currently-known unversioned (or revision)
constraints, fetches their containers and dependencies in parallel via
`withThrowingTaskGroup`, and merges discoveries serially in wave order
so insertion into `versionBasedDependencies` and `rootIncompatibilities`
matches the sequential implementation byte-for-byte.

Yields ~1.7x-1.9x speedup on wide-fanout graphs at realistic per-fetch
latencies; deep-chain graphs are unchanged (no parallelism to exploit).

Adds 7 `testProcessInputs*` cases covering wide fanout, deep cascades,
mixed unversioned+revision graphs, cross-phase override, and an 8-run
determinism check that catches race-induced ordering changes.

Adds a `swift-package-resolver-bench` executable and
`Utilities/bench-resolver.sh` hyperfine driver for reproducing the
before/after numbers.
@plemarquand
Copy link
Copy Markdown
Contributor

@swift-ci test

Copy link
Copy Markdown
Contributor

@plemarquand plemarquand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this thoroughly researched and well written PR! I think this is a pretty clear win vs the current implementation. I had a few small comments, but overall I think this looks quite good.

I'd like @bripeticca to also take a look as she has some deeper experience with the PubGrub dependency resolver.

repeating: [],
count: wave.count
)
for try await (i, deps) in group {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the sequential model the first error encountered in the wave was always thrown. Now that we're resolving in parallel there is the chance that the error thrown is non-deterministic if there is more than one error in the group. This could result in the user seeing an inconsistent error each time they resolve.

This is certainly an edge case, but it could add more confusion to an already annoying situation where you're battling dependency resolution errors.

At the cost of not failing as fast as possible we could captur the error for the item with the lowest index and throw that once the loop completes.

Specifically I'm thinking that deps elements become Result<[(DependencyResolutionNode, [Constraint]), Error>, and we add the unwrapped result value to the results[i], check if the Result is an Error and if so note the index and the error to throw after the loop finishes.

This same pattern applies when working with perConstraintDeps further down

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed in 3770ac1: each task now returns its result wrapped in Result<>, the group collects them in wave-index order, and we throw via try $0.get() on the first failure encountered when iterating in order. So the failure with the lowest wave index always wins, regardless of which task happens to fail first. Applied to both phase 1 (unversioned) and phase 2 (revision).

results.append(
bindings
.map { "\($0.package.identity)@\($0.boundVersion)" }
.sorted()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting the bindings here masks the intent of the test; if the bindings are returned in a different order every time this difference is erased by the sort

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the sort was masking exactly what the test is supposed to detect. Removed in 3770ac1. The test now compares the rendered binding lists element-by-element so any reordering across runs surfaces.

Doing this turned out to expose a real pre-existing determinism issue: solve() sorts the version-decided packages before appending them to the bindings array, but iterates state.overriddenPackages as a plain Dictionary, which Swift documents as not having stable iteration order between equivalent instances. So unversioned and revision-based bindings were coming back in different orders across runs. Fixed in 3b18327 by sorting the overridden packages by deprecatedName, mirroring the treatment of decisions just above.


XCTAssertEqual(results.first?.count, 10, "Expected 10 bindings (a..d, va..vd, branch, vbranch)")
for (i, run) in results.enumerated() where i > 0 {
XCTAssertEqual(run, results[0], "Run \(i) bindings differ from run 0")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how do we know that the first result is the correct order?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3770ac1: run 0 now also runs through AssertResult against an explicit expected binding list. Subsequent runs are still compared element-wise against run 0, so the test catches both "stable but wrong" and "correct but unstable" outcomes.

pepicrft added 2 commits May 4, 2026 12:42
- Wrap each task's result in `Result` and unwrap in wave order so the
  failure with the lowest wave index is thrown when multiple tasks
  fail. This restores the deterministic error reporting that the
  sequential implementation had.
- Drop the `.sorted()` from the determinism test's binding comparison
  so any race-induced reordering surfaces instead of being masked.
- Assert run 0's bindings explicitly with `AssertResult` so the test
  catches "stable but wrong" outcomes in addition to the existing
  "correct but unstable" check.
`solve()` already sorts the version-decided packages before appending
them to the bindings array, but iterates `state.overriddenPackages` as
a plain `Dictionary` which Swift explicitly documents as not having a
stable iteration order between equivalent instances. As a result the
unversioned and revision-based bindings could be emitted in different
orders across runs. Sorting by `deprecatedName` mirrors the existing
treatment of decisions and makes the full `bindings` array stable.

Surfaced by the `testResolutionIsDeterministicAcrossRuns` test once
its `.sorted()` shim was removed in the previous commit.
@plemarquand
Copy link
Copy Markdown
Contributor

@swift-ci test

@plemarquand
Copy link
Copy Markdown
Contributor

@swift-ci test windows

@vsarunas
Copy link
Copy Markdown
Contributor

vsarunas commented May 4, 2026

@pepicrft , I loaded up this branch under mitmproxy as done with a different approach to speed up PubGrub on this sample repo from #9845 and unfortunately do not see more parallelism of git requests:

tuist-parallel-2

@pepicrft
Copy link
Copy Markdown
Author

pepicrft commented May 5, 2026

You're right. processInputs only does I/O for .unversioned (path) and .revision (branch) top-level deps. Your reproducer is purely version-based, so phase 1 and 2 don't execute and there's nothing here to parallelize. The git fetches in your mitmproxy trace come from Workspace.loadDependencyManifests and the resolver's run() loop prefetch, both of which are already parallel.

My PR description overclaimed by calling wide-fanout the "common production shape". It's only common for projects with many top-level path or branch deps. #9942 targets the actual common case.

Happy to defer or close this if its scope is too narrow.

@vsarunas
Copy link
Copy Markdown
Contributor

vsarunas commented May 5, 2026

Thanks for the clarification!

Will defer to @plemarquand on how to proceed further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants