Rejected commits in "Speed up the Scala compiler on the mill-libs-javalib codebase" by lihaoyi · Pull Request #26091 · scala/scala3

lihaoyi · 2026-05-19T07:15:58Z

Sister commit to #26025, containing all the rejected experiments, saved for posterity

…le cluster win
Wraps `defn.asContextFunctionType(pt)` at Typer.scala:3925 with `if ctx.isAfterTyper then NoType else defn.asContextFunctionType(pt)`. The existing `!ctx.isAfterTyper` gate immediately below would discard the result post-typer anyway, so the dealias-driven probe is pure waste on Inlining / InlineTyper retyper paths.
JFR profile deltas (3 repeats × 10 runs, mean ± stddev, baseline → opt-03):
1. asContextFunctionType:1968 (within Type.dealias) self%: 0.14 → 0.08 (-0.06, ~40% reduction at the targeted caller)
2. Type.dealias self%: 1.36 ± 0.02 → 1.39 ± 0.14 (+0.03, 0.2σ, within noise, run2 outlier)
3. Type.dealias tot%: 1.80 ± 0.04 → 1.75 ± 0.14 (-0.05, 0.3σ, within noise)
4. Typer.adapt1 tot%: 32.41 ± 0.13 → 32.99 ± 0.32 (+0.58, 1.3σ)
Estimated total speedup: 0.06 ± 0.00 (from 1 above)
Targeted line dropped as predicted (the result was demonstrably unused post-typer), but the win does not propagate to Type.dealias self/tot or the adapt1/typedUnadapted cluster: the opt-03 stddev on dealias self ballooned from 0.02 to 0.14, driven by a run-2 outlier, and adapt1 tot drifted up slightly. Below the cluster-win threshold this branch's accepted commits have cleared. Reverted.

Add an identity short-circuit to TypeComparer.isSubArgs so hash-consed applied-type argument lists that are already the same list can skip the elementwise isSubArg and isSameType walk. The TypeComparer cluster moves upward instead of improving.
JFR profile deltas (3 repeats × 8 runs, mean ± stddev, 08e2cbd → 85f830c):
1. TypeComparer.secondTry$1 tot%: 7.97 ± 0.05 → 8.20 ± 0.11 (+0.23, 2.1σ)
2. TypeComparer.recur tot%: 13.41 ± 0.17 → 13.63 ± 0.21 (+0.22, 1.0σ)
3. TypeComparer.firstTry$1 tot%: 13.31 ± 0.16 → 13.54 ± 0.23 (+0.23, 1.0σ)
4. TypeComparer.isSubType tot%: 13.57 ± 0.17 → 13.80 ± 0.16 (+0.23, 1.4σ)
5. TypeComparer.compareNamed$1 tot%: 3.57 ± 0.08 → 3.66 ± 0.15 (+0.09, 0.6σ)
Estimated total speedup: -0.23 ± 0.00 (from 1 above)
Rejected. The identity check either rarely fires on this workload or adds branch cost on a path where the list walk is already cheap.
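The shape of the rejected short-circuit can be sketched in isolation. This is a minimal illustration and not the compiler's code: `sameArgs` and `sameElem` are hypothetical names, and a plain element predicate stands in for the real `isSubArg` / `isSameType` checks.

```scala
object IdentityShortCircuit {
  /** Compares two argument lists, skipping the elementwise walk when both
    * references point at the same list instance.
    * `sameElem` is a hypothetical stand-in for the real per-argument check.
    */
  def sameArgs[A](xs: List[A], ys: List[A])(sameElem: (A, A) => Boolean): Boolean =
    (xs eq ys) || xs.corresponds(ys)(sameElem)
}
```

The `eq` test runs on every call, including the calls where the lists differ, which is consistent with the commit's explanation that the check may add branch cost without firing often enough to pay for it.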

…no measurable cluster win over opt-04
Short-circuits the doomed `tryBaseType` path in `TypeComparer`: when `tp1.symbol eq cls2`, `baseTypeOf` returns `tp1` unchanged (per SymDenotations.scala:2346), so `tryBaseType`'s `(base ne tp1)` test fails and the call falls through to `fourthTry` anyway. The guard fires after the AnyClass / SingletonClass early-returns and before `tryBaseType(cls2)`, redirecting to `fourthTry` directly via a `NamedType` match on `tp1`.
JFR profile deltas (3 repeats × 10 runs, mean ± stddev, opt-04 → opt-05):
1. isSubType tot%: 15.13 ± 0.34 → 14.88 ± 0.07 (-0.25, 0.6σ, within noise)
2. firstTry$1 tot%: 14.71 ± 0.29 → 14.54 ± 0.12 (-0.17, 0.4σ, within noise)
3. compareNamed$1 tot%: 4.26 ± 0.03 → 4.16 ± 0.28 (-0.10, 0.3σ, opt-05 noisier)
4. tryBaseType$1 tot%: 3.50 ± 0.04 → 3.55 ± 0.16 (+0.05, 0.2σ, within noise)
5. baseTypeOf tot%: 2.79 ± 0.26 → 3.02 ± 0.37 (+0.23, 0.4σ, within noise)
6. recur tot%: 14.85 ± 0.30 → 14.68 ± 0.10 (-0.17, 0.4σ, within noise)
Estimated total speedup: 0.25 ± 0.00 (from 1 above)
Targeted method (tryBaseType$1) did not move and baseTypeOf trended slightly worse. Marginal cluster-wide drops are within inflated opt-05 std devs and not corroborated at the specific call site. The hit-rate of `tp1.symbol eq cls2` is apparently low enough that the always-taken pattern-match offsets gains on the few cases it catches. Reverted.

…yte, alloc shifted not reduced
Added a per-LambdaType `myTrivialApplyState: Byte` cache: when `resType` is `AppliedType(tycon: TypeRef, args)` with `args` elementwise `eq` to `paramRefs`, `instantiate(argTypes)` returns `AppliedType(tycon, argTypes)` directly instead of walking `resultType.substParams(this, argTypes)`. Fast-path state is computed once per LambdaType on first instantiate call.
JFR profile deltas (3 repeats × 10 runs, mean ± stddev, baseline → opt-06):
1. WeakHashSet$Entry[] share alloc%: 80.34 ± 0.88 → 80.70 ± 0.25 (+0.36, 0.3σ, baseline 219.50 ± 0.87 MiB → opt-06 219.83 ± 0.30 MiB; +0…)
2. Total alloc bytes alloc MiB: 274.64 → 273.61 (-1.03 MiB, within noise)
3. Targeted `substParams:186` arm of `instantiateWithTypeVars → MethodOrPoly.instantiate` shrank 24.76% → 13.16% (-32 MiB), but allocation redistributed to sibling `subst:22` (+12 MiB) and `appliedTo:446` (+16 MiB) arms inside the same `derivedAppliedType` parent. Net effect on the dominant WeakHashSet$Entry[] class is ~0: same hash-cons inserts, different callers.
4. Pattern matches the rejected e38f56b553 family ("allocation shifted, not reduced").
5. CPU side showed small wins on Substituters$.substParams (-0.45) and TypeMap arms but below the iter-5 share floor.
Estimated total speedup: 0.00 ± 0.00 (from 1 above)
Reverting per iter-5 policy.

…assHandler, per-method % flat
Bypassed the `Promise`/`Future.apply` dispatch chain in `WritingClassHandler.postProcessUnit` for the default `-Ybackend-parallelism=1` (`SyncWritingClassHandler`) path. Added a `protected def isSync` hook overridden in the sync subclass; the sync branch inlines `takeClasses()/takeTasty().foreach(sendToDisk)` on the caller and stores `Future.unit` (or `Future.failed(e)` on `Throwable`) so `complete()`'s `task.value.get.get` success/failure machinery and the `processingUnits += unitInPostProcess` submission-order semantics are preserved. Async branch retains the original `Future:` body verbatim.
JFR profile deltas (3 repeats × 8 runs, mean ± stddev, 032d0ea → 3846bff):
1. WritingClassHandler.process tot%: 1.99 ± 0.08 → 2.02 ± 0.05 (+0.03, 0.2σ, within noise)
2. GenBCode.runOn tot%: 5.56 ± 0.13 → 5.50 ± 0.18 (-0.06, 0.2σ, within noise)
3. Object[] alloc share alloc%: 3.86 ± 0.33 → 3.42 ± 0.56 (-0.44, 0.5σ, within noise)
4. The dispatch frames (Promise.map → dispatchOrAddCallbacks → submitWithValue → Transformation.run) structurally vanished from analyze files; allocation forest lost every `Promise$Transformation` entry on the postProcessUnit path.
Estimated total speedup: -0.03 ± 0.00 (from 1 above)
Rejected: the body work (`sendToDisk` → ASM `ClassNode.accept` → `ClassWriter` constant-pool) dominates ~1.94% tot, so collapsing the wrapper saves only a sliver that does not register above noise. Per the iter-5 policy (per-method % delta only).
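The sync branch described above hinges on returning an already-completed `Future`, so that later code reading `task.value.get.get` behaves the same as on the async path. A minimal sketch of that idea, assuming a hypothetical `runSync` helper (the name and the `Unit` body type are illustrative, not taken from the compiler):

```scala
import scala.concurrent.Future

object SyncTask {
  /** Runs `body` on the calling thread and returns an already-completed Future.
    * Success maps to `Future.unit`, any Throwable to `Future.failed(e)`, so a
    * later `task.value.get.get` either returns or rethrows as it would have
    * for an asynchronously completed task.
    */
  def runSync(body: => Unit): Future[Unit] =
    try {
      body
      Future.unit
    } catch {
      case e: Throwable => Future.failed(e)
    }
}
```

Because both `Future.unit` and `Future.failed` are completed on construction, `value` is defined immediately and no executor or callback machinery is involved.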

… under TypevarsMissContext
Change candidateKind normalization for plain poly candidates under TypevarsMissContext to substitute param refs with bounded wildcards instead of allocating fresh throwaway type variables. The allocation-share row moves in the desired direction, but substitution CPU does not improve reliably.
JFR profile deltas (3 repeats × 8 runs, mean ± stddev, 1ea0f89 → 8228627):
1. Substituters$.substParams tot%: 2.73 ± 0.29 → 2.87 ± 0.18 (+0.14, 0.5σ)
2. Substituters$.subst tot%: 8.78 ± 0.35 → 8.64 ± 0.05 (-0.14, 0.4σ)
3. WeakHashSet$Entry[] alloc share alloc%: 80.51 ± 0.44 → 70.95 ± 4.77 (-9.56, 2.0σ)
Estimated total speedup: -0.14 ± 0.00 (from 1 above)
Rejected. The allocation-share movement is high variance and does not outweigh the lack of a reliable substitution CPU win.

…ence asserts
Replace equality-based List[Symbol] membership checks in TreeTypeMap.withSubstitution idempotence asserts with typed reference-equality helpers. The dispatch moves away from generic equality, but the original equals/list-exists cost was already near the reporting floor.
JFR profile deltas (3 repeats × 8 runs, mean ± stddev, f771c9a → 7d48ef8):
1. scala.runtime.BoxesRunTime.equals2 self%: 0.47 ± 0.07 → 0.46 ± 0.03 (-0.01, 0.1σ)
2. scala.runtime.BoxesRunTime.equals2 tot%: 0.54 ± 0.09 → 0.53 ± 0.02 (-0.01, 0.1σ)
3. scala.collection.immutable.List.exists self%: 0.24 ± 0.02 → 0.23 ± 0.03 (-0.01, 0.2σ)
4. scala.collection.immutable.List.exists tot%: 0.91 ± 0.06 → 0.84 ± 0.06 (-0.07, 0.6σ)
5. TreeTypeMapEqHelpers$.listContainsEq self%: below floor → 0.11 ± 0.03
Estimated total speedup: 0.01 ± 0.00 (from 1 above)
Rejected. The helper confirms the dispatch changed, but the affected rows remain too small and noisy to justify keeping the extra path.
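For readers unfamiliar with the distinction, a reference-equality membership helper might look roughly like the following. Only the name `listContainsEq` appears in the profile rows above; the enclosing object, signature, and body here are assumptions made for illustration.

```scala
import scala.annotation.tailrec

object EqHelpers {
  /** True if `xs` contains the very same object as `x` (reference equality),
    * avoiding the generic `==` dispatch that `List.contains` goes through.
    */
  @tailrec
  def listContainsEq[A <: AnyRef](xs: List[A], x: A): Boolean = xs match {
    case head :: tail => (head eq x) || listContainsEq(tail, x)
    case Nil          => false
  }
}
```

For example, with two distinct `String` instances holding the same characters, `List(a).contains(b)` is true while `EqHelpers.listContainsEq(List(a), b)` is false, since only the former consults `equals`.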

… TypeProxy arm, testProvisional self flat
Flips `this.mightBeProvisional = false` permanently on the non-provisional path of the `case tycon: TypeProxy` arm of `AppliedType.superType`. The conditions tested there (`tycon.isProvisional || args.exists(_.isProvisional)`) are exactly what `testProvisional`'s AppliedType arm walks via `t.fold(false, (x, tp) => x || test(tp, theAcc))`, so if neither fires the AppliedType is definitively not provisional and the bit can be cleared for all future isProvisional queries.
JFR profile deltas (3 repeats × 10 runs, mean ± stddev, opt-04 → opt-06):
1. Type.testProvisional self%: 0.66 ± 0.15 → 0.66 ± 0.11 (+0.00, 0.0σ, within noise)
2. Type.testProvisional tot%: 0.79 ± 0.15 → 0.78 ± 0.13 (-0.01, 0.0σ, within noise)
3. asSeenFromSlow self%: 0.12 ± 0.09 → 0.09 ± 0.04 (-0.03, 0.2σ, within noise)
4. asSeenFromSlow tot%: 9.94 ± 0.16 → 10.25 ± 0.60 (+0.31, 0.4σ, within noise)
Estimated total speedup: 0.00 ± 0.00 (from 1 above)
testProvisional did not move. The targeted superType:4792 slice is unchanged across samples. Likely the bit-flip is already performed by testProvisional itself whenever the walk enters an AppliedType, and the post-superType `validSuper == ctx.period` cache already short-circuits within-phase queries. The per-AppliedType compound win predicted by the verifier did not materialize.

…ClassesCollector instance via ThreadLocal, alloc-share noise-level only
Replaced the per-call anonymous `NestedClassesCollector[ClassBType]` allocation (plus its two fresh `mutable.Set` instances) in `collectNestedClasses` with a single per-thread reusable instance hoisted via `ThreadLocal`. `sendToDisk` runs on multiple worker threads under `-Ybackend-parallelism > 1`, so the hoist must be per-thread. The single caller (`PostProcessor.setInnerClasses`) materializes both returned iterables synchronously into `addInnerClasses` before the next per-thread call, so the consume-immediately reuse is safe.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, afd9028 → 5cd2aa1):
1. NestedClassesCollector.visitDescriptor self%: 0.11 ± 0.02 → 0.10 ± 0.05 (-0.01, 0.1σ, within noise, noise widened)
2. Object[] alloc share alloc%: 5.54 ± 1.00 → 5.85 ± 0.64 (+0.31, 0.2σ, within noise, std subsumes)
3. alloc bytes / run alloc MiB: 173.00 → 166.00 (-7.00, -4% mean, within noise)
Estimated total speedup: 0.01 ± 0.00 (from 1 above)
Rejected: per-class collector allocation is genuinely small (a few hundred allocations per compile, ~64-128 B each) and falls below the bench floor. Verifier predicted "0.05-0.10% tot" and noted the savings would be in allocation count, not the cited self row; both held, but neither separated from noise.
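The per-thread reuse pattern can be sketched as follows. This is an illustration under stated assumptions: `Collector`, `collect`, and the `$`-based classification are hypothetical stand-ins, not the backend's `NestedClassesCollector` API.

```scala
import scala.collection.mutable

object CollectorReuse {
  /** Hypothetical stand-in for the collector: two scratch sets that are
    * cleared and refilled on every use.
    */
  final class Collector {
    val declared = mutable.Set.empty[String]
    val referred = mutable.Set.empty[String]
    def reset(): Unit = { declared.clear(); referred.clear() }
  }

  // One instance per thread, created lazily on first access from that thread.
  private val perThread = new ThreadLocal[Collector] {
    override def initialValue(): Collector = new Collector
  }

  /** Returns the thread's scratch sets directly. Callers must consume the
    * result before the next `collect` call on the same thread, because that
    * call clears and refills the same sets.
    */
  def collect(names: Seq[String]): (collection.Set[String], collection.Set[String]) = {
    val c = perThread.get()
    c.reset()
    names.foreach { n =>
      if (n.contains("$")) c.referred += n else c.declared += n
    }
    (c.declared, c.referred)
  }
}
```

The returned sets alias the thread-local state, which is why the commit's safety argument depends on the single caller materializing both results before the next per-thread call.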

…n plain poly givens
For plain poly implicit candidates, add a derivesFrom-based class-hierarchy prefilter so unrelated result-type classes can be rejected before normalize, isCompatible, substParams, and AppliedUniques insertion. The intended allocation reduction does not separate from variance, and the targeted AppliedUniques/substitution rows stay flat.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, 112f041 → 706e3c9):
1. WeakHashSet$Entry[] alloc alloc%: 72.14 ± 2.18 → 69.74 ± 1.90 (-2.40, 0.6σ)
2. AppliedUniques.linkedListLoop$2 self%: 3.15 ± 0.08 → 3.14 ± 0.09 (-0.01, 0.1σ)
3. Substituters$.substParams self%: 0.22 ± 0.02 → 0.22 ± 0.02 (+0.00, 0.0σ)
4. Type.dealias self%: 0.57 ± 0.12 → 0.60 ± 0.06 (+0.03, 0.2σ)
Estimated total speedup: 0.01 ± 0.00 (from 2 above)
Rejected. The class-hierarchy prefilter is plausible, but the allocation signal is noisy and the direct CPU rows do not show a reliable mill-javalib win.

…ss val, below-floor
Hoisted the per-DefDef `claszSymbol.is(Trait)` check in `gen` out of the per-method loop into a per-class `isCZTrait` flag, mirroring the existing `isCZParcelable` / `isCZStaticModule` fields. The genuine `SymDenotation.is` call is loop-invariant across all DefDefs of a given class, so we can pay it once at the top of `genPlainClass`.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, 2b355ab → 96475fb):
1. SymDenotation.is self%: 0.88 ± 0.11 → 0.92 ± 0.03 (+0.04, 0.3σ, within noise)
2. SymDenotation.is tot%: 1.78 ± 0.12 → 1.85 ± 0.08 (+0.07, 0.4σ, within noise)
3. GenBCode.runOn tot%: +0.04% mean, within noise
4. BCodeSkelBuilder.genPlainClass tot%: -0.01% (noise)
Estimated total speedup: -0.04 ± 0.00 (from 1 above)
Rejected: SymDenotation.is is already heavily C2-inlined at hot sites; the saved virtual-dispatch byte per DefDef vs per class is below the JFR 3-run methodology's resolution.

…mmon type shapes
Split the 25+ arm Type-shape match in TypeMap.mapOver into a hot dispatch (NamedType/AppliedType/LambdaType/ThisType|BoundType|NoPrefix/AliasingBounds/TypeBounds/TypeVar/ExprType) plus a cold `mapOverRare` fallback covering CapturingType, AnnotatedType, ProtoType, RefinedType, RecType, SuperType, ConstantType(Type), LazyRef, ClassInfo, AndType, OrType, FlexibleType, MatchType, SkolemType. The TermRef.isImport guard is preserved inside the hot NamedType arm (it shadowed the generic NamedType match in the original). AliasingBounds stays above the TypeBounds arm to preserve dispatch identity. The hypothesis was that shrinking the hot tableswitch would let JIT inline mapOver into hot TypeMap subclasses (AsSeenFromMap.apply, Substituters, TreeTypeMap.computeMapType).
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, baseline → opt-1):
1. TypeMap.mapOver self%: 0.64 ± 0.05 → 0.52 ± 0.06 (-0.12, 1.1σ)
2. TypeMap.mapOver tot%: 13.93 ± 0.52 → 13.81 ± 0.35 (-0.12, 0.1σ)
3. TypeMap.mapOverRare self%: (new) → 0.16 ± 0.04
4. TypeMap.mapOverRare tot%: (new) → 1.27 ± 0.12
5. AsSeenFromMap.apply tot%: 8.95 ± 0.54 → 8.73 ± 0.36 (-0.22, 0.2σ)
6. TypeMap.mapOverLambda tot%: 11.13 ± 0.43 → 10.79 ± 0.25 (-0.34, 0.5σ)
7. Denotation.info self%: 1.98 ± 0.52 → 1.65 ± 0.64 (-0.33, 0.3σ)
Estimated total speedup: 0.12 ± 0.00 (from 1 above)
The mapOver self drop (0.64 → 0.52) is the largest movement but fails the (new_mean + 1.5*std) < (old_mean - 1.5*std) gate: new_high=0.610 vs old_low=0.565 (overlap of 0.045). mapOver tot is flat. Downstream clusters (AsSeenFromMap.apply, mapOverLambda) drift in the right direction but none cross the noise.
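The acceptance gate quoted in this commit is simple enough to write down directly. The sketch below encodes only the inequality as stated, (new_mean + 1.5*std) < (old_mean - 1.5*std); the `Sample` type and the function name are illustrative and not part of the benchmarking tooling.

```scala
object StatSigGate {
  /** A measurement summarised as mean and standard deviation. */
  final case class Sample(mean: Double, std: Double)

  /** True only when the new upper band sits strictly below the old lower band:
    * (new_mean + k * new_std) < (old_mean - k * old_std), with k = 1.5 by default.
    */
  def isSignificantDrop(old: Sample, next: Sample, k: Double = 1.5): Boolean =
    next.mean + k * next.std < old.mean - k * old.std
}
```

For the mapOver self row, `isSignificantDrop(Sample(0.64, 0.05), Sample(0.52, 0.06))` compares 0.610 against 0.565 and returns false, matching the rejection above.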

… hot Tree case classes
Introduced a `protected def envelopeImpl(src, startSpan): Span` hook in `Positioned` carrying the productElement-driven `include` walk, and overrode it on the eight hottest Tree case classes (Select, Apply, TypeApply, Block, If, ValDef, DefDef, Template) to walk their typed children directly via four `private[ast]` helpers on `Positioned` (`includePositioned`, `includePositionedList`, `includePositionedListList`, `includeLazyTree`/`includeLazyTreeList`). The base method's two-pass MaxOffset retry and the short-circuit / Inlined.call special cases are preserved verbatim; only the body of the third arm is now per-class. Goal: eliminate productElement boxing of the Tree child references, the megamorphic `Positioned | Modifiers | ::[?] | _` match, and the recursive `case y :: ys` arm for list children on the path through `Positioned.include$1` (opt-2 baseline self 1.31 ± 0.08).
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-2 → opt-3):
1. Positioned.include$1 (opt-2 self/tot) self%: 1.31 ± 0.08 → (below floor)
2. Positioned$.includePositioned (opt-3 self/tot) self%: (below floor) → 1.06 ± 0.07
3. Positioned$.includePositionedList (opt-3 self/tot) self%: (below floor) → 0.12 ± 0.02
4. Positioned.envelope tot%: 1.88 ± 0.11 → 1.71 ± 0.09 (-0.17, 0.8σ, opt-2 self 0.29±0.04; opt-3 self 0.20±0.04)
5. TreeMap.transform tot%: 29.77 ± 0.39 → 29.56 ± 0.41 (-0.21, 0.3σ, opt-2 self 0.80±0.07; opt-3 self 0.79±0.09)
6. Cluster-self (include* helpers): 1.31 ± 0.08 → 1.18 ± 0.07 (combined includePositioned + includePositionedList).
Per the stat-sig rule (new_mean + 1.5*new_std) < (old_mean - 1.5*old_std): 1.29 < 1.19 is FALSE, so the bands overlap. envelope tot% drop 1.88 → 1.71: new_hi 1.845 > old_lo 1.715, also within noise.
Estimated total speedup: 0.17 ± 0.00 (from 4 above)
Rejected: the percentage-drop appears to be a denominator artifact of opt-3 sampling more total CPU work overall; the absolute work in the envelope cluster did not measurably shrink. The reflective productElement loop is apparently well-optimised by C2 at this hot site, and the indirection cost of an extra virtual call through `envelopeImpl` plus the four helper-method frames roughly cancels the saving from the inlined typed walks.

…thirdTry(Named)/fourthTry/tryBaseType/compareAppliedType1+2/isSubArgs
Mirroring the existing pattern in compareNamed (line ~301-302), hoist `val ctx = comparerContext; given Context = ctx` to the top of each case-cascade method so the dispatch path fetches the implicit Context once instead of per case-arm. Behavior unchanged: comparerContext is a trivial getter for myContext, and the resulting `given` shadows the class-level `given [DummySoItsADef]: Context = myContext` with the same value.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-2 baseline → opt-4):
1. firstTry$1 self%: 0.34 ± 0.05 → 0.28 ± 0.06 (-0.06, 0.5σ, within noise)
2. firstTry$1 tot%: 14.14 ± 0.23 → 14.48 ± 0.22 (+0.34, 0.8σ)
3. secondTry$1 self%: 0.19 ± 0.03 → 0.21 ± 0.02 (+0.02, 0.4σ, within noise)
4. secondTry$1 tot%: 7.53 ± 0.21 → 7.91 ± 0.13 (+0.38, 1.1σ)
5. thirdTryNamed$1 self%: 0.12 ± 0.05 → 0.10 ± 0.04 (-0.02, 0.2σ, within noise)
6. thirdTryNamed$1 tot%: 3.09 ± 0.11 → 3.21 ± 0.08 (+0.12, 0.6σ)
7. recur self%: 0.12 ± 0.03 → 0.16 ± 0.02 (+0.04, 0.8σ, within noise, borderline up)
8. recur tot%: 14.27 ± 0.25 → 14.62 ± 0.24 (+0.35, 0.7σ)
9. isSubType self%: 0.37 ± 0.03 → 0.38 ± 0.02 (+0.01, 0.2σ, within noise)
10. Context.gadt self%: 0.23 ± 0.02 → 0.21 ± 0.03 (-0.02, 0.4σ, within noise)
11. fourthTry/compareAppliedType1+2/isSubArgs/tryBaseType all below 0.10% self floor in opt-4 (tryBaseType was 0.10 in opt-2).
Estimated total speedup: 0.06 ± 0.00 (from 1 above)
No cluster row drops stat-sig per the (new+1.5*std) < (old-1.5*std) rule. Percentages are relative, so the rows are still comparable, but there is no clear win. Rejecting per the policy that requires at least one stat-sig drop without material regression.

Mirror of accepted 1f35a2a (which widens checkedPeriod to lastd.validFor on a fast-path hit), applied to the recomputeDenot slow-path so the gate widens after a fresh denotation is computed rather than staying narrow at ctx.period. A one-line change.
Hypothesis: widening on the recompute path lets the next Symbol.denot short-circuit at the new denotation's full validity window instead of re-entering goBack/goForward at every phase boundary inside that window.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-2 → opt-5):
1. SingleDenotation.goBack$1 self%: 1.30 ± 0.17 → 2.20 ± 0.44 (+0.90, 1.5σ, +69%, clear regression)
2. Denotation.info self%: 1.26 ± 0.33 → 1.39 ± 0.69 (+0.13, 0.1σ, within noise, high variance)
3. Symbol.denot self%: 1.31 ± 0.14 → 1.22 ± 0.08 (-0.09, 0.4σ, within noise)
4. SingleDenotation.goForward$1 self%: 0.55 ± 0.03 → 0.53 ± 0.02 (-0.02, 0.4σ, within noise)
5. Symbol.computeDenot self%: below floor in both summaries
Estimated total speedup: -0.90 ± 0.00 (from 1 above)
The change widens checkedPeriod to newd.validFor after a recompute, but newd.validFor at recompute time is typically the phase-narrow "current denotation in this single phase" interval, not the wider flock window. The wider-gate hypothesis only fires when validFor already spans multiple phases, which is not the common case on the recompute branch. Meanwhile, widening to a phase-narrow validFor appears to confuse downstream callers that re-enter goBack when the gate doesn't cover the queried period, doubling up traversal cost in goBack$1.
Distinct from accepted 1f35a2a: that one widens AFTER a successful fast-path hit where validFor is already the wider lastd.validFor. Here we widen AFTER a recompute where newd.validFor is the narrow just-computed phase interval. The mirroring is structural but not semantic.

…llel arrays double alloc
Replaced the chained-bucket Entry linked list with open-addressed linear probing over parallel `refs: Array[Entry[A] | Null]` + `hashes: Array[Int]` arrays, with a shared Tombstone sentinel for deletions/GC reclamation. Inlined the open-addressed put loop into NamedTypeUniques.enterIfNew and AppliedUniques.enterIfNew (the two hottest call sites).
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-2 (baseline) → opt-6 (this attempt)):
1. alloc bytes/run alloc MiB: 335.00 → 613.00 (+278.00, +83% regression)
2. Entry[] share alloc%: 81.96 → 44.52 (-37.44, -37%; fails 50% bar)
3. Entry[] bytes alloc MiB: 274.00 → 273.00 (-1.00)
4. int[] share alloc%: 1.41 → 46.35 (+44.94, NEW hashes[] alloc)
5. int[] bytes alloc MiB: 4.70 → 284.00 (+279.30)
6. NamedTypeUniques.linkedListLoop$1 / NamedTypeUniques.enterIfNew self%: 1.29 → 1.41 (+0.12, within noise)
7. AppliedUniques.linkedListLoop$2 / AppliedUniques.enterIfNew self%: 0.54 → 0.14 (-0.40, better)
8. WeakHashSet.linkedListLoop$2 self%: 0.69 ± 0.02 → (below floor)
9. WeakHashSet.addEntryAt self%: 0.26 ± 0.18 → 0.19 ± 0.16 (-0.07, 0.2σ)
10. WeakHashSet.putHashed self%: (new) → 0.13 ± 0.01
Estimated total speedup: -0.12 ± 0.00 (from 6 above)
Root cause: the parallel `hashes: Array[Int]` doubles array-allocation pressure on every resize, so the absolute Entry[] bytes barely move (273 vs 274 MiB) while int[] grows by ~280 MiB. The Entry[] *share%* drops only because total alloc nearly doubled. Even if the hashes[] are off-heap small (4 bytes per slot) the per-resize twin-array allocation kills the win and adds load-store traffic on every probe. CPU on NamedType primary path is also slightly worse, so we don't gain a probe-speedup either.
A future attempt should pack the cached hash into the Entry object itself (it already has one as `val hash`) and probe over a single Entry[] array, OR retain the Entry chain but skip the slot in the bucket head when stale, preserving the single-array allocation.
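The denominator effect described in the root cause can be checked with the rounded MiB figures from rows 1 and 3. The helper below is only a worked calculation; `sharePct` is a hypothetical name, and the inputs are the rounded per-run means quoted above rather than raw profile data.

```scala
object AllocShare {
  /** Share of total allocation taken by one class, in percent. */
  def sharePct(classMiB: Double, totalMiB: Double): Double =
    100.0 * classMiB / totalMiB

  // Entry[] MiB over total MiB, baseline and opt-6 respectively.
  val before: Double = sharePct(274.0, 335.0)
  val after: Double  = sharePct(273.0, 613.0)
}
```

With these inputs the share falls from roughly 81.8% to roughly 44.5%, close to the reported 81.96 and 44.52, even though the Entry[] bytes themselves change by only 1 MiB. The drop comes almost entirely from the larger total.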

Modified `resize()` to walk the old table twice: first pass counts live (non-stale) entries via `e.refersTo(null)`; second pass copies only live entries and rebuckets. If the live count still fits the current capacity under loadFactor, the new table is allocated at the same size (no doubling), saving the would-be growth. Drains the ReferenceQueue at the end since we've already removed the corresponding entries.
Hypothesis: by purging GC'd entries at resize time and skipping growth when the live set still fits, Entry[] alloc share would drop from 82% to the 70-75% range with negligible CPU cost.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, baseline → opt-2):
1. Entry[] alloc share alloc%: 82.11 ± 0.37 → 81.67 ± 1.02 (-0.44, 0.3σ, std nearly 3x)
2. Entry[] alloc bytes alloc MiB: 274.30 ± 1.10 → 275.20 ± 1.89 (+0.90, 0.3σ, std ~2x)
3. ReferenceQueue.poll tot%: 0.84 ± 0.06 → 0.91 ± 0.10 (+0.07, 0.4σ, within noise)
4. WeakHashSet.resize self%: (new) → 1.20 ± 0.05
5. AppliedUniques.linkedListLoop$2 self%: 0.80 ± 0.23 → 0.97 ± 0.13 (+0.17, 0.5σ)
6. AppliedUniques.linkedListLoop$2 tot%: 1.93 ± 0.08 → 2.30 ± 0.11 (+0.37, 1.9σ)
7. NamedTypeUniques.linkedListLoop$1 self%: 1.64 ± 0.07 → 1.52 ± 0.22 (-0.12, 0.4σ, std tripled)
Estimated total speedup: -0.17 ± 0.00 (from 5 above)
Root cause: the targeted Entry[] alloc share barely moved (within noise), and the absolute MiB went up, not down. The first-pass `refersTo(null)` chain walk added a visible new hot leaf (`WeakHashSet.resize` at 1.20% self), and the rebucketed chains landed in an order that slowed `AppliedUniques.linkedListLoop$2` (the consumer of WeakHashSet chains on the AppliedType uniques path). NamedTypeUniques.linkedListLoop$1 std tripled, signaling layout-induced variance. Net: added CPU cost without the alloc savings the proposal promised.
A future attempt should either (a) prune stale entries lazily during `linkedListLoop` rather than at resize time, avoiding the extra full walk, or (b) cap WeakHashSet growth via a separate non-resize "compact" operation triggered by the ReferenceQueue drain on the hot put paths themselves.

Add DenotingTree.symbol typed-arm overrides that return NamedType.symbol or ThisType.cls directly, with matching Select and ProxyTree overrides for the ConstantType and module-term cases. The denot.symbol frames vanish, but direct NamedType.symbol calls increase computeSymbol, Symbol.denot, and toDenot work, and Typer.typed regresses.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-1 4ab7fca → opt-4):
1. Tree.symbol tot%: 3.35 ± 0.09 → below floor
2. DenotingTree.symbol self%: new → 0.31 ± 0.03
3. Select.symbol self%: new → 0.25 ± 0.14
4. ProxyTree.symbol self%: new → 0.12 ± 0.02
5. NamedType.symbol tot%: 0.99 ± 0.04 → 1.42 ± 0.09 (+0.43, 3.3σ)
6. NamedType.denot tot%: 4.05 ± 0.29 → 2.31 ± 0.03 (-1.74, 5.4σ)
7. Symbol.denot tot%: 3.46 ± 0.16 → 4.78 ± 0.24 (+1.32, 3.3σ)
8. Symbols$.toDenot tot%: 2.96 ± 0.17 → 4.20 ± 0.17 (+1.24, 3.6σ)
9. Typer.typed tot%: 70.79 ± 0.58 → 71.51 ± 0.43 (+0.72, 0.7σ)
Estimated total speedup: -0.72 ± 0.00 (from 9 above)
Rejected. The old denot.symbol path appears to use a cheaper current-period denotation cache than the direct NamedType.symbol route, so the override is a reattribution loss rather than a symbol-read win.

Hoist the transformCtx source-check path into TreeMap.transform for non-MemberDef shapes so unchanged-source trees can stay in the current context without a transformCtx call. The change preserves the MemberDef context path and the existing transform body. The transformCtx frame disappears, but the source check and shape dispatch move into TreeMap.transform self time. The combined row is neutral and the TreeMap total does not show a reliable win.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-3 29b7860 → opt-5):
1. Trees$Instance.transformCtx self%: 0.20 ± 0.02 → below floor
2. Trees$Instance.transformCtx tot%: 0.38 ± 0.03 → below floor
3. TreeMap.transform self%: 0.78 ± 0.12 → 1.00 ± 0.09 (+0.22, 1.0σ)
4. TreeMap.transform tot%: 28.73 ± 0.28 → 29.27 ± 0.62 (+0.54, 0.6σ)
5. Combined transformCtx self + TreeMap.transform self%: 0.98 → 1.00 (+0.02, within noise)
6. MegaPhase.transformTree tot%: 12.88 ± 0.21 → 12.82 ± 0.29 (-0.06, 0.1σ)
7. TreeAccumulator.foldOver self%: 1.22 ± 0.08 → 1.21 ± 0.06 (-0.01, 0.1σ)
Estimated total speedup: -0.22 ± 0.15 (from 3 above)
Rejected. The removed transformCtx frame is reabsorbed into TreeMap.transform self time, the combined self row is effectively flat, and no caller row provides a compensating improvement.

…on-Type ConstantType
Intent: short-circuit `mapType` before consulting the EqHashMap cache for trivial leaf types (NoType, NoPrefix, and ConstantType whose `.value.value` is not a Type) that would otherwise bottom out at mapOver's catch-all. The hypothesis was that Literal/error-tree visits churn the cache or pay the typeMap/substMap virtual dispatch unnecessarily; an early `return tp` would dodge both.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-3 (29b7860) → opt-6):
1. EqHashMap.lookup self%: 1.38 ± 0.39 → 1.44 ± 0.37 (+0.06, 0.1σ)
2. EqHashMap.lookup tot%: 1.42 ± 0.38 → 1.47 ± 0.37 (+0.05, 0.1σ)
3. TreeTypeMap.computeMapType self%: 0.22 ± 0.05 → 0.21 ± 0.05 (-0.01, 0.1σ)
4. TreeTypeMap.computeMapType tot%: 3.48 ± 0.24 → 3.40 ± 0.14 (-0.08, 0.2σ)
5. TreeTypeMap.transform self%: 0.22 ± 0.04 → 0.22 ± 0.03 (+0.00, 0.0σ)
6. TreeTypeMap.transform tot%: 6.98 ± 0.25 → 6.92 ± 0.21 (-0.06, 0.1σ)
7. TreeTypeMap.mapType self%: below 0.05% floor in both summaries (invisible)
8. All targeted deltas fall well within noise under the 1.5σ combined-SE bar (e.g. EqHashMap.lookup self combined SE ~0.24% needs ~0.36% to clear; computeMapType tot SE ~0.12% needs ~0.19%). The most notable directional move is a small *regression* on EqHashMap.lookup, opposite to the predicted direction. No improvement signal anywhere.
Estimated total speedup: -0.06 ± 0.00 (from 1 above)
Net: an extra branch on every mapType call (the hot path on the cached side) with no profile-detectable payoff. Not worth keeping.
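The commit does not spell out how its "combined SE" is computed. One reading that reproduces the quoted figures is the standard error of a difference of two means, each taken over n = 5 repeats. That formula and the value of n are assumptions here, and the names below are illustrative.

```scala
object CombinedSE {
  /** Standard error of the difference of two means, each from `n` repeats:
    * sqrt(std1^2 / n + std2^2 / n). Assumed formula, not stated in the commit.
    */
  def combinedSE(std1: Double, std2: Double, n: Int): Double =
    math.sqrt(std1 * std1 / n + std2 * std2 / n)

  /** Smallest absolute delta that clears a `k`-sigma bar (k = 1.5 by default). */
  def requiredDelta(std1: Double, std2: Double, n: Int, k: Double = 1.5): Double =
    k * combinedSE(std1, std2, n)
}
```

Under these assumptions, the EqHashMap.lookup self row (stddevs 0.39 and 0.37) gives an SE of about 0.24 and a bar of about 0.36, and the computeMapType tot row (0.24 and 0.14) gives about 0.12 and 0.19, in line with the numbers in item 8.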

Extend the TreeMap and MegaPhase leaf shortcuts to Super and EmptyValDef, treating those leaf-equivalent shapes like the existing no-child cases. The change keeps the normal transform path for all other trees and only bypasses wrapper/context work for these two shapes. The candidate shapes are rare on this workload. The added Super and EmptyValDef checks on the dominant non-leaf path offset the small amount of avoided wrapper work.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, opt-3 29b7860 → opt-7):
1. TreeMap.transform self%: 0.78 ± 0.12 → 0.86 ± 0.09 (+0.08, 0.4σ)
2. TreeMap.transform tot%: 28.73 ± 0.28 → 28.79 ± 0.37 (+0.06, 0.1σ)
3. Trees$Instance.transformCtx self%: 0.20 ± 0.02 → 0.21 ± 0.05 (+0.01, 0.1σ)
4. MegaPhase.transformTree self%: 0.64 ± 0.11 → 0.56 ± 0.09 (-0.08, 0.4σ)
5. MegaPhase.transformTree tot%: 12.88 ± 0.21 → 13.11 ± 0.18 (+0.23, 0.6σ)
6. MegaPhase.transformUnnamed$1 tot%: 12.88 ± 0.21 → 13.10 ± 0.17 (+0.22, 0.6σ)
7. MegaPhase.transformNamed$1 tot%: 12.74 ± 0.21 → 12.99 ± 0.17 (+0.25, 0.7σ)
8. TreeAccumulator.foldOver self%: 1.22 ± 0.08 → 1.20 ± 0.08 (-0.02, 0.1σ)
Estimated total speedup: -0.08 ± 0.15 (from 1 above)
Rejected. The only favorable row is a noisy MegaPhase self drop, while TreeMap self and MegaPhase total rows drift upward. Super and EmptyValDef do not appear often enough to pay for the extra checks.

…f[LazyType]` short-circuit ahead of isCurrent
Inlined the no-LazyType arm of `isCurrent` directly into each `is`/`isOneOf`/`isAllOf` (single- and two-arg variants) on `SymDenotation`.
Reasoning: `isCurrent` is a `def` (not `final`), with a body that calls through to a nested `knownFlags` match. For the overwhelmingly common already-completed case the body short-circuits on `!myInfo.isInstanceOf[LazyType]`; hoisting that test inline at each caller should eliminate the `isCurrent` method frame on the hot path while leaving the slow LazyType arm bit-identical. This is the *inverse* of `a192788039` (which added a `myCompleted: Boolean` field that grew `isCurrent` past inline budget): here no field is added, only the leading short-circuit is duplicated at six call sites.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, baseline a9e06cf → opt-6):
1. SymDenotation.is self%: 0.95 ± 0.07 → 0.98 ± 0.06 (+0.03, 0.2σ, within noise)
2. SymDenotation.is tot%: 1.91 ± 0.08 → 1.94 ± 0.06 (+0.03, 0.2σ, within noise)
3. SymDenotation.isAllOf self%: 0.24 ± 0.05 → 0.28 ± 0.10 (+0.04, 0.3σ, within noise)
4. SymDenotation.isAllOf tot%: 0.29 ± 0.04 → 0.37 ± 0.12 (+0.08, 0.5σ, within noise)
5. SymDenotation.completeFrom self%: 1.09 ± 0.63 → 1.94 ± 0.52 (+0.85, 0.7σ, regression)
6. SymDenotation.completeFrom tot%: 16.04 ± 0.42 → 17.50 ± 0.70 (+1.46, 1.3σ, regression)
7. Denotation.info tot%: 17.26 ± 0.31 → 18.36 ± 0.39 (+1.10, 1.6σ, regression)
8. Symbol.denot self%: 1.22 ± 0.03 → 1.54 ± 0.12 (+0.32, 2.1σ, regression)
9. isCurrent: not visible in either profile (already inlined both sides).
Estimated total speedup: -0.03 ± 0.00 (from 1 above)
The mechanism backfired. Most likely the duplicated `isInstanceOf[LazyType]` prefix at six different `is`/`isOneOf`/`isAllOf` sites grew each method's bytecode just enough that HotSpot stopped inlining them into their many hot callers. The proposal's premise that "the body shrinks" is wrong in aggregate: each of the six methods gained one branch, and the cumulative effect across the call graph is worse than the original `isCurrent` thunk it was supposed to avoid.

…m/substParams
Symmetric extension of the shipped substThis NamedType isStaticOwner gate (c7b69fb / Substituters.scala line 127) to the three list-input Substituter entry points whose NamedType arms perform a linear from-list scan (or IdentityHashMap probe) plus a recursive prefix walk. For each, an early-return on `tp.currentSymbol.isStaticOwner` skips the per-NamedType work.
Correctness rests on: callers (InlineReducer / UnrollDefinitions / TypeAssigner / Symbols.copy / PolyType.instantiate / TypeOps.refineUsingParent) only put type/value-parameter or local-binding symbols in `from`, never static owners; for substParams `from: BindingType` is never a Symbol so the gate is purely about elision of the recursive prefix walk. The prefix chain of a static-owner NamedType consists only of static owners and bottoms out at NoPrefix, containing no candidate `from` references.
Disjoint from the rejected c16ba84, which gated only single-binder entries (subst[BT] / substRecThis / substParam). This proposal targets the list-input entries only.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-6/baseline 63bd3c908e → iter-6/opt-2):
1. Substituters$.subst tot%: 2.40 ± 0.14 → 2.15 ± 0.10 (-0.25, 1.0σ)
2. Substituters$.subst self%: 0.32 ± 0.07 → 0.29 ± 0.04 (-0.03, 0.3σ)
3. Substituters$.substSym tot%: 1.13 ± 0.08 → 1.08 ± 0.14 (-0.05, 0.2σ)
4. Substituters$.substSym self%: 0.28 ± 0.03 → 0.32 ± 0.05 (+0.04, 0.5σ, gate cost)
5. Substituters$.substParams tot%: 2.77 ± 0.17 → 3.01 ± 0.12 (+0.24, 0.8σ, regression)
6. Substituters$.substParams self%: 0.19 ± 0.03 → 0.22 ± 0.02 (+0.03, 0.6σ)
7. Type.subst tot%: 2.50 ± 0.13 → 2.26 ± 0.10 (-0.24, 1.0σ)
8. NamedType.derivedSelect tot%: 4.60 ± 0.14 → 4.31 ± 0.20 (-0.29, 0.9σ)
9. TypeMap.mapOver tot%: 13.64 ± 0.36 → 13.24 ± 0.51 (-0.40, 0.5σ)
10. TypeOps$AsSeenFromMap.apply tot%: 8.97 ± 0.31 → 8.28 ± 0.53 (-0.69, 0.8σ)
11. SymDenotation.isStatic self%: 0.38 ± 0.21 → 0.43 ± 0.21 (+0.05, 0.1σ, gate cost surfacing)
12. Typer.typed tot%: 70.66 ± 0.32 → 70.93 ± 0.74 (+0.27, 0.3σ, within noise)
13. Aggregate Substituters-cluster tot delta (subst + substSym + substParams + Type.subst) is -0.30% but offset by the +0.24% substParams regression and well within combined std (σ ~0.26). The favorable rows (subst, NamedType.derivedSelect, AsSeenFromMap, mapOver) all sit at ~1.1–1.5σ, short of a multi-σ signal, while substParams worsens by a comparable margin in the opposite direction, suggesting the saved linear-scan + prefix-recursion work is roughly balanced by the per-call `currentSymbol.isStaticOwner` flag-bit cost and by SubstParamsMap calls where the gate seldom fires (substParams `from` is a BindingType, so the linear-scan saving doesn't apply; only the prefix-walk elision does, and the gate-cost surfaces uniformly).
Estimated total speedup: 0.03 ± 0.00 (from 2 above)
This matches the pattern observed in the recent rejection 557774f284 of essentially the same shape (bundled with the single-binder subst[BT]): favorable rows tot drops within 1σ, substParams or related-row regressions of comparable magnitude, no top-level Typer.typed win. The narrowed list-input-only shape (without the subst[BT] portion that c16ba84 already rejected) does not change the verdict.

…safe invalidation linkage)
Mirror attempt of the shipped `1f5992a645` (Symbol-side `Symbol.denot` gate widen + `computeDenot` validFor widen on hit). Applied the analogous shape to NamedType:
Result: the modified compiler builds, but immediately aborts on bootstrap compilation with an AssertionError, i.e. `tree.symbol == NoSymbol` for a NameTree that previously resolved to a real symbol. Test corpus: bench-mill-javalib (Mill 1.1.6 modules) failed in phase 1 of 10 runs (rc=1).
Root cause (distinct from the Symbol-side analogue): `Denotations.validFor_=` calls `symbol.invalidateDenotCache()` which resets `Symbol.checkedPeriod` -- but NamedType.checkedPeriod has no analogous invalidation hook. So when a chain entry's `validFor` is narrowed (e.g. by a later `derivedSingleDenotation` splitting the JointRefDenotation chain), NamedType's widened `checkedPeriod` is not invalidated and the stale `lastSymbol` is returned.
The comment at Types.scala:2425 ("SymDenotation#installAfter never changes the symbol") is only true for SymDenotation chain entries. For NonSymSingleDenotation (Joint/UniqueRefDenotation) chains, transformations *can* produce chain entries with a different symbol via `derivedSingleDenotation(newSym, ...)` (Denotations.scala:696-698).
The shipped `1f5992a645` is safe specifically because `Symbol.validFor_=` resets `Symbol.checkedPeriod`. The NamedType side has no such linkage, so the same widening shape is unsafe here. This is exactly what the verify document flagged as a "correctness nuance": The verify went on to argue the comment-asserted invariant ("installAfter never changes the *symbol*") still made it safe -- but that invariant only covers the SymDenotation arm.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, 63bd3c908e baseline → gate-only):
1. NamedType.symbol self%: 0.51 ± 0.09 → 0.44 ± 0.04 (-0.07, 0.5σ)
2. NamedType.symbol tot%: 1.02 ± 0.07 → 1.05 ± 0.02 (+0.03, 0.3σ, within noise)
3. NamedType.computeSymbol self%: 0.10 ± 0.01 → 0.11 ± 0.02 (+0.01, 0.3σ, within noise)
4. Select.symbol self%: 0.43 ± 0.04 → 0.39 ± 0.02 (-0.04, 0.7σ)
5. Symbols$.toDenot self%: 0.33 ± 0.05 → 0.31 ± 0.03 (-0.02, 0.3σ, within noise)
6. Symbol.denot self%: 1.18 ± 0.13 → 1.22 ± 0.09 (+0.04, 0.2σ, within noise)
Estimated total speedup: 0.07 ± 0.00 (from 1 above)
Profile note: the partial gate-only widen (line 2426 only, no writer change) runs cleanly but is a no-op for performance because `checkedPeriod` is still only set to `ctx.period`, so `contains` is equivalent to `==`. Sampled deltas at 5 repeats × 10 runs (vs 63bd3c908e baseline, gate-only): All deltas within ~1σ. As expected: without the writer widening the gate widening alone cannot increase the hit window.
Summary: the full widening (gate plus writer) builds but immediately aborts on bootstrap compilation with an AssertionError. The same widening shape is unsafe here because NamedType.checkedPeriod has no invalidation hook analogous to Symbol.checkedPeriod.
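
The profile note above (gate-only widening is a no-op while the writer still stores only the current period) can be illustrated with a toy model. This is a sketch under stated assumptions: `Period`, `singlePhase`, `strictGate` and `widenedGate` are hypothetical stand-ins, not the compiler's actual definitions, and a period is reduced to a plain range of phase ids.

```scala
// Hypothetical, simplified period: an inclusive range of phase ids.
final case class Period(firstPhase: Int, lastPhase: Int) {
  require(firstPhase <= lastPhase)
  // True when `that` lies entirely inside this range.
  def contains(that: Period): Boolean =
    firstPhase <= that.firstPhase && that.lastPhase <= lastPhase
}

object GateDemo {
  // Unwidened writer: only ever stores the current single-phase period.
  def singlePhase(phase: Int): Period = Period(phase, phase)

  // Original gate shape: exact equality with the current period.
  def strictGate(checked: Period, current: Period): Boolean = checked == current

  // Widened gate shape: the cached period only has to cover the current one.
  def widenedGate(checked: Period, current: Period): Boolean = checked.contains(current)
}
```

With single-phase values on both sides the two gates agree for every pair of phases, so the hit window cannot grow. Only once the writer stores a wider range such as `Period(1, 4)` does `widenedGate` start to hit where `strictGate` misses, and that is the step that needs an invalidation hook.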

Add the same owner-equality bypass that Context.withOwner already has, so runWithOwner can call op under the current context without touching the generalContextPool. The mechanism was correct, but profiling showed no significant movement: ContextPool.next was unchanged at 0.0σ and the largest affected MegaPhase row moved only 0.5σ, so the optimization does not pay for itself.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-9/run-0 → iter-9/run-1):
1. Context.freshOver self%: 0.95 ± 0.05 → 0.92 ± 0.08 (-0.03, 0.4σ)
2. Context.freshOver tot%: 0.98 ± 0.05 → 0.96 ± 0.08 (-0.02, 0.3σ)
3. ContextPool.next self%: 0.37 ± 0.03 → 0.37 ± 0.03 (+0.00, 0.0σ)
4. ContextPool.next tot%: 0.37 ± 0.03 → 0.37 ± 0.04 (+0.00, 0.0σ)
5. MegaPhase.loop$3 self%: 0.14 ± 0.02 → 0.16 ± 0.04 (+0.02, 0.5σ)
6. MegaPhase.loop$3 tot%: 7.46 ± 0.12 → 7.51 ± 0.26 (+0.05, 0.2σ)
7. MegaPhase.transformNamed$1 self%: 0.25 ± 0.08 → 0.22 ± 0.02 (-0.03, 0.4σ)
8. MegaPhase.transformNamed$1 tot%: 12.79 ± 0.22 → 12.64 ± 0.32 (-0.15, 0.5σ)
9. MegaPhase.transformTree self%: 0.66 ± 0.17 → 0.62 ± 0.17 (-0.04, 0.2σ)
10. MegaPhase.transformTree tot%: 12.91 ± 0.22 → 12.77 ± 0.32 (-0.14, 0.4σ)
11. MegaPhase.transformUnnamed$1 self%: 0.40 ± 0.04 → 0.39 ± 0.07 (-0.01, 0.1σ)
12. MegaPhase.transformUnnamed$1 tot%: 12.91 ± 0.22 → 12.77 ± 0.32 (-0.14, 0.4σ)
Estimated total speedup: 0.03 ± 0.09 (from 1 above)
No affected timing row clears the go/no-go threshold. FreshContext allocation was also effectively unchanged in the profile tree (144.21 KiB baseline → 143.51 KiB after), so this belongs with the rejected experiments.
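
For readers unfamiliar with the shape being tried, an owner-equality bypass can be sketched roughly as follows. Everything here is hypothetical and heavily simplified: `Ctx`, `CtxPool` and this `runWithOwner` signature are illustration-only stand-ins, and owners are modelled as strings rather than symbols.

```scala
// Hypothetical, minimal context that only records its owner.
final class Ctx(val owner: String)

// Hypothetical pool that counts how many contexts it hands out.
final class CtxPool {
  var taken: Int = 0
  def next(owner: String): Ctx = { taken += 1; new Ctx(owner) }
}

object OwnerBypass {
  // Runs `op` under a context owned by `owner`.
  // If the current context already has that owner, reuse it and skip the pool.
  def runWithOwner[T](ctx: Ctx, pool: CtxPool, owner: String)(op: Ctx => T): T =
    if (ctx.owner == owner) op(ctx)
    else op(pool.next(owner))
}
```

The bypass only saves work on calls where the owner is already current, which is consistent with the flat ContextPool.next rows if such calls are rare on this workload.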

Try using a cached valid SingleDenotation info in the TypeRef.dealias arm when the info is already completed, falling back to tp.info for LazyType and overloaded denotations. The mechanism compiled and profiled cleanly, but Type.dealias moved only 0.2σ self / 0.3σ total, so the extra cache probe does not pay for itself. JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-9/run-3 → iter-9/run-7): 1. Denotation.info self%: 0.75 ± 0.42 → 0.99 ± 0.49 (+0.24, 0.5σ) 2. Denotation.info tot%: 17.98 ± 0.43 → 17.94 ± 0.37 (-0.04, 0.1σ) 3. Symbol.denot self%: 1.34 ± 0.10 → 1.28 ± 0.17 (-0.06, 0.4σ) 4. Symbol.denot tot%: 3.48 ± 0.16 → 3.32 ± 0.18 (-0.16, 0.9σ) 5. NamedType.denot self%: 0.26 ± 0.04 → 0.28 ± 0.04 (+0.02, 0.5σ) 6. NamedType.denot tot%: 3.86 ± 0.10 → 3.98 ± 0.16 (+0.12, 0.8σ) 7. Type.dealias self%: 1.45 ± 0.10 → 1.43 ± 0.09 (-0.02, 0.2σ) 8. Type.dealias tot%: 1.88 ± 0.13 → 1.84 ± 0.07 (-0.04, 0.3σ) Estimated total speedup: -0.18 ± 0.52 (from 1 3 above) No affected timing row clears the go/no-go threshold. Benchmark profiling completed with errors=0, but bootstrapped smoke was not run because this failed the performance threshold.

Add a no-binder hash helper for NamedTypeUniques mirroring the AppliedUniques shape, so prefix.hash is read directly instead of dispatching through typeHash(null, prefix). Hashable.typeHash drops by 3.1σ, but the cost reappears inside NamedTypeUniques.enterIfNew with a 2.8σ self regression and flat total time, so this is not a net win. JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-9/run-1 → iter-9/run-8): 1. Hashable.doHashNoBinders self%: - → 0.21 ± 0.03 2. Hashable.doHashNoBinders tot%: - → 0.49 ± 0.06 3. Hashable.typeHash self%: 0.46 ± 0.02 → 0.21 ± 0.08 (-0.25, 3.1σ) 4. Hashable.typeHash tot%: 0.50 ± 0.02 → 0.24 ± 0.09 (-0.26, 2.9σ) 5. NamedTypeUniques.enterIfNew self%: 0.23 ± 0.09 → 0.48 ± 0.06 (+0.25, 2.8σ) 6. NamedTypeUniques.enterIfNew tot%: 2.46 ± 0.23 → 2.51 ± 0.20 (+0.05, 0.2σ) 7. NamedTypeUniques.linkedListLoop$1 self%: 1.38 ± 0.14 → 1.39 ± 0.21 (+0.01, 0.0σ) 8. NamedTypeUniques.linkedListLoop$1 tot%: 1.73 ± 0.19 → 1.72 ± 0.13 (-0.01, 0.1σ) Estimated total speedup: 0.00 ± 0.15 (from 3 5 above) The helper removes the separate Hashable.typeHash row but does not reduce the NamedTypeUniques total path, and it significantly raises enterIfNew self time. This is a reattribution/noise trap rather than a useful optimization.
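
The commit message does not state how the σ figures are derived. As a reading aid, the sketch below assumes significance is the absolute delta divided by the larger of the two standard deviations; that assumption happens to reproduce rows 3 to 8 of this entry, but it is not confirmed by the source and may not hold for other entries. `Row`, `sigmas` and `round1` are hypothetical names.

```scala
object Sigma {
  // One profile row: mean ± stddev before and after the change.
  final case class Row(beforeMean: Double, beforeSd: Double,
                       afterMean: Double, afterSd: Double) {
    def delta: Double = afterMean - beforeMean
    // Assumption: |delta| measured in units of the larger stddev.
    def sigmas: Double = math.abs(delta) / math.max(beforeSd, afterSd)
  }

  // Round to one decimal place, as the rows are printed.
  def round1(x: Double): Double = math.round(x * 10.0) / 10.0
}
```

Under this reading, `Sigma.Row(0.46, 0.02, 0.21, 0.08)` (Hashable.typeHash self%) and `Sigma.Row(0.23, 0.09, 0.48, 0.06)` (NamedTypeUniques.enterIfNew self%) have the same absolute delta in opposite directions, which is the reattribution pattern the entry describes.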

Add a one-slot CachedType→Type front cache in baseTypeOf.recur before the existing btrCache lookup, avoiding sentinel values and clearing the slot on remove paths. The cache reduces recur total by 1.5σ, but baseTypeOf self regresses by 1.3σ and the EqHashMap.lookup improvement does not clear the strict threshold, so the shape is rejected. JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-9/run-5 → iter-9/run-13): 1. ClassDenotation.baseTypeOf self%: 0.40 ± 0.05 → 0.72 ± 0.24 (+0.32, 1.3σ) 2. ClassDenotation.baseTypeOf tot%: 3.27 ± 0.32 → 3.13 ± 0.32 (-0.14, 0.4σ) 3. ClassDenotation.recur$4 self%: 0.79 ± 0.21 → 0.82 ± 0.23 (+0.03, 0.1σ) 4. ClassDenotation.recur$4 tot%: 2.86 ± 0.31 → 2.40 ± 0.14 (-0.46, 1.5σ) 5. EqHashMap.addOld self%: 0.70 ± 0.02 → 0.73 ± 0.05 (+0.03, 0.6σ) 6. EqHashMap.addOld tot%: 0.74 ± 0.03 → 0.76 ± 0.05 (+0.02, 0.4σ) 7. EqHashMap.lookup self%: 1.56 ± 0.28 → 1.29 ± 0.21 (-0.27, 1.0σ) 8. EqHashMap.lookup tot%: 1.61 ± 0.27 → 1.34 ± 0.21 (-0.27, 1.0σ) 9. EqHashMap.update self%: 0.58 ± 0.11 → 0.61 ± 0.02 (+0.03, 0.3σ) 10. EqHashMap.update tot%: 1.33 ± 0.11 → 1.39 ± 0.05 (+0.06, 0.5σ) Estimated total speedup: -0.08 ± 0.32 (from 1 3 7 above) The potential lookup saving is eaten by extra baseTypeOf self work. Bootstrapped smoke was not run because the profile failed the performance threshold.
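
A one-slot front cache in front of a hash-table lookup can be sketched as below. This is a minimal illustration, not the rejected patch: `FrontCache` is a hypothetical class, `java.util.IdentityHashMap` stands in for the compiler's own map, and unlike the commit (which avoids sentinel values) this sketch uses `null` to mark an empty slot.

```scala
import java.util.IdentityHashMap

// Hypothetical identity-keyed cache with a one-slot front cache.
final class FrontCache[V <: AnyRef](compute: AnyRef => V) {
  private val table = new IdentityHashMap[AnyRef, V]()
  private var lastKey: AnyRef = null              // null means the slot is empty
  private var lastValue: V = null.asInstanceOf[V]
  var tableLookups: Int = 0                       // counts trips to the backing table

  def get(key: AnyRef): V =
    if (key eq lastKey) lastValue                 // front-cache hit: no table probe
    else {
      tableLookups += 1
      var v = table.get(key)
      if (v == null) { v = compute(key); table.put(key, v) }
      lastKey = key
      lastValue = v
      v
    }

  // Remove paths must also clear the slot, or a stale value could be served.
  def remove(key: AnyRef): Unit = {
    table.remove(key)
    if (key eq lastKey) { lastKey = null; lastValue = null.asInstanceOf[V] }
  }
}
```

The extra identity comparison is paid on every call, including misses, which is one plausible way for the self time of the caller to rise while lookup time falls.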

Check `tp.prefix eq NoPrefix` before `tp.symbol.isStatic` in `TypeOps.simplify`, matching the accepted `AsSeenFromMap` ordering. This targets static-name simplification by avoiding a symbol lookup for `NoPrefix`, while preserving the existing symbol path for every other prefix shape. The direct simplify movement is small, `NamedType.symbol` stays flat, and the wider info-completion path moves the wrong way. The favorable `NamedType.denot` and `TypeMap.mapOver` totals do not line up with the targeted simplify/symbol path. JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-13/run-0 → iter-13/run-1): 1. TypeOps$.simplify self%: 0.20 ± 0.04 → 0.20 ± 0.02 (+0.00, 0.0σ) 2. TypeOps$.simplify tot%: 0.86 ± 0.06 → 0.83 ± 0.03 (-0.03, 0.5σ) 3. NamedType.symbol tot%: 1.03 ± 0.08 → 1.08 ± 0.15 (+0.05, 0.3σ) 4. NamedType.denot tot%: 4.34 ± 0.06 → 3.58 ± 0.10 (-0.76, 7.6σ) 5. Denotation.info tot%: 17.51 ± 0.34 → 18.63 ± 0.93 (+1.12, 1.2σ) 6. SymDenotation.completeFrom tot%: 16.55 ± 0.33 → 18.04 ± 1.16 (+1.49, 1.3σ) 7. Typer.typed tot%: 69.95 ± 0.14 → 71.15 ± 1.02 (+1.20, 1.2σ) Estimated total speedup: -0.22 ± 0.79 (from 1 5 6 7 above) Rejected. The intended simplify win remains below noise, while the info-completion and Typer totals regress enough to make the change a poor tradeoff.

Implicit member collection now carries lazy implicit refs through candidate filtering and materializes TermRefs only for survivors. This was meant to reduce NamedTypeUniques work for candidates rejected by accessibility, kind, or compatibility checks. Safety depends on survivors still producing the same TermRef(prefix, symbol) with denotations read in the current context, but the profile shows the saved uniquing is noise while candidate filtering and broad typer totals regress. Expected changes: - NamedTypeUniques.linkedListLoop$1 self% and tot% should improve: skipping eager TermRef creation should reduce named-type uniquing for rejected implicit candidates. - ImplicitRefs.candidateKind$1 self% and tot% could regress: filtering reads lazy denotation info before deciding kind and compatibility. - Typer.typed tot% could regress: extra lazy-ref work can surface in the enclosing implicit-search typer path when uniquing savings do not pay. - Typer.typedNamed$1 tot% could regress: typed named trees inherit the same implicit-search overhead. - No other regressions expected: accepted candidates still materialize the same TermRef(prefix, symbol), while rejected candidates stay on the existing filtering decisions. JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-19/run-0 → iter-19/run-72): 1. NamedTypeUniques.linkedListLoop$1 self%: 1.19 ± 0.19 → 1.05 ± 0.10 (-0.14, 0.7σ) 2. NamedTypeUniques.linkedListLoop$1 tot%: 1.51 ± 0.06 → 1.39 ± 0.13 (-0.12, 0.9σ) 3. ImplicitRefs.candidateKind$1 self%: below floor → 0.12 ± 0.03 4. ImplicitRefs.candidateKind$1 tot%: below floor → 6.08 ± 0.25 5. Typer.typed tot%: 67.92 ± 0.12 → 69.40 ± 0.19 (+1.48, 7.8σ) 6. Typer.typedNamed$1 tot%: 58.76 ± 0.25 → 60.44 ± 0.40 (+1.68, 4.2σ) Estimated total speedup: -1.48 ± 0.22 (from row 5 above; row 6 is overlapping typer confirmation) Rejected. 
NamedTypeUniques.linkedListLoop$1 improves by only 0.7σ self and 0.9σ total, while ImplicitRefs.candidateKind$1 appears at 6.08% total and Typer.typed regresses +1.48 at 7.8σ. Typer.typedNamed$1 also regresses +1.68 at 4.2σ, so the lazy refs move cost into candidate filtering and typer rather than producing a reliable win.

Names.termName now fuses ASCII validation and hash computation for UTF8 byte slices before entering NameTable.enterIfNewAscii. This removes the separate hashValueAscii/isAscii scan when the bytes are all ASCII, but occupied table slots still require equalsAscii confirmation and misses still copy bytes into the shared character array. The hash recurrence, synchronized insertion, stored characters, and non-ASCII Codec.fromUTF8 fallback remain unchanged, so this only changes how the ASCII fast path reaches the existing name-table probe.
Expected changes:
- Names.hashValueAscii self% should improve: termName computes the ASCII hash during validation instead of calling the separate helper.
- Names.equalsAscii self% and tot% could regress: occupied table slots still require byte-to-char equality confirmation, so fused hashing cannot remove that remaining probe cost.
- Names.termName self% could regress: the byte overload carries a longer branch body and hash loop before delegating to the name table.
- TastyUnpickler.readNameContents tot% should improve: if the saved byte scan matters, the TASTy name reader should inherit less Names.termName work.
- No semantic regressions expected: ASCII bytes use the same hash recurrence as character names, non-ASCII bytes still decode through Codec.fromUTF8, and interned names still compare against the same shared character table.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-19/run-0 → iter-19/run-73):
1. Names.equalsAscii self%: 0.11 ± 0.02 → 0.17 ± 0.04 (+0.06, 1.5σ)
2. Names.equalsAscii tot%: 0.11 ± 0.02 → 0.17 ± 0.04 (+0.06, 1.5σ)
3. Names.termName self%: below floor → 0.12 ± 0.02
4. Names.termName tot%: below floor → 0.67 ± 0.05
5. TastyUnpickler.readNameContents self%: below floor → 0.09 ± 0.02
6. TastyUnpickler.readNameContents tot%: below floor → 0.81 ± 0.11
Estimated total speedup: -0.27 ± 0.05 (from rows 1, 3, and 5 above; direct self% rows are exclusive, with newly visible rows counted as regressions from below floor)
Rejected. The intended hashValueAscii/isAscii savings stay below the summary floor, while equalsAscii is the significant visible movement and regresses by +0.06 at 1.5σ. Names.termName and TastyUnpickler.readNameContents newly enter the summary, so fused validation shifts cost into the remaining byte-name probe path instead of producing a reliable compile-time win.
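
The fusion described in this entry, validating that a byte slice is ASCII while computing its hash in the same pass, can be sketched as follows. The recurrence `h * 31 + c`, the `NotAscii` marker and all names in `AsciiHash` are assumptions made for illustration; the compiler's real hash recurrence is not shown in the commit message.

```scala
object AsciiHash {
  // Marker returned when a non-ASCII byte is found (valid hashes are >= 0 as Long).
  val NotAscii: Long = -1L

  // Hypothetical hash recurrence, used by both the two-pass and fused versions.
  private def step(h: Int, c: Int): Int = h * 31 + c

  // Two-pass shape, first pass: a byte is ASCII when its sign bit is clear.
  def isAscii(bs: Array[Byte], offset: Int, len: Int): Boolean = {
    var i = offset
    val end = offset + len
    while (i < end && bs(i) >= 0) i += 1
    i == end
  }

  // Two-pass shape, second pass: hash the (already validated) bytes.
  def hash(bs: Array[Byte], offset: Int, len: Int): Int = {
    var h = 0
    var i = offset
    val end = offset + len
    while (i < end) { h = step(h, bs(i)); i += 1 }
    h
  }

  // Fused shape: one loop that bails out on the first non-ASCII byte.
  def hashIfAscii(bs: Array[Byte], offset: Int, len: Int): Long = {
    var h = 0
    var i = offset
    val end = offset + len
    while (i < end) {
      val b = bs(i)
      if (b < 0) return NotAscii
      h = step(h, b)
      i += 1
    }
    h.toLong & 0xffffffffL  // widen so that NotAscii cannot collide with a hash
  }
}
```

The fused loop saves one scan over the bytes but leaves the table probe and equality confirmation untouched, which matches the entry's observation that the remaining cost sits in the probe path.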

OrderingConstraint.add marks only the canonical independent-bounds TypeLambda snapshot that adds default bounds without parameter ordering, and addToConstraint skips the post-add propagation loop when that same snapshot is observed. The marker is cleared by later constraint updates, so non-default bounds, dependent bounds, wildcard and type-variable cases, and any add that creates ordering edges still use the existing propagation path. Expected changes: - OrderingConstraint.entry self% and tot% should improve: empty independent TypeLambdas no longer probe entry, lower, and upper for each new parameter. - TypeComparer.addToConstraint tot% should improve: propagation-free adds return after assigning the new constraint. - ConstraintHandling.addToConstraint self% could regress: every add pays the marker check before the old propagation loop. - No correctness regressions expected: only the existing default independent-bounds path can set the marker, and later constraint updates clear it. JFR profile deltas (iter-23/run-0 → iter-23/run-3): 1. OrderingConstraint.entry self%: 0.27 ± 0.06 → 0.25 ± 0.05 (-0.02, 0.3σ) 2. OrderingConstraint.entry tot%: 0.33 ± 0.06 → 0.30 ± 0.04 (-0.03, 0.5σ) 3. TypeComparer.addToConstraint tot%: below floor → below floor 4. TypeComparer.recur self%: 0.45 ± 0.05 → 0.35 ± 0.01 (-0.10, 2.0σ) 5. TypeComparer.secondTry$1 self%: 0.36 ± 0.04 → 0.27 ± 0.05 (-0.09, 1.8σ) 6. ClassDenotation.membersNamed self%: 0.35 ± 0.23 → 0.65 ± 0.16 (+0.30, 1.3σ) Estimated total speedup/noise result: noise; +0.02 ± 0.08 from row 1, with the direct OrderingConstraint.entry rows below 1σ and TypeComparer.addToConstraint below the summary floor. Rejected. OrderingConstraint.entry is the direct affected summary row, and both self% and total time remain below the 1σ go/no-go threshold. 
The broader TypeComparer.recur and secondTry$1 improvements do not expose addToConstraint itself, while ClassDenotation.membersNamed shows an unrelated self-time regression, so this marker check does not produce a reliable targeted win.

TreeMapWithImplicits delays nested context creation for block stats, method parameter scopes, and case pattern scopes until the scan reaches a given or implicit DefTree. The intended mechanism is to avoid empty Context.freshOver and ContextPool.next work; it is safe because an empty child scope contributes no contextual bindings, and non-empty scans still allocate the nested scope at the first binding while preserving scan and enter order. Expected changes: - Context.freshOver self% and tot% should improve: empty scans avoid fresh child contexts. - ContextPool.next self% and tot% should improve: fewer nested scopes are drawn from the context pool. - Context.lookup self% and tot% should stay neutral or could regress slightly: empty scans return the parent context directly. - No semantic regressions expected: contextual DefTrees still allocate the nested scope at the first binding, and empty scans expose the same outer contextual bindings. JFR profile deltas (iter-23/run-0 → iter-23/run-4): 1. Context.freshOver self%: 1.04 ± 0.04 → 1.01 ± 0.09 (-0.03, 0.3σ) 2. Context.freshOver tot%: 1.04 ± 0.04 → 1.01 ± 0.09 (-0.03, 0.3σ) 3. ContextPool.next self%: 0.40 ± 0.06 → 0.36 ± 0.07 (-0.04, 0.6σ) 4. ContextPool.next tot%: 0.41 ± 0.06 → 0.36 ± 0.07 (-0.05, 0.7σ) 5. Context.lookup self%: 0.19 ± 0.05 → 0.19 ± 0.04 (+0.00, 0.0σ) 6. Context.lookup tot%: 0.21 ± 0.05 → 0.20 ± 0.05 (-0.01, 0.2σ) Estimated total speedup/noise result: +0.07 ± 0.13 (from rows 1 and 3; direct context-allocation self% rows are exclusive). Rejected. Context.freshOver moves by only 0.3σ, and ContextPool.next reaches only 0.6σ self and 0.7σ total, so the apparent allocation-path reduction is inside run noise. Context.lookup stays neutral, while TreeMapWithImplicits.transform and helper rows remain below the profile summary floor, leaving no significant timing row to justify accepting the lazy scope allocation.

ClassDenotation.thisType already caches one ThisType per class denotation, so this version constructs that cached ThisType directly and seeds its class cache instead of sending it through ThisType.raw and the generic weak unique table. The targeted path was hot through WeakHashSet.linkedListLoop$4 at 0.69% total in iter-23/run-0, and the allocation tree confirms the ThisType.raw -> ClassDenotation.computeThisType branch disappeared after the change. External ThisType.raw remains canonical, and semantic ThisType checks still use tref equality or sameThis instead of relying on eq identity, but direct weak-set timing stayed flat while NamedTypeUniques.linkedListLoop$1 regressed from 1.55% to 1.81% total.
Expected changes:
- WeakHashSet.linkedListLoop$4 self% should improve: ClassDenotation.computeThisType no longer calls ThisType.raw -> Uniques.unique -> WeakHashSet.put.
- NamedTypeUniques.linkedListLoop$1 tot% could regress: thisType construction still builds TypeRef prefixes, so any shifted ref construction or colder cache behavior can show up in named ref uniquing.
- TypeComparer.isSubType tot% could regress: class-owned ThisTypes are no longer necessarily eq to externally raw-canonical ThisTypes, so identity-sensitive paths may need semantic sameThis/equality fallback.
- No other regressions expected: external ThisType.raw canonicalization, class-local thisType caching, and semantic equality checks keep the observable ThisType behavior unchanged.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-11):
1. WeakHashSet.linkedListLoop$4 self%: 0.36 ± 0.04 → 0.36 ± 0.11 (+0.00, 0.0σ)
2. WeakHashSet.linkedListLoop$4 tot%: 0.69 ± 0.12 → 0.77 ± 0.09 (+0.08, 0.7σ)
3. NamedTypeUniques.linkedListLoop$1 tot%: 1.55 ± 0.13 → 1.81 ± 0.11 (+0.26, 2.0σ)
4. TypeComparer.isSubType tot%: 15.06 ± 0.28 → 14.61 ± 0.33 (-0.45, 1.4σ)
5. TypeComparer.recur self%: 0.45 ± 0.05 → 0.36 ± 0.04 (-0.09, 1.8σ)
Estimated total speedup: -0.26 ± 0.21 (from rows 1 and 3 above; row 5 is broad and not charged to the mechanism)
Rejected. ThisType.raw allocation under ClassDenotation.computeThisType disappeared from the tree, but WeakHashSet timing stayed within noise and NamedTypeUniques.linkedListLoop$1 regressed significantly. The broader TypeComparer improvements are not mechanism-supported enough to accept over the direct regression.

TokensCommon now builds a compact open-addressed keyword table keyed by SimpleName.start + 1 instead of allocating and filling a dense int array up to the largest global name-table offset. This targets Arrays.fill at 0.42% self and total in iter-23/run-0 and the TokensCommon.buildKeywordArray → Array.fill int[] allocation branch at 620.80 KiB by replacing the dense fill with storage proportional to the keyword set, while Scanner and JavaScanner still use bounded lookup. It is safe because zero remains the empty sentinel, missing or negative starts still return IDENTIFIER, and migration handling still wraps the keyword result, but the retained Arrays.fill row stayed flat. Expected changes: - Arrays.fill self% and tot% should improve: buildKeywordTable no longer fills the dense keyword array created by buildKeywordArray. - int[] allocation share% should improve: keyword-table storage scales with source keyword count instead of the largest global SimpleName.start value. - ScannerCommon.finishNamedToken tot% could regress: Scala identifier keyword checks now hash and probe instead of doing a bounds check and dense array load. - JavaScanner.toToken tot% could regress: Java keyword checks use the same sparse lookup path. - No semantic regressions expected: keyword identity is still keyed by SimpleName.start, absent starts return IDENTIFIER, and migration handling still wraps the keyword result. JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-14): 1. Arrays.fill self%: 0.42 ± 0.05 → 0.40 ± 0.10 (-0.02, 0.2σ) 2. Arrays.fill tot%: 0.42 ± 0.05 → 0.40 ± 0.10 (-0.02, 0.2σ) 3. int[] allocation share%: 6.00 ± 0.32 → 5.50 ± 0.60 (-0.50, 0.8σ) Estimated total speedup: +0.02 ± 0.11 (from row 1 above; the direct timing row is below 1σ, and the allocation share movement is inside noise) Rejected. 
The dense keyword-table allocation disappeared from the allocation tree, but Arrays.fill stayed flat at 0.2σ and aggregate int[] allocation share moved only 0.8σ. The change is not a measurable compiler-speed improvement on iter-23/run-14.
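
The compact keyword table described in this entry (open addressing, keys stored as `start + 1` so that zero can remain the empty sentinel, absent or negative starts mapping to the identifier token) can be sketched as below. `KeywordTable`, its constructor arguments and the multiplier used for slot selection are hypothetical; the sketch also assumes `start` values well below `Int.MaxValue`.

```scala
// Hypothetical sparse keyword table: maps a name's start offset to a token id.
final class KeywordTable(entries: Seq[(Int, Int)], identifier: Int) {
  // Power-of-two capacity at least twice the entry count, so probes always
  // reach an empty slot and storage scales with the keyword set.
  private val capacity: Int = {
    var c = 4
    while (c < entries.size * 2) c <<= 1
    c
  }
  private val mask = capacity - 1
  private val keys = new Array[Int](capacity)    // holds start + 1; 0 = empty
  private val tokens = new Array[Int](capacity)

  // Arbitrary multiplicative scramble, folded into the table range.
  private def slot(key: Int): Int = ((key * 0x45d9f3b) >>> 16) & mask

  entries.foreach { case (start, token) =>
    require(start >= 0, "name starts are non-negative")
    val key = start + 1
    var i = slot(key)
    while (keys(i) != 0 && keys(i) != key) i = (i + 1) & mask  // linear probing
    keys(i) = key
    tokens(i) = token
  }

  // Missing or negative starts fall back to the identifier token.
  def tokenFor(start: Int): Int =
    if (start < 0) identifier
    else {
      val key = start + 1
      var i = slot(key)
      while (keys(i) != 0) {
        if (keys(i) == key) return tokens(i)
        i = (i + 1) & mask
      }
      identifier
    }
}
```

Compared with a dense array indexed by `start`, each lookup trades a bounds check and one load for a multiply and at least one probe, which is the scanner-side cost the entry lists under expected regressions.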

TreeBuffer now stores tree addresses in a local identity map with dense address values and an open-addressed entry-index table instead of util.IntMap. The hot path is TASTy pickling address registration: iter-23/run-0 showed the TreeBuffer/IntMap branch contributing sampled int[] allocation, and this implementation reduced aggregate int[] allocation from 6.00% to 3.66% / 4.98 MiB to 3.05 MiB while leaving TreeBuffer, TreeAddrMap, IntMap, PerfectHashing, and TastyBuffer timing rows below the summary floor. It is safe because Tree keys are already reference-compared, keys stay identity-stable during pickling, and compactify still rewrites the dense stored addresses before positions and comments query addrOfTree.
Expected changes:
- TreeBuffer.registerTreeAddr self% should improve: new address registrations avoid util.IntMap lookup/update and PerfectHashing-backed table growth.
- int[] allocation share% should improve: TreeAddrMap uses one int table plus dense values instead of IntMap values plus the larger PerfectHashing table.
- TreeAddrMap.putIfAbsent self% could regress: the specialized map hashes and probes explicitly, so collisions or earlier table growth can add lookup work.
- TreeBuffer.compactify.adjustTreeAddrs self% could regress: the dense value array is still scanned, but address adjustment now reads and writes through TreeAddrMap helpers.
- No semantic regressions expected: Tree keys are compared by eq as before, address values remain dense and mutable, and compactification still rewrites every registered address before positions and comments query addrOfTree.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-15):
1. int[] allocation share%: 6.00 ± 0.32 → 3.66 ± 0.55 (-2.34, 4.3σ)
2. int[] allocation bytes MiB: 4.98 ± 0.12 → 3.05 ± 0.46 (-1.93, 4.2σ)
3. Total allocation bytes MiB: 83.06 ± 2.38 → 83.39 ± 1.61 (+0.33, 0.1σ)
4. Typer.typedNamed$1 tot%: 59.03 ± 0.29 → 60.83 ± 0.66 (+1.80, 2.7σ)
5. DirectMethodHandle.allocateInstance self%: 0.45 ± 0.06 → 0.57 ± 0.04 (+0.12, 2.0σ)
Estimated total speedup: 0.00 ± 0.00 (direct TreeBuffer timing rows stayed below the profile summary floor; rows 4-5 are broad regression checks)
Rejected. The specialized map removes the targeted IntMap int-array allocation and reduces aggregate int[] allocation significantly, but no direct TreeBuffer, TreeAddrMap, IntMap, PerfectHashing, or TastyBuffer timing row becomes measurable and total allocation remains inside noise. The broad Typer.typedNamed$1 and DirectMethodHandle.allocateInstance regressions are not mechanism-supported, but without a targeted timing or total-allocation win this is not a reliable compiler-speed improvement.

AnnotatedType now owns dealias(keeps), and ordinary no-keep calls outside capture-checking phases forward directly to the parent dealias result instead of reaching the generic Type.dealias annotated case and CapturingType extractor. The path targets Types$Type.dealias at 1.14% self and 1.56% total in iter-23/run-0, where recursive annotated wrappers contribute to the default alias-following loop. This is safe because keep-annotation variants still rewrap through derivedAnnotatedType, capture-checking/setup phases still use CapturingType, and non-kept non-capture annotations were already dropped after parent dealiasing. Expected changes: - Types$Type.dealias self% and tot% should improve: AnnotatedType wrappers skip the generic match arm and the capture extractor when annotations cannot affect alias following. - Types$Type.widenDealias tot% should improve: default widen-dealias callers inherit the cheaper annotated-wrapper path. - Types$NamedType.symbol self% could regress: removing annotated-wrapper overhead can expose the TypeRef symbol checks in the remaining dealias work. - No semantic regressions expected: annotation-preserving modes still rewrap, refining annotations still ask isRefining, and capture phases keep the existing CapturingType reconstruction. JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-22): 1. Types$Type.dealias self%: 1.14 ± 0.11 → 1.04 ± 0.11 (-0.10, 0.9σ) 2. Types$Type.dealias tot%: 1.56 ± 0.10 → 1.45 ± 0.11 (-0.11, 1.0σ) 3. Types$NamedType.symbol self%: 0.32 ± 0.04 → 0.40 ± 0.06 (+0.08, 1.3σ) 4. Types$Type.widenDealias tot%: 0.36 ± 0.07 → 0.33 ± 0.07 (-0.03, 0.4σ) Estimated total speedup: +0.02 ± 0.17 (from rows 1 and 3 above; self% rows are exclusive, with row 3 netted as the measured regression) Rejected. Types$Type.dealias self-time and total-time movement stays at or below the 1σ go/no-go threshold, while Types$NamedType.symbol shows a larger visible self-time regression. 
The inherited widenDealias movement is also inside noise, so the annotated-wrapper fast path does not produce a measurable compiler-speed win.

Trait dropping now skips the full TyperState snapshot before trial subtype checks when both the inferred type and bound are non-provisional ground types, falling back to the previous snapshot/reset path for provisional types, TypeParamRefs, and TypeVars. The path sits under ConstraintHandling.dropTransparentTraits during inferred-type widening and feeds TypeComparer.isSubType, which was 15.06% total in iter-23/run-0, while failed transparent-trait drops only need rollback if subtype checks can mutate constraints or owned type variables. The guard preserves the old behavior for mutable inference shapes, but the edited path stays below the profile floor and the extra stability probes do not produce a direct timing win.
Expected changes:
- TypeComparer.isSubType self% should improve: stable transparent-trait trials avoid full state snapshot setup before entering subtype comparison.
- TypeComparer.recur self% should improve: successful stable trial comparisons run with less rollback bookkeeping in the caller.
- Types$Type.widenIfUnstable self% could regress: the added stable-shape probes can expose or add widening-side type inspection work.
- Typer.typedNamed$1 tot% could regress: broad typer time can move if the extra widening guard is paid on common typed-name paths.
- No semantic regressions expected: any provisional, TypeParamRef, or TypeVar-containing type keeps the old TyperState snapshot and reset path, and the stable path still restores the current Constraint reference on fallback.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-24):
1. TypeComparer.isSubType self%: 0.54 ± 0.09 → 0.42 ± 0.09 (-0.12, 1.3σ)
2. TypeComparer.recur self%: 0.45 ± 0.05 → 0.37 ± 0.04 (-0.08, 1.6σ)
3. TypeComparer.secondTry$1 self%: 0.36 ± 0.04 → 0.25 ± 0.02 (-0.11, 2.7σ)
4. Types$Type.widenIfUnstable self%: 0.14 ± 0.04 → 0.26 ± 0.04 (+0.12, 3.0σ)
5. Typer.typedNamed$1 tot%: 59.03 ± 0.29 → 61.28 ± 0.52 (+2.25, 4.3σ)
Estimated total speedup: +0.19 ± 0.16 (from rows 1-4 above; row 5 is a broad overlapping caller regression)
Rejected. The subtype self-time rows move down significantly, but the edited dropTransparentTraits path stays below the profile floor. The significant Types$Type.widenIfUnstable self-time and Typer.typedNamed$1 total-time regressions leave no reliable compiler-speed improvement to accept.

applyIfParameterized now recognizes, without calling typeParams, shapes that cannot expose type parameters: method/poly, singleton, refined, recursive, and/or types return self directly, while HK lambdas and current-run class refs check their stored parameter lists before applying. The path is reached from AppliedType supertype/lower-bound and TypeComparer applied-type bound comparisons; iter-23/run-0 shows TypeApplications.typeParams at 0.26% self / 0.40% total and TypeComparer.recur at 14.75% total. The guards preserve the existing fallback for non-class type refs, non-class applied tycons, and lazy/proxy/bounds shapes, so any case that may need typeParams recursion still uses the old query.
Expected changes:
- TypeApplications.typeParams self% should improve: known nil shapes and current class/HK cases no longer call the full typeParams extension before applyIfParameterized returns or applies.
- TypeComparer.recur self% should improve: applied-type bound comparisons inherit the cheaper applyIfParameterized guard on subtype recursion paths.
- Types$NamedType.symbol self% could regress: class-ref guards may shift symbol/designator probing around TypeRef handling.
- No semantic regressions expected: non-current class refs, non-class TypeRefs, non-class AppliedTypes, and wildcard/bounds/proxy shapes still fall back to the existing typeParams logic.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-27):
1. TypeApplications.typeParams self%: 0.26 ± 0.04 → 0.23 ± 0.04 (-0.03, 0.8σ)
2. TypeApplications.typeParams tot%: 0.40 ± 0.04 → 0.39 ± 0.08 (-0.01, 0.1σ)
3. TypeComparer.recur self%: 0.45 ± 0.05 → 0.39 ± 0.04 (-0.06, 1.2σ)
4. TypeComparer.recur tot%: 14.75 ± 0.30 → 14.33 ± 0.51 (-0.42, 0.8σ)
5. Types$NamedType.symbol self%: 0.32 ± 0.04 → 0.34 ± 0.05 (+0.02, 0.4σ)
Estimated total speedup: +0.03 ± 0.06 (from row 1 above)
Rejected. The direct TypeApplications.typeParams movement remains below the 1σ threshold, and the TypeComparer.recur self-time improvement is only a broad caller row without a significant total-time confirmation. The symbol-probe check no longer regresses meaningfully, but this fast path does not produce a reliable compiler-speed win.
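
These commit messages do not define how the σ figures or the "Estimated total speedup" line are computed. The rows in this log appear consistent with the arithmetic sketched below, which is an inference from the reported numbers and not the benchmarking harness itself: σ as the absolute delta divided by the larger of the two standard deviations, and the estimate's uncertainty as the root-sum-square of every standard deviation in the counted rows. The names `Row`, `sigma`, `estimate`, and `round2` are hypothetical.

```scala
object ProfileDeltas {
  // One profile row: mean and stddev (percent of samples) before and after a change.
  final case class Row(beforeMean: Double, beforeSd: Double, afterMean: Double, afterSd: Double) {
    def delta: Double = afterMean - beforeMean
    // Assumption: sigma = |delta| / max(stddev before, stddev after).
    def sigma: Double = math.abs(delta) / math.max(beforeSd, afterSd)
  }

  // Assumption: the speedup is the negated sum of the counted deltas, and its
  // uncertainty is the root-sum-square of all standard deviations involved.
  def estimate(rows: Seq[Row]): (Double, Double) = {
    val speedup = -rows.map(_.delta).sum
    val err = math.sqrt(rows.map(r => r.beforeSd * r.beforeSd + r.afterSd * r.afterSd).sum)
    (speedup, err)
  }

  // Round to two decimals, matching the precision used in the rows.
  def round2(x: Double): Double = math.round(x * 100.0) / 100.0
}
```

Applied to the TypeApplications.typeParams self% row (0.26 ± 0.04 → 0.23 ± 0.04), this gives about 0.75σ, which the row reports as 0.8σ, and an estimate of +0.03 ± 0.06, matching the reported total.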

TokensCommon now builds a compact open-addressed keyword table from one interned SimpleName.start probe per source keyword instead of filling a dense array up to the largest global name-table offset, and raw unary-name checks use direct name comparisons instead of a Set lookup. The hot scanner path still interns identifiers before keyword classification, while iter-23/run-0 only showed the related Arrays.fill row at 0.42% self/total and Names.equalsAscii at 0.15% self. This is safe because keyword identity remains keyed by SimpleName.start, missing starts still return IDENTIFIER, migration handling still wraps the keyword result, and raw unary operators keep the same four-name identity set.
Expected changes:
- Arrays.fill self% and tot% should improve: keyword-table construction no longer fills a dense array sized by the largest global SimpleName.start.
- Names.equalsAscii self% could regress: unchanged scanner interning still probes the global name table before keyword classification, so normalized ASCII equality attribution can move.
- Names.equals self% could regress: raw unary-name direct comparisons and the remaining name-table probes can shift normalized Name equality attribution.
- Names.isAscii self% could regress: the edited keyword table does not remove ASCII detection before NameTable lookup, and reduced startup fill work can expose that path.
- No semantic regressions expected: absent keyword starts return IDENTIFIER, Java and Scala scanners use the same start-keyed table, and raw unary checks preserve identity-based membership.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-33):
1. Arrays.fill self%: 0.42 ± 0.05 → 0.37 ± 0.08 (-0.05, 0.6σ)
2. Arrays.fill tot%: 0.42 ± 0.05 → 0.37 ± 0.08 (-0.05, 0.6σ)
3. Names.equalsAscii self%: 0.15 ± 0.03 → 0.17 ± 0.03 (+0.02, 0.7σ)
4. Names.equals self%: 0.12 ± 0.03 → 0.13 ± 0.03 (+0.01, 0.3σ)
5. Names.isAscii self%: below floor → 0.09 ± 0.02
Estimated total speedup: -0.07 ± 0.11 (from rows 1 and 3-5 above; row 5 is charged as a full below-floor regression)
Rejected. The direct Arrays.fill timing movement remains below the 1σ threshold, while the reported Names.equalsAscii, Names.equals, and Names.isAscii rows move against the change in aggregate. The keyword/raw lookup mechanics compile and preserve scanner behavior, but they do not produce a reliable compiler-speed improvement on this profile.
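
The difference between a dense array indexed by name-table offset and a compact open-addressed table can be sketched as follows. This is an illustration only, not the code from the rejected commit: `KeywordTable`, its sizing rule, and the hash mixing are assumptions, the `start` key stands in for `SimpleName.start`, and `Identifier` stands in for the IDENTIFIER token.

```scala
object KeywordTableSketch {
  val Identifier = 0 // hypothetical token code meaning "not a keyword"

  // Open-addressed table with linear probing, keyed by a name's start offset.
  // Assumes start offsets are non-negative, so -1 can mark an empty slot.
  final class KeywordTable(entries: Seq[(Int, Int)]) {
    // Power-of-two capacity of at least twice the entry count keeps probes
    // short and guarantees that an empty slot always exists.
    private val capacity: Int = {
      var c = 4
      while (c < entries.size * 2) c <<= 1
      c
    }
    private val mask = capacity - 1
    private val keys = Array.fill(capacity)(-1)
    private val tokens = new Array[Int](capacity)

    private def slot(start: Int): Int = (start ^ (start >>> 16)) & mask

    // Insert each keyword once; collisions move to the next slot.
    entries.foreach { case (start, token) =>
      var i = slot(start)
      while (keys(i) != -1 && keys(i) != start) i = (i + 1) & mask
      keys(i) = start
      tokens(i) = token
    }

    // Starts that were never inserted fall through to Identifier.
    def lookup(start: Int): Int = {
      var i = slot(start)
      while (keys(i) != -1) {
        if (keys(i) == start) return tokens(i)
        i = (i + 1) & mask
      }
      Identifier
    }
  }
}
```

The table size depends only on the number of keywords, not on the largest offset, which is the property the commit relied on to avoid the large fill.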

Inliner now stores the inline method owner and call-prefix type once per expansion, and materializes the owner thisType lazily only when an impure prefix has to be registered. The intended hot path is the inline body type scan that reaches canElideThis and adaptToPrefix before building `this` and type-parameter proxies; iter-23/run-0 shows TypeMap.mapOver at 0.70% self / 7.69% total and TreeTypeMap.transform at 7.48% total around this mapping work. This is safe because the method owner and call prefix are fixed for an Inliner instance, the owner thisType remains demand-driven, and the existing containment, package, static-owner, and opaque checks are unchanged.
Expected changes:
- TypeMap.mapOver self% should improve: repeated inline prefix type reads and method-owner reads are replaced by cached per-Inliner fields before type proxy adaptation.
- TreeTypeMap.transform tot% should improve: inline body transformation inherits the cheaper Inliner type-scan setup around mapped types.
- ClassDenotation.membersNamed self% could regress: unchanged member lookup attribution can take a larger normalized share if mapper work gets cheaper, but its total time should stay flat.
- No semantic regressions expected: the cached values are immutable for the lifetime of the Inliner, and the lazy owner thisType preserves the old demand point for impure-prefix registration.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-34):
1. TypeMap.mapOver self%: 0.70 ± 0.07 → 0.61 ± 0.06 (-0.09, 1.3σ)
2. TypeMap.mapOver tot%: 7.69 ± 0.28 → 7.50 ± 0.25 (-0.19, 0.7σ)
3. TreeTypeMap.transform tot%: 7.48 ± 0.18 → 7.15 ± 0.26 (-0.33, 1.3σ)
4. ClassDenotation.membersNamed self%: 0.35 ± 0.23 → 0.69 ± 0.33 (+0.34, 1.0σ)
5. ClassDenotation.membersNamed tot%: 6.04 ± 0.15 → 6.11 ± 0.23 (+0.07, 0.3σ)
Estimated total speedup: +0.09 ± 0.09 (from row 1 above; rows 2-3 are overlapping caller confirmations, and rows 4-5 watch unchanged member-lookup attribution rather than the direct mechanism)
Rejected. TypeMap.mapOver self-time clears the threshold, but the edited Inliner methods stay below the profile summary floor and TypeMap.mapOver total time remains inside noise. The TreeTypeMap.transform total movement is only a broad caller row, so this cache does not produce a reliable compiler-speed win.
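
The caching pattern described above, one eager per-expansion field plus one demand-driven field, can be sketched with plain Scala `val` and `lazy val`. The `Expansion` class and its string-valued fields are hypothetical stand-ins; the real Inliner works with symbols and types.

```scala
object InlinerCacheSketch {
  // Hypothetical stand-in for one inline expansion.
  final class Expansion(readOwner: () => String, makeThisType: String => String) {
    // Read once when the expansion is created and reused by every later check.
    val methodOwner: String = readOwner()
    // Computed on first access only, for example when an impure prefix
    // has to be registered; never computed if nothing asks for it.
    lazy val ownerThisType: String = makeThisType(methodOwner)
  }
}
```

The `lazy val` keeps the old demand point: if no impure prefix is registered, the this-type is never built.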

Inliner now stores the inline method owner and call-prefix type once per expansion, materializing the owner thisType lazily only when an impure prefix has to be registered. The hot path is the inline body type scan that reaches canElideThis and adaptToPrefix before building `this` and type-parameter proxies; iter-23/run-0 shows TypeMap.mapOver at 0.70% self / 7.69% total and TreeTypeMap.transform at 7.48% total around this mapping work. This is safe because the method owner and call prefix are fixed for an Inliner instance, the owner thisType remains demand-driven, and the existing containment, package, static-owner, and opaque checks are unchanged.
Expected changes:
- TypeMap.mapOver self% should improve: repeated inline prefix type reads, method-owner reads, and class extraction in canElideThis are replaced by cached per-Inliner or per-call local values before type proxy adaptation.
- TreeTypeMap.transform tot% should improve: inline body transformation inherits the cheaper Inliner type-scan setup around mapped types.
- ClassDenotation.membersNamed self% could regress: unchanged member lookup attribution can take a larger normalized share if mapper work gets cheaper, but its total time should stay flat.
- ClassDenotation.membersNamedNoShadowingBasedOnFlags tot% could regress: unrelated filtered member lookup can move with normalized profile share, so this row checks for broad lookup noise.
- No semantic regressions expected: the cached values are immutable for the lifetime of the Inliner, and the lazy owner thisType preserves the old demand point for impure-prefix registration.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-37):
1. TypeMap.mapOver self%: 0.70 ± 0.07 → 0.57 ± 0.06 (-0.13, 1.9σ)
2. TypeMap.mapOver tot%: 7.69 ± 0.28 → 7.22 ± 0.32 (-0.47, 1.5σ)
3. TreeTypeMap.transform tot%: 7.48 ± 0.18 → 6.69 ± 0.26 (-0.79, 3.0σ)
4. ClassDenotation.membersNamed self%: 0.35 ± 0.23 → 0.65 ± 0.21 (+0.30, 1.3σ)
5. ClassDenotation.membersNamedNoShadowingBasedOnFlags tot%: 3.72 ± 0.09 → 3.91 ± 0.16 (+0.19, 1.2σ)
Estimated total speedup: +0.13 ± 0.09 (from row 1 above; rows 2-3 are overlapping caller confirmations, and rows 4-5 watch unchanged member-lookup attribution rather than the direct mechanism)
Rejected. TypeMap.mapOver self-time and total-time clear the threshold, and TreeTypeMap.transform moves as a broad caller confirmation. The edited Inliner methods stay below the profile summary floor, while unrelated member-lookup rows move against the change, so this cache does not establish a reliable compiler-speed improvement.

Inlineable argument lookup now records generated inline parameter proxy bindings by their pooled TermRef, then refreshes only existing entries after binding normalization. The intended hot path is macro and inline argument retyping, where the old extractor built paramProxy.values.toSet and scanned bindingsBuf by name; the new map keeps the old name guard while avoiding that repeated set/list work. It is safe because only generated Inline parameter bindings enter the pool, normalized bindings replace prior entries, and non-proxy or idempotent arguments still fall through exactly as before.
Expected changes:
- BoxesRunTime.equals2 self% and tot% should improve: pooled proxy lookup should avoid some structural Type equality from the old paramProxy.values.toSet membership path.
- HashSet.isEqual self% and tot% should improve: the inline-argument extractor should allocate and query fewer temporary Set entries.
- TypeMap.mapOver self% and tot% may improve as caller confirmation: cheaper inline-argument retyping can lower nearby inliner mapping attribution.
- Inliner.registerType self% could regress: the extra per-inliner proxy pool and refresh checks can move small setup costs into visible inliner attribution.
- No semantic regressions expected: lookup still requires the Ident name to match the pooled binding name, and only already-pooled parameter proxy refs are refreshed after normalization.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-39):
1. BoxesRunTime.equals2 self%: 0.78 ± 0.04 → 0.70 ± 0.11 (-0.08, 0.7σ)
2. BoxesRunTime.equals2 tot%: 0.89 ± 0.03 → 0.82 ± 0.10 (-0.07, 0.7σ)
3. HashSet.isEqual self%: 0.23 ± 0.04 → 0.22 ± 0.04 (-0.01, 0.3σ)
4. HashSet.isEqual tot%: 0.30 ± 0.04 → 0.29 ± 0.06 (-0.01, 0.2σ)
5. TypeMap.mapOver self%: 0.70 ± 0.07 → 0.62 ± 0.07 (-0.08, 1.1σ)
6. TypeMap.mapOver tot%: 7.69 ± 0.28 → 7.23 ± 0.16 (-0.46, 1.6σ)
7. TreeTypeMap.transform tot%: 7.48 ± 0.18 → 7.11 ± 0.15 (-0.37, 2.1σ)
8. Inliner.registerType self%: below floor → 0.10 ± 0.04
Estimated total speedup: -0.02 ± 0.12 (from rows 1 and 8 above; row 8 is charged as a full below-floor regression)
Rejected. The intended direct equality and set rows stay within noise, with BoxesRunTime.equals2 only 0.7σ and HashSet.isEqual no better than 0.3σ. TypeMap.mapOver and TreeTypeMap.transform move down as broad caller rows, but they do not prove the inline-argument binding pool paid for itself, and the new visible Inliner.registerType row offsets the small direct self-time movement.

ClassDenotation.membersNamedNoShadowingBasedOnFlags now computes post-typer filtered direct declarations before inherited lookup instead of first materializing and filtering the full membersNamed result. The iter-23/run-0 profile had membersNamedNoShadowingBasedOnFlags at 3.72% total, ClassDenotation.membersNamed at 6.04% total, and MutableScope.lookupEntry at 0.76% self in the filtered member path, so the intended win is to avoid an unfiltered cache lookup and extra inherited pass when private or otherwise excluded members should not shadow. The guard is safe because the direct path is disabled before typer and for package/package-class denotations, where Invisible handling and package-object member overrides need the old full-member computation.
Expected changes:
- ClassDenotation.membersNamedNoShadowingBasedOnFlags tot% should improve: filtered lookup should avoid computing the full unfiltered member set before rebuilding inherited members after exclusions.
- MutableScope.lookupEntry self% should improve: post-typer filtered lookup should perform fewer unfiltered declaration-scope probes across the first full membersNamed pass.
- EqHashMap.lookup self% should improve: fewer declaration-scope probes should reduce hash-map lookup work below MutableScope lookup.
- ClassDenotation.membersNamed tot% could regress: filtered lookups no longer populate or reuse the unfiltered membersNamed cache on the optimized path.
- ClassDenotation.addInherited tot% could regress: rebuilding inherited members from the filtered direct path can move work from the old cached membersNamed caller into inherited assembly.
- No semantic regressions expected: pre-typer lookups and package-class lookups keep the old full-member path, while the direct path applies the same filterWithFlags checks before the existing addInherited combiner.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-44):
1. ClassDenotation.membersNamedNoShadowingBasedOnFlags tot%: 3.72 ± 0.09 → 4.10 ± 0.17 (+0.38, 2.2σ)
2. MutableScope.lookupEntry self%: 0.76 ± 0.13 → 0.94 ± 0.04 (+0.18, 1.4σ)
3. EqHashMap.lookup self%: 0.51 ± 0.17 → 0.70 ± 0.16 (+0.19, 1.1σ)
4. ClassDenotation.membersNamed tot%: 6.04 ± 0.15 → 5.60 ± 0.20 (-0.44, 2.2σ)
5. ClassDenotation.ownDenotsNamedWithFlags self%: below floor → 0.15 ± 0.06
6. ClassDenotation.addInherited tot%: 3.82 ± 0.13 → 4.12 ± 0.12 (+0.30, 2.3σ)
Estimated total speedup: -0.52 ± 0.28 (from rows 2, 3, and 5 above; rows 1, 4, and 6 are overlapping caller/callee confirmations)
Rejected. The new direct helper appears as fresh self-time, and the intended membersNamedNoShadowingBasedOnFlags, MutableScope.lookupEntry, and EqHashMap.lookup rows all regress beyond the threshold. ClassDenotation.membersNamed total falls because the filtered path stops doing some unfiltered cache work, but the replacement work is more expensive in the direct filtered and inherited lookup rows.

CoreBTypes.typeToTypeKind now checks symbolic primitive TypeRefs by reference before forcing t.info, and primitiveOrClassToBType uses the same direct primitive table instead of primitiveTypeMap.getOrElse. The targeted iter-23/run-0 row was the synthetic CoreBTypes.primitiveOrClassToBType anonfun at 0.11% self / 1.00% total, but the path sits next to type-reference symbol recovery and the final profile moves visible self-time into NamedType.symbol. This is safe because Scala primitive class symbols are canonical definitions, and every nonprimitive or nonsymbolic TypeRef still follows the old info-forcing and class/nonclass handling.
Expected changes:
- CoreBTypes.primitiveOrClassToBType$1$$anonfun$1 self% and tot% should improve: direct primitive matching removes the immutable-map getOrElse default lambda around class BType recovery.
- Types$NamedType.denot tot% should improve: primitive TypeRefs can avoid the old t.info denotation path before returning the primitive BType.
- Types$NamedType.symbol self% could regress: symbolic TypeRefs now read the designator and run direct primitive checks before falling back, and unchanged symbol recovery can take a larger share.
- No semantic regressions expected: only canonical primitive class symbols bypass t.info, and every nonprimitive fallback still forces info before testing class symbols.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-46):
1. CoreBTypes.primitiveOrClassToBType$1$$anonfun$1 self%: 0.11 ± 0.12 → below floor
2. CoreBTypes.primitiveOrClassToBType$1$$anonfun$1 tot%: 1.00 ± 0.09 → below floor
3. Types$NamedType.symbol self%: 0.32 ± 0.04 → 0.47 ± 0.05 (+0.15, 3.0σ)
4. Types$NamedType.symbol tot%: 0.85 ± 0.04 → 0.96 ± 0.05 (+0.11, 2.2σ)
5. Types$NamedType.denot tot%: 3.44 ± 0.17 → 3.06 ± 0.18 (-0.38, 2.1σ)
Estimated total speedup: -0.04 ± 0.14 (from rows 1 and 3 above; row 2 and row 5 are overlapping total-time confirmations)
Rejected. The synthetic CoreBTypes lambda falls below the summary floor and NamedType.denot total improves, but the direct self-time win is too noisy while NamedType.symbol regresses significantly. The total-time movement is overlapping attribution, so this does not establish a reliable compiler-speed improvement.

ConstraintHandling.addToConstraint now skips the post-add addOneBound calls that would only propagate the canonical Nothing lower bound or Any/AnyKind upper bound across existing ordering edges. The intended hot path is constraint setup for dependent TypeLambdas after OrderingConstraint.init has recorded param ordering but left default non-param bounds; avoiding those no-op bounds should reduce entry, bound-stripping, and update churn. It is safe because adding Nothing as a lower bound or Any/AnyKind as an upper bound cannot narrow any constraint, while non-empty bounds and solved entries still use the existing propagation path.
Expected changes:
- ConstraintHandling.addToConstraint self% should improve: empty refined lower/upper bounds return without invoking addOneBound for each ordered parameter.
- OrderingConstraint.entry self% and tot% should improve: skipped no-op addOneBound calls avoid nonParamBounds, updateEntry, and entry probes reached while propagating default bounds.
- OrderingConstraint.StripParamsMap.strip self% and tot% should improve: fewer propagated default bounds reach dependent-parameter stripping.
- ConstraintHandling.addToConstraint self% could regress: every TypeBounds entry now pays two identity checks before deciding whether to propagate lower and upper edges.
- Typer.typedNamed tot% could regress: broad typer attribution can move if constraint setup changes, so this row checks for normalized caller fallout.
- No semantic regressions expected: only the identity-proven default Nothing lower and Any/AnyKind upper bounds are skipped; every non-empty or non-canonical bound still propagates through addOneBound.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-47):
1. OrderingConstraint.entry self%: 0.27 ± 0.06 → 0.28 ± 0.05 (+0.01, 0.2σ)
2. OrderingConstraint.entry tot%: 0.33 ± 0.06 → 0.34 ± 0.05 (+0.01, 0.2σ)
3. OrderingConstraint.StripParamsMap.strip self%: 0.14 ± 0.03 → 0.10 ± 0.03 (-0.04, 1.3σ)
4. OrderingConstraint.StripParamsMap.strip tot%: 0.47 ± 0.04 → 0.34 ± 0.06 (-0.13, 2.2σ)
5. TypeComparer.recur self%: 0.45 ± 0.05 → 0.38 ± 0.05 (-0.07, 1.4σ)
6. TypeComparer.secondTry$1 self%: 0.36 ± 0.04 → 0.25 ± 0.04 (-0.11, 2.7σ)
7. Typer.typedNamed tot%: 59.03 ± 0.29 → 60.81 ± 0.68 (+1.78, 2.6σ)
Estimated total speedup: +0.03 ± 0.09 (from rows 1 and 3 above; rows 5-6 are broad subtype-comparer movement and row 7 is an overlapping caller total)
Rejected. OrderingConstraint.entry self-time and total-time both move slightly against the change, while the StripParamsMap.strip improvement leaves the direct self-row estimate within noise. The broader TypeComparer rows improve, but the main typer caller regresses significantly, so skipping canonical empty bound propagation does not establish a reliable targeted win.
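
A toy model of the skipped propagation, under stated assumptions: the `Tpe` hierarchy, `Bounds`, and `propagate` below are hypothetical and only mirror the idea that a canonical Nothing lower bound or Any upper bound, detected by identity, cannot narrow anything and so need not be pushed across ordering edges. The `addOneBound` parameter is a plain function here so that calls can be counted.

```scala
object BoundPropagationSketch {
  sealed trait Tpe
  case object NothingTpe extends Tpe // stands in for the canonical Nothing lower bound
  case object AnyTpe extends Tpe     // stands in for the canonical Any upper bound
  final case class Named(name: String) extends Tpe
  final case class Bounds(lo: Tpe, hi: Tpe)

  // `lower` and `upper` name the parameters already ordered below and above the
  // new parameter. Parameters below it inherit its upper bound; parameters
  // above it inherit its lower bound.
  def propagate(
      bounds: Bounds,
      lower: List[String],
      upper: List[String],
      addOneBound: (String, Tpe, Boolean) => Boolean
  ): Boolean =
    // Identity checks: only the canonical empty bounds are skipped.
    ((bounds.hi eq AnyTpe) || lower.forall(p => addOneBound(p, bounds.hi, true))) &&
      ((bounds.lo eq NothingTpe) || upper.forall(p => addOneBound(p, bounds.lo, false)))
}
```

With default bounds on both sides no `addOneBound` call is made at all, which is the churn the commit hoped to remove; the two `eq` tests are the fixed cost every entry pays in exchange.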

ClassDenotation.findMember now returns NoDenotation immediately when the post-filtered member pre-denotation is empty, and only widens OrType prefixes before asSeenFrom on non-empty results. The iter-23/run-0 profile had ClassDenotation.findMember at 9.50% total, so the intended win was to skip prefix widening and denotation translation for filtered misses. This is semantically safe because NoDenotation.asSeenFrom(...).toDenot(...) is empty, and the existing raw member lookup plus flag filtering still run before the early return.
Expected changes:
- ClassDenotation.findMember self% should improve: empty filtered member results skip OrType prefix widening, asSeenFrom dispatch, and toDenot conversion.
- ClassDenotation.findMember tot% should improve: callers with filtered misses should leave less work in the member lookup subtree.
- Type.widenIfUnstable self% and tot% should improve or stay neutral: fewer empty lookups should need prefix widening before returning NoDenotation.
- PreDenotation.asSeenFrom and SingleDenotation.computeAsSeenFrom should improve if filtered misses previously reached denotation translation.
- No semantic regressions expected: NoDenotation remains the result for empty filtered lookups, and non-empty results still use the same prefix normalization and asSeenFrom path.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-48):
1. ClassDenotation.findMember self%: 0.20 ± 0.09 → 0.16 ± 0.05 (-0.04, 0.4σ)
2. ClassDenotation.findMember tot%: 9.50 ± 0.20 → 9.81 ± 0.36 (+0.31, 0.9σ)
3. Type.widenIfUnstable self%: 0.14 ± 0.04 → 0.21 ± 0.05 (+0.07, 1.4σ)
4. Type.widenIfUnstable tot%: 0.73 ± 0.07 → 0.81 ± 0.04 (+0.08, 1.1σ)
Estimated total speedup: -0.03 ± 0.12 (from rows 1 and 3 above)
Rejected. ClassDenotation.findMember self-time moves in the intended direction but only by 0.4σ, while its total time moves against the change just under the significance threshold. Type.widenIfUnstable regresses significantly, and the expected asSeenFrom rows remain below the summary floor, so the fast return does not establish a reliable targeted win.

Type.dealias now checks whether a TypeRef already carries a current-run ClassSymbol designator before falling back to tp.symbol.isClass. The iter-23/run-0 profile had Type.dealias at 1.14% self / 1.56% total and NamedType.symbol at 0.32% self, so the intended win was to bypass the symbolic-designator symbol path for common class references. The guard is safe because it only returns for ClassSymbols whose last-known denotation is from the current run and still maps back to the same symbol; stale symbols and name designators keep the existing denotation-based fallback.
Expected changes:
- Type.dealias self% should improve: current-run class-designator TypeRefs return before calling through NamedType.symbol.
- NamedType.symbol self% should improve: TypeRef.dealias should stop using the symbolic-designator symbol fast path for these class references.
- NamedType.denot tot% should improve: class-designator TypeRefs that return from dealias earlier should leave less work in the denotation subtree.
- Symbol.denot self% should improve or stay neutral: the new guard reads lastKnownDenotation without stillValid, and stale symbols still follow the old symbol path.
- Type.dealias self% could regress: every TypeRef pays a designator match before the existing tp.symbol.isClass fallback.
- No semantic regressions expected: only current-run ClassSymbols that still denote themselves bypass symbol recovery, and every opaque-alias, stale-symbol, or name-designated TypeRef keeps the old info/dealias behavior.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-49):
1. Type.dealias self%: 1.14 ± 0.11 → 1.21 ± 0.08 (+0.07, 0.6σ)
2. Type.dealias tot%: 1.56 ± 0.10 → 1.62 ± 0.10 (+0.06, 0.6σ)
3. NamedType.symbol self%: 0.32 ± 0.04 → 0.37 ± 0.03 (+0.05, 1.2σ)
4. NamedType.symbol tot%: 0.85 ± 0.04 → 0.85 ± 0.03 (+0.00, 0.0σ)
5. Symbol.denot self%: 1.54 ± 0.10 → 1.53 ± 0.26 (-0.01, 0.0σ)
6. NamedType.denot tot%: 3.44 ± 0.17 → 3.17 ± 0.07 (-0.27, 1.6σ)
Estimated total speedup: -0.12 ± 0.15 (from rows 1 and 3 above; rows 2, 4, and 6 are overlapping total-time confirmations)
Rejected. Type.dealias moves against the change and stays below the significance threshold, while NamedType.symbol self-time regresses significantly. The NamedType.denot total-time improvement is overlapping attribution rather than a direct self-time win, so the class-designator guard does not establish a reliable compiler-speed improvement.

Interpreter now keeps per-instance caches for loaded macro classes, package MODULE$ instances, reflected static methods, reflected static fields, constructors, and erased parameter signatures. The iter-23/run-0 allocation tree attributed Resource.getBytes under Interpreter.loadClass through first-time URLClassLoader class definition, but the classloader already caches loaded classes; these maps only avoid repeated reflection lookup within one Interpreter instance and are scoped so they do not retain macro classloaders across runs. This is safe because cache misses still use the existing load and reflection paths, nested module construction keeps the old fresh-instance behavior, and no cache is shared across compilation runs or classloaders.
Expected changes:
- Interpreter.loadClass self% and tot% should improve: repeated macro class loads in one interpreter instance can return from the local binary-name cache before delegating to the macro classloader.
- Interpreter.loadModule tot% should improve: top-level package MODULE$ instances can be reused across repeated interpreted calls.
- interpretedStaticMethodCall tot% should improve: reflected methods, constructors, fields, and parameter signatures can be reused across repeated interpreted calls.
- HashMap$Node.findNode self% could regress: every interpreter now owns several mutable caches and cache probes can become visible if reflection lookup was already below the profiling floor.
- DirectMethodHandle.allocateInstance self% could regress: constructor/module reflection remains on nested module and macro object paths, and a cache miss does not remove reflective allocation work.
- No semantic regressions expected: failed lookups are not cached, nested modules keep the previous construction behavior, and all caches are local to the interpreter and its classloader.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-50):
1. Interpreter.loadClass self%: below floor → below floor
2. Interpreter.loadModule tot%: below floor → below floor
3. interpretedStaticMethodCall tot%: below floor → below floor
4. HashMap$Node.findNode self%: 0.28 ± 0.04 → 0.32 ± 0.05 (+0.04, 0.8σ)
5. DirectMethodHandle.allocateInstance self%: 0.45 ± 0.06 → 0.54 ± 0.05 (+0.09, 1.5σ)
6. Typer.typed tot%: 66.78 ± 0.31 → 68.41 ± 0.41 (+1.63, 4.0σ)
Estimated total speedup: -0.13 ± 0.10 (from rows 4 and 5 above; rows 1-3 are below floor and row 6 is an overlapping caller total)
Rejected. The expected interpreter and classloader rows remain below the summary floor in both profiles, so the cache does not establish a direct targeted win. The visible movement is neutral-to-negative cache overhead plus a significant broad typer total-time regression, matching the research concern that class loading was already cached below the URLClassLoader byte-allocation attribution.
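
A per-instance reflection cache of the kind described above might look like the following Scala 3 sketch. `ReflectionCache`, its field names, and the `classLoads` counter are hypothetical and exist only for illustration; the real Interpreter caches more kinds of members. Only standard `ClassLoader.loadClass` and `Class.getMethod` calls are used.

```scala
import java.lang.reflect.Method
import scala.collection.mutable

object InterpreterCacheSketch {
  // Hypothetical stand-in for one interpreter instance bound to one classloader.
  // Nothing here is shared across instances, so no loader is retained globally.
  final class ReflectionCache(loader: ClassLoader) {
    private val classes = mutable.HashMap.empty[String, Class[?]]
    private val methods = mutable.HashMap.empty[(Class[?], String, Seq[Class[?]]), Method]
    var classLoads = 0 // counts delegations to the loader, for illustration only

    def loadClass(binaryName: String): Class[?] =
      classes.get(binaryName) match {
        case Some(cls) => cls
        case None =>
          classLoads += 1
          // loadClass throws on failure, so failed lookups are never cached.
          val cls = loader.loadClass(binaryName)
          classes(binaryName) = cls
          cls
      }

    // getOrElseUpdate evaluates getMethod only on a miss; an exception from
    // getMethod leaves the map unchanged.
    def staticMethod(cls: Class[?], name: String, params: Seq[Class[?]]): Method =
      methods.getOrElseUpdate((cls, name, params), cls.getMethod(name, params*))
  }
}
```

Because the classloader itself already returns the same `Class` object for a repeated load, the class map saves only the delegation call, which is consistent with the interpreter rows staying below the profile floor.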

Type.widenSingleton now returns direct TypeRef and MethodOrPoly receivers before calling stripped, leaving type variables, annotations, TermRefs, and other singleton wrappers on the existing stripped path. The iter-23/run-0 profile had Type.widenSingleton at 0.12% self / 1.71% total, so the intended win was to remove one wrapper-stripping dispatch from frequent non-singleton identity calls. This is safe because the new early returns cover only shapes that previously stripped to themselves and returned this from the default arm.
Expected changes:
- Type.widenSingleton self% should improve: direct TypeRef and MethodOrPoly receivers bypass stripped before returning this.
- Type.widenSingleton tot% should improve: callers that mostly widen identity types should spend less time in singleton widening.
- Type.widenIfUnstable self% and tot% should improve or stay neutral: typed paths using singleton widening should inherit less identity overhead.
- Type.widen self% and tot% should improve or stay neutral: top-level widening calls that reach singleton widening should inherit less identity overhead.
- Type.widenSingleton self% could regress: every non-hit pays an extra top-level match before the existing stripped match.
- No semantic regressions expected: wrapped types still strip before matching, TermRefs still inspect denotations, and SingletonTypes still widen through underlying.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-53):
1. Type.widenSingleton self%: 0.12 ± 0.04 → 0.12 ± 0.06 (+0.00, 0.0σ)
2. Type.widenSingleton tot%: 1.71 ± 0.07 → 1.78 ± 0.10 (+0.07, 0.7σ)
3. Type.widenIfUnstable self%: 0.14 ± 0.04 → 0.21 ± 0.08 (+0.07, 0.9σ)
4. Type.widenIfUnstable tot%: 0.73 ± 0.07 → 0.78 ± 0.09 (+0.05, 0.6σ)
5. Type.widen self%: 0.16 ± 0.04 → 0.20 ± 0.03 (+0.04, 1.0σ)
6. Type.widen tot%: 2.83 ± 0.08 → 2.85 ± 0.11 (+0.02, 0.2σ)
Estimated total speedup: -0.11 ± 0.13 (from rows 1, 3, and 5 above; rows 2, 4, and 6 are overlapping total-time confirmations)
Rejected. Type.widenSingleton self-time does not move, while its total-time row and the adjacent widening self-time rows move slightly against the change. With no reliable targeted win and a negative self-row estimate, the added top-level identity match does not establish a reliable compiler-speed improvement.
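
The shape of this early return can be shown on a toy type hierarchy. Everything below is hypothetical: `Ref`, `Annotated`, and `SingletonOf` only play the roles of a TypeRef, a wrapper removed by `stripped`, and a singleton type, and `widenOld` approximates the strip-then-match structure the commit message describes.

```scala
object WidenSingletonSketch {
  sealed trait Tpe
  final case class Ref(name: String) extends Tpe            // plays the role of a TypeRef
  final case class Annotated(parent: Tpe) extends Tpe       // a wrapper that `stripped` removes
  final case class SingletonOf(underlying: Tpe) extends Tpe // widens to `underlying`

  def stripped(t: Tpe): Tpe = t match {
    case Annotated(p) => stripped(p)
    case other        => other
  }

  // Existing shape: strip wrappers first, then match on the result.
  def widenOld(t: Tpe): Tpe = stripped(t) match {
    case SingletonOf(u) => widenOld(u)
    case _              => t
  }

  // Rejected fast path: unwrapped refs return before `stripped` is called;
  // every other shape pays one extra match and then takes the old path.
  def widenNew(t: Tpe): Tpe = t match {
    case _: Ref => t
    case _      => widenOld(t)
  }
}
```

Both functions agree on every input, so the change is purely a dispatch-order trade: hits skip `stripped`, misses pay the extra top-level match named in the expected-regression bullet.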

Type.memberBasedOnFlags now handles direct class TypeRef and class AppliedType lookups by asking the ClassDenotation for the raw flag-filtered member pre-denotation before materializing the asSeenFrom prefix. The iter-23/run-0 profile had ClassDenotation.findMember at 9.50% total and Type.widenIfUnstable at 0.14% self / 0.73% total, so the intended win was to return NoDenotation for class-member misses before doing prefix stability work. The change is safe because non-empty results still use the same raw membersNamed/nonPrivateMembersNamed lookup, flag filtering, OrType prefix widening, and asSeenFrom conversion, while empty filtered results are the same NoDenotation result the old path produced.
Expected changes:
- Type.widenIfUnstable self% and tot% should improve: direct class-denotation misses can return NoDenotation before materializing a stable prefix.
- ClassDenotation.findMember tot% should improve: Type.memberBasedOnFlags bypasses the strict-prefix wrapper for direct class TypeRef and class AppliedType lookups.
- ClassDenotation.membersNamed self% could regress: the direct miss path still performs the same member-cache lookup, but its samples can move out of findMember into the cache row.
- Typer.typed tot% could regress: member lookup is a typer-heavy path, so any extra dispatch or cache attribution can show up in the broad typer caller.
- No semantic regressions expected: all non-direct type shapes still use Type.findMember, recursive-search bookkeeping is preserved on the direct path, and hit results still translate through asSeenFrom with the old prefix.
JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-54):
1. ClassDenotation.findMember self%: 0.20 ± 0.09 → 0.13 ± 0.05 (-0.07, 0.8σ)
2. ClassDenotation.findMember tot%: 9.50 ± 0.20 → 8.37 ± 0.20 (-1.13, 5.7σ)
3. Type.widenIfUnstable self%: 0.14 ± 0.04 → below floor
4. Type.widenIfUnstable tot%: 0.73 ± 0.07 → below floor
5. ClassDenotation.membersNamed self%: 0.35 ± 0.23 → 0.63 ± 0.12 (+0.28, 1.2σ)
6. Typer.typed tot%: 66.78 ± 0.31 → 69.08 ± 0.61 (+2.30, 3.8σ)
Estimated total speedup: -0.07 ± 0.28 (from rows 1, 3, and 5 above; row 3 is counted as falling to zero below the summary floor, and rows 2, 4, and 6 are overlapping total-time confirmations)
Rejected. The prefix materialization win is visible in Type.widenIfUnstable and ClassDenotation.findMember, but it is offset by the visible ClassDenotation.membersNamed self-time regression after the direct class path moves lookup work out of findMember. The broad Typer.typed total-time regression reinforces that the split dispatch does not establish a reliable compiler-speed improvement.

AppliedUniques.bucketIndex now checks `splitBucket == 0` before the existing linear-hash split comparison, so common unsplit applied-type uniqueness tables return after the base mask without testing `base < splitBucket`. This path is hot through AppliedType$.apply at 2.58% total and AppliedUniques.enterIfNew at 0.36% self / 2.39% total in iter-23/run-0, but the added branch failed to improve direct enterIfNew self-time and the hotter generated probe-loop row regressed. The change is safe because nonzero split buckets keep the same expanded-mask path and all resize, stale-removal, and insertion rules are unchanged.

Expected changes:
- AppliedUniques.enterIfNew self% should improve: cached applied-type lookups in an unsplit table avoid the `base < splitBucket` check in bucket indexing.
- AppliedUniques.linkedListLoop$2 self% should improve: cheaper bucket selection leaves less work around one generated applied-type probe loop.
- AppliedType$.apply tot% should improve: the enclosing applied-type factory inherits any cheaper uniqueness-table probe.
- AppliedUniques.linkedListLoop$6 self% could regress: the added split-zero branch can shift work or branch shape into the hotter generated probe loop.
- No semantic regressions expected: bucket selection is identical for split and unsplit states, and only the order of equivalent tests changed.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-55):
1. AppliedUniques.enterIfNew self%: 0.36 ± 0.04 → 0.32 ± 0.10 (-0.04, 0.4σ)
2. AppliedUniques.enterIfNew tot%: 2.39 ± 0.09 → 2.04 ± 0.13 (-0.35, 2.7σ)
3. AppliedType$.apply tot%: 2.58 ± 0.12 → 2.19 ± 0.14 (-0.39, 2.8σ)
4. AppliedUniques.linkedListLoop$2 self%: 0.26 ± 0.05 → 0.15 ± 0.02 (-0.11, 2.2σ)
5. AppliedUniques.linkedListLoop$6 self%: 0.39 ± 0.06 → 0.47 ± 0.07 (+0.08, 1.1σ)

Estimated total speedup: +0.07 ± 0.15 (from rows 1, 4, and 5 above; rows 2-3 are overlapping caller totals)

Rejected. The exclusive-row estimate remains within uncertainty despite the linkedListLoop$2 improvement, and the hotter linkedListLoop$6 probe row regresses while enterIfNew self-time stays within noise. The stronger enterIfNew and AppliedType$.apply total drops are overlapping caller attribution, so they do not justify accepting a bucket-indexing branch that failed to improve the direct owner row.
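
For readers unfamiliar with linear hashing, the shape of the attempted branch can be sketched as follows. This is a minimal illustration, not the compiler's code: `TableState`, `baseMask`, and both function names are hypothetical, and the real `AppliedUniques.bucketIndex` may compute the expanded mask differently.

```scala
object LinearHashIndexSketch {
  // Hypothetical stand-in for the table state; field names are assumptions.
  final case class TableState(baseMask: Int, splitBucket: Int)

  /** Baseline shape: always compare the base index against the split pointer. */
  def bucketIndexOld(hash: Int, t: TableState): Int = {
    val base = hash & t.baseMask
    // Buckets below the split pointer have already been split, so they use
    // one more hash bit (the expanded mask).
    if (base < t.splitBucket) hash & ((t.baseMask << 1) | 1) else base
  }

  /** Attempted shape: return right after the base mask when nothing is split. */
  def bucketIndexNew(hash: Int, t: TableState): Int = {
    val base = hash & t.baseMask
    if (t.splitBucket == 0) base
    else if (base < t.splitBucket) hash & ((t.baseMask << 1) | 1)
    else base
  }
}
```

Both functions return the same index for every input, which is why the change is semantically safe; the only difference is one extra comparison whose cost or benefit depends on how often the table is unsplit.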

liftToThis now returns immediately for TypeRef(NoPrefix, _) and AppliedType whose tycon is that same no-prefix TypeRef shape, avoiding recursive prefix/tycon calls that old code would return unchanged. The iter-23/run-0 profile put liftToThis at 0.13% self / 0.27% total and TypeComparer.recur at 14.75% total, so the intended win is a small reduction in lifted-this retry setup on a hot subtype-comparison path. This is safe because NoPrefix cannot contain an enclosing module reference, and AppliedType only lifted the tycon before this change; arguments keep the existing untouched behavior.

Expected changes:
- TypeComparer.liftToThis self% and tot% should improve: exact no-prefix TypeRefs and matching AppliedTypes skip recursive prefix/tycon dispatch.
- TypeComparer.compareNamed$1 tot% should improve: named comparisons that retry with lifted-this types should inherit cheaper no-prefix identity handling.
- TypeComparer.secondTry$1 self% and tot% should improve: fallback subtype attempts can call tryLiftedToThis after prefix stability checks.
- Typer.typed tot% could regress: if the direct liftToThis row stays unchanged, broad typer attribution can dominate any subtype-comparer movement.
- No other regressions expected: TermRef modules and package ThisTypes still use opaque-aware findEnclosingThis, and only shapes the old recursion returned by identity are short-circuited.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-56):
1. TypeComparer.liftToThis self%: 0.13 ± 0.03 → 0.13 ± 0.03 (+0.00, 0.0σ)
2. TypeComparer.liftToThis tot%: 0.27 ± 0.05 → 0.25 ± 0.04 (-0.02, 0.4σ)
3. TypeComparer.compareNamed$1 tot%: 3.72 ± 0.18 → 3.41 ± 0.15 (-0.31, 1.7σ)
4. TypeComparer.secondTry$1 self%: 0.36 ± 0.04 → 0.25 ± 0.03 (-0.11, 2.7σ)
5. TypeComparer.secondTry$1 tot%: 7.26 ± 0.26 → 6.80 ± 0.13 (-0.46, 1.8σ)
6. Typer.typed tot%: 66.78 ± 0.31 → 69.15 ± 0.32 (+2.37, 7.4σ)

Estimated total speedup: +0.00 ± 0.04 (from row 1 above; rows 2-6 are overlapping target total, subtype-comparer caller confirmations, and broad typer attribution)

Rejected. The direct liftToThis self row is unchanged, and the target total row moves by only 0.4σ, so the intended shortcut does not pay for itself. The subtype-comparer caller improvements are outweighed by the Typer.typed total regression of +2.37 at 7.4σ, so this belongs on rejected.
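
The commit messages do not state how the σ values or the "Estimated total speedup" figures are derived. The sketch below encodes one reading that is consistent with the reported numbers: σ as the absolute delta divided by the larger of the two standard deviations, and the estimate as the negated sum of the selected self% deltas with the standard deviations combined in quadrature. Both formulas and all names here are inferred assumptions, not something the PR documents.

```scala
object ProfileDeltaSketch {
  // One profile row: mean and stddev before and after the change.
  final case class Row(beforeMean: Double, beforeSd: Double, afterMean: Double, afterSd: Double) {
    def delta: Double = afterMean - beforeMean
    // Assumed significance: |delta| relative to the noisier of the two samples.
    def sigma: Double = math.abs(delta) / math.max(beforeSd, afterSd)
  }

  /** Assumed estimate: (speedup, uncertainty) from non-overlapping self% rows. */
  def estimate(rows: Seq[Row]): (Double, Double) = {
    val speedup = -rows.map(_.delta).sum
    val err = math.sqrt(rows.map(r => r.beforeSd * r.beforeSd + r.afterSd * r.afterSd).sum)
    (speedup, err)
  }

  // Round to a fixed number of decimal places for comparison with the report.
  def round(x: Double, places: Int): Double = {
    val f = math.pow(10, places)
    math.round(x * f) / f
  }
}
```

Under these assumptions, `Row(66.78, 0.31, 69.15, 0.32).sigma` rounds to 7.4, and `estimate(Seq(Row(0.13, 0.03, 0.13, 0.03)))` gives a speedup of 0.00 with an uncertainty that rounds to 0.04, matching the liftToThis figures reported here.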

AppliedUniques now stores bucket occupancy bits in one flat Long array instead of one nullable 16-word array per bucket page, leaving bucket-head paging and linear-hash split rules unchanged. The iter-23/run-0 profile put AppliedUniques.enterIfNew at 0.36% self / 2.39% total, addPagedEntryAt at 0.12% self / 0.32% total, and probe counters in research showed about 1.9M empty-split skips, so the intended win was fewer page-reference loads in empty-bucket scans and bit updates. This is safe because each bit is still indexed by the same bucket number, stale-entry removal and split migration still mark buckets on head transitions, and out-of-range flat words are treated as empty during split skipping.

Expected changes:
- AppliedUniques.addPagedEntryAt self% and tot% should improve: a new bucket head marks occupancy with one flat word load/store instead of loading a per-page bitmap array.
- AppliedUniques.enterIfNew tot% should improve: insertion-triggered growth and empty-split skipping inherit cheaper occupancy scans.
- AppliedUniques.linkedListLoop$2 self% should improve: cheaper bucket growth around applied-type lookup leaves less work in this probe loop.
- AppliedUniques.linkedListLoop$6 self% could regress: the flatter scan changes branch and cache shape around the hotter generated probe loop.
- Typer.typed tot% could regress: applied-type uniqueness sits under typer-heavy paths, so a probe-loop regression can widen broad typer attribution.
- No semantic regressions expected: bucket-head pages are unchanged, occupied bits are set and cleared on the same null/non-null head transitions, and missing flat words mean no occupied buckets just as missing occupancy pages did.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-58):
1. AppliedUniques.addPagedEntryAt self%: 0.12 ± 0.05 → below floor
2. AppliedUniques.enterIfNew self%: 0.36 ± 0.04 → 0.30 ± 0.07 (-0.06, 0.9σ)
3. AppliedUniques.enterIfNew tot%: 2.39 ± 0.09 → 2.12 ± 0.08 (-0.27, 3.0σ)
4. AppliedUniques.linkedListLoop$2 self%: 0.26 ± 0.05 → 0.10 ± 0.10 (-0.16, 1.6σ)
5. AppliedUniques.linkedListLoop$6 self%: 0.39 ± 0.06 → 0.60 ± 0.10 (+0.21, 2.1σ)
6. AppliedUniques.linkedListLoop$6 tot%: 0.98 ± 0.07 → 1.11 ± 0.04 (+0.13, 1.9σ)
7. Typer.typed tot%: 66.78 ± 0.31 → 69.24 ± 0.68 (+2.46, 3.6σ)

Estimated total speedup: +0.13 ± 0.18 (from rows 1, 2, 4, and 5 above; rows 3, 6, and 7 are overlapping total-time confirmations)

Rejected. The direct exclusive rows stay within uncertainty once the linkedListLoop$6 regression is included, and enterIfNew self-time moves by only 0.9σ. The significant broad Typer.typed total regression confirms that flattening the occupancy bitmap does not pay for itself.
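
The difference between the two occupancy layouts can be sketched as below. This is an illustrative model only: `PagedBits` and `FlatBits` are hypothetical classes, and the real AppliedUniques bookkeeping (clearing bits, split migration) is omitted.

```scala
object OccupancyBitmapSketch {
  /** Paged layout: one nullable 16-word array per page (16 * 64 = 1024 buckets). */
  final class PagedBits(pageCount: Int) {
    private val pages = new Array[Array[Long]](pageCount)

    def set(bucket: Int): Unit = {
      val word = bucket >>> 6
      var page = pages(word >>> 4)            // extra reference load per access
      if (page == null) { page = new Array[Long](16); pages(word >>> 4) = page }
      page(word & 15) |= 1L << (bucket & 63)
    }

    def isSet(bucket: Int): Boolean = {
      val word = bucket >>> 6
      val page = pages(word >>> 4)
      page != null && (page(word & 15) & (1L << (bucket & 63))) != 0L
    }
  }

  /** Flat layout: a single Long array; words past the end read as empty. */
  final class FlatBits(initialWords: Int) {
    private var words = new Array[Long](initialWords)

    def set(bucket: Int): Unit = {
      val word = bucket >>> 6
      if (word >= words.length)
        words = java.util.Arrays.copyOf(words, math.max(word + 1, words.length * 2))
      words(word) |= 1L << (bucket & 63)
    }

    def isSet(bucket: Int): Boolean = {
      val word = bucket >>> 6
      word < words.length && (words(word) & (1L << (bucket & 63))) != 0L
    }
  }
}
```

Both layouts index a bit by the same bucket number, so they answer identically; the flat version trades the per-page null check and reference load for an occasional array copy on growth.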

MutableScope now maintains a conservative one-word name bloom and collectLinearizedNoOwn consults it before probing each ancestor declaration scope with denotsNamed. The inherited-member scan is hot in iter-23/run-0, with ClassDenotation.membersNamed at 6.04% total, MutableScope.lookupEntry at 0.76% self, and EqHashMap$HashedOnly.lookup at 0.51% self; skipping definite direct-scope misses should reduce both lookup rows. The mechanism is safe because bloom positives keep the existing lookup path, synthesizing and prefilled scopes answer conservatively, and deletions only leave stale positives.

Expected changes:
- MutableScope.lookupEntry self% should improve: ancestor scopes whose bloom lacks the queried name skip lookupEntry entirely.
- EqHashMap$HashedOnly.lookup self% should improve: skipped declaration probes avoid direct-scope hash-table lookups in inherited scans.
- ClassDenotation.membersNamed self% could regress: every linearized no-own scan pays an extra bloom test and branch before the existing denotsNamed path.
- No other regressions expected: false positives preserve the old lookup, synthesizer scopes and prefilled scopes are conservative, and unlink never creates false negatives.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-59):
1. MutableScope.lookupEntry self%: 0.76 ± 0.13 → 0.74 ± 0.10 (-0.02, 0.2σ)
2. EqHashMap$HashedOnly.lookup self%: 0.51 ± 0.17 → 0.36 ± 0.24 (-0.15, 0.6σ)
3. ClassDenotation.membersNamed self%: 0.35 ± 0.23 → 0.73 ± 0.25 (+0.38, 1.5σ)

Estimated total speedup: -0.21 ± 0.48 (from rows 1-3 above)

Rejected. The intended direct lookup and hash lookup self rows move down but remain well inside noise, while ClassDenotation.membersNamed self-time regresses by +0.38 at 1.5σ. The summed self-row estimate is negative, so the bloom test does not pay for itself on the linearized inherited-scan path.
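
A one-word bloom of the kind described can be sketched as follows. `BloomedScope`, the string-keyed table, and the single-bit hash are all assumptions made for illustration; the real MutableScope works on compiler names and may derive its bits differently.

```scala
object NameBloomSketch {
  final class BloomedScope {
    private val entries = scala.collection.mutable.HashMap.empty[String, Int]
    private var bloom = 0L
    // Counts full hash-table lookups; present only to make the effect visible.
    var probes = 0

    // Assumed hashing: one bit per name, chosen by the low 6 bits of hashCode.
    private def bit(name: String): Long = 1L << (name.hashCode & 63)

    def enter(name: String, value: Int): Unit = {
      entries(name) = value
      bloom |= bit(name)
    }

    // The bloom bit is left set: a stale positive, never a false negative.
    def unlink(name: String): Unit = { entries.remove(name); () }

    def lookup(name: String): Option[Int] =
      if ((bloom & bit(name)) == 0L) None      // definite miss, skip the table
      else { probes += 1; entries.get(name) }  // possible hit, existing path
  }
}
```

The filter can only ever skip work for names whose bit is clear, so its value depends entirely on the miss rate of the scopes being scanned; the extra test is paid on every lookup, which is the cost the membersNamed row reflects.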

TreeTypeMap.mapType now keeps two scalar identity slots before allocating or consulting myMapTypeCache, promoting those slots into the existing EqHashMap on the third distinct cached type. The path is hot because TreeTypeMap.transform is 7.48% total in iter-23/run-0 and the existing cache lookup sits on the type-remapping path, so adjacent repeated types were expected to avoid hash lookup and tiny maps were expected to avoid cache allocation. This is safe because the EqHashMap remains authoritative after promotion, scalar hits are exact reference-identity hits, and uncached pure typeMap calls still bypass this cache.

Expected changes:
- EqHashMap.lookup self% should improve: adjacent scalar hits should return before identity hash lookup and dense-table probing.
- TreeTypeMap.mapType self% should improve: one- or two-type maps should avoid full EqHashMap allocation and lookup/update plumbing.
- TreeTypeMap.transform tot% should improve: the enclosing tree walk should inherit cheaper mapType calls.
- TreeTypeMap.mapType self% could regress: every cacheable mapType call pays two extra scalar identity checks and promotion bookkeeping.
- TreeTypeMap.<init> self% could regress: each TreeTypeMap carries four extra scalar cache fields.
- No semantic regressions expected: cache entries are written only for the same deterministic cases that already used myMapTypeCache, and misses still compute through computeMapType before being recorded.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-61):
1. EqHashMap.lookup self%: 0.53 ± 0.04 → 0.59 ± 0.03 (+0.06, 1.5σ)
2. EqHashMap.lookup tot%: 0.54 ± 0.03 → 0.61 ± 0.04 (+0.07, 1.7σ)
3. TreeTypeMap.mapType self%: below floor → 0.09 ± 0.03
4. TreeTypeMap.mapType tot%: below floor → 3.83 ± 0.13
5. TreeTypeMap.transform tot%: 7.48 ± 0.18 → 7.68 ± 0.19 (+0.20, 1.1σ)
6. TreeTypeMap.<init> self%: 0.10 ± 0.01 → 0.11 ± 0.04 (+0.01, 0.2σ)

Estimated total speedup: -0.16 ± 0.07 (from rows 1, 3, and 6 above, conservatively treating the below-floor TreeTypeMap.mapType before value as zero; rows 2, 4, and 5 are overlapping total-time confirmations)

Rejected. The intended EqHashMap.lookup win moves the wrong way above the significance threshold, and TreeTypeMap.mapType becomes visible as new exclusive overhead. The broad TreeTypeMap.transform total row also regresses at 1.1σ, so the scalar front cache does not pay for itself.
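
The two-slot front cache can be pictured roughly like this. `FrontCachedMap` is a hypothetical class, `java.util.IdentityHashMap` stands in for the compiler's EqHashMap, and whether the real code keeps checking the scalar slots after promotion is an assumption of this sketch.

```scala
object ScalarFrontCacheSketch {
  final class FrontCachedMap[K <: AnyRef, V](compute: K => V) {
    // Four scalar fields: two identity-keyed slots checked before any map.
    private var k1: AnyRef = null
    private var v1: V = null.asInstanceOf[V]
    private var k2: AnyRef = null
    private var v2: V = null.asInstanceOf[V]
    // Allocated lazily, on the third distinct key.
    private var table: java.util.IdentityHashMap[K, V] = null

    def promoted: Boolean = table != null

    def apply(key: K): V =
      if (key eq k1) v1
      else if (key eq k2) v2
      else if (table != null) {
        if (table.containsKey(key)) table.get(key)
        else { val v = compute(key); table.put(key, v); v }
      } else {
        val v = compute(key)
        if (k1 == null) { k1 = key; v1 = v }
        else if (k2 == null) { k2 = key; v2 = v }
        else {
          // Promotion: copy both slots into the map, which is authoritative from here on.
          table = new java.util.IdentityHashMap[K, V]()
          table.put(k1.asInstanceOf[K], v1)
          table.put(k2.asInstanceOf[K], v2)
          table.put(key, v)
        }
        v
      }
  }
}
```

Small maps never allocate the table, but every call pays up to two reference comparisons first, which is the overhead that shows up once maps routinely hold more than two types.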

OrderingConstraint.remove now skips materializing deferred reverse dependencies when the removed TypeLambda is still in dirtyDeps, deletes that binder from dirtyDeps, prunes direct lower/upper maps, and removes any already-materialized reverse-dependency references to the binder. The path was expected to avoid add-then-remove dependency traversal for short-lived inference variables under TypeComparer.recur, and it is safe because non-dirty removals still force the complete dependency view before subtracting bounds.

Expected changes:
- OrderingConstraint.remove self% should improve: dirty binders can be removed without first indexing their deferred bounds into coDeps and contraDeps.
- OrderingConstraint.Adjuster.traverse self% and tot% should improve: short-lived dirty binders avoid materializeDeps add traversal followed by removal traversal.
- TypeComparer.secondTry$1 self% could improve: rollback and solving paths that remove fresh inference variables inherit less constraint dependency work.
- Typer.typed tot% could regress: if the direct remove path stays below the profile floor, broad typer attribution can dominate the subtype-comparer movement.
- No correctness regressions expected: non-dirty removals keep the old materialized path, dirty removals still drop dirtyDeps, direct ordering edges, hard vars, and reverse-dependency keys/sources for the removed binder.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-66):
1. OrderingConstraint.remove self%: below floor → below floor
2. OrderingConstraint.Adjuster.traverse self%: below floor → below floor
3. TypeComparer.secondTry$1 self%: 0.36 ± 0.04 → 0.27 ± 0.02 (-0.09, 2.2σ)
4. OrderingConstraint.StripParamsMap.strip tot%: 0.47 ± 0.04 → 0.30 ± 0.05 (-0.17, 3.4σ)
5. Typer.typed tot%: 66.78 ± 0.31 → 68.65 ± 0.47 (+1.87, 4.0σ)
6. Typer.typedNamed tot%: 59.03 ± 0.29 → 60.91 ± 0.29 (+1.88, 6.5σ)

Estimated total speedup: +0.09 ± 0.04 (from row 3 above; rows 4-6 are overlapping or broad attribution)

Rejected. The direct OrderingConstraint.remove and Adjuster.traverse rows remain below the profile summary floor, so the dirty-removal shortcut does not establish a direct targeted win. TypeComparer.secondTry$1 improves, but the significant broad Typer.typed and Typer.typedNamed total-time regressions leave the attempt too noisy to accept.

Type.implicitMembers now builds TermRefs while walking implicit member names and denotation alternatives, avoiding the intermediate implicit-denotation list and final List.map in wildcard implicit-scope collection. The path was expected to reduce OfTypeImplicits.refs work, and it is safe because it preserves member-name order, MultiDenotation alternative order, the implicit/given predicate, the private-shadow fallback, and TermRef prefix/symbol construction.

Expected changes:
- List.map self% and tot% should improve: direct TermRef collection removes the final denotation-to-ref map.
- OfTypeImplicits.refs tot% should improve: companion implicit-scope collection avoids building per-name implicit-denotation lists.
- NamedTypeUniques.enterIfNew and linkedListLoop$1 tot% could regress: TermRef creation count is unchanged, so uniquing can remain on the hot path.
- CachedTermRef.<init> self% could regress: building TermRefs directly can shift constructor attribution into the fused walker.
- No semantic regressions expected: the direct walker keeps the same name filter, alternative order, implicit/given predicate, and private-shadow fallback used by implicitMembersNamed.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-67):
1. List.map self%: 0.30 ± 0.02 → 0.31 ± 0.03 (+0.01, 0.3σ)
2. List.map tot%: 8.15 ± 0.18 → 7.76 ± 0.14 (-0.39, 2.2σ)
3. NamedTypeUniques.enterIfNew tot%: 2.13 ± 0.12 → 2.33 ± 0.08 (+0.20, 1.7σ)
4. NamedTypeUniques.linkedListLoop$1 tot%: 1.55 ± 0.13 → 1.78 ± 0.08 (+0.23, 1.8σ)
5. CachedTermRef.<init> self%: below floor → 0.13 ± 0.14

Estimated total speedup: -0.14 ± 0.14 (from rows 1 and 5 above, conservatively treating the below-floor CachedTermRef before value as zero; rows 2-4 are overlapping total-time confirmations)

Rejected. The intended List.map total row improves, but its self row is unchanged and direct TermRef construction becomes visible. NamedTypeUniques total-time rows move the wrong way, so the fused implicitMembers walker does not pay for the unchanged TermRef uniquing work.
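
The fusion attempted here has roughly the following shape. `Denot`, `Ref`, and both functions are hypothetical stand-ins; the real walker also handles the private-shadow fallback and TermRef uniquing, which this sketch leaves out.

```scala
object FusedWalkSketch {
  // Hypothetical stand-ins for denotations and term references.
  final case class Denot(name: String, isImplicit: Boolean)
  final case class Ref(name: String)

  /** Two-pass shape: collect matching denotations, then map them to refs. */
  def refsTwoPass(names: List[String], alts: String => List[Denot]): List[Ref] =
    names.flatMap(n => alts(n).filter(_.isImplicit)).map(d => Ref(d.name))

  /** Fused shape: build each ref during the walk, with no intermediate list. */
  def refsFused(names: List[String], alts: String => List[Denot]): List[Ref] = {
    val buf = List.newBuilder[Ref]
    for (n <- names; d <- alts(n)) if (d.isImplicit) buf += Ref(d.name)
    buf.result()
  }
}
```

Both versions construct the same number of `Ref` values in the same order, which mirrors the observation above: fusing removes list plumbing but not the per-reference construction cost.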

PackageScope now records identity-based negative results for derived package-name lookups after the mangled lookup and flat-class fallback miss, and clears that cache whenever a package entry is added. This was expected to reduce repeated MutableScope.lookupEntry work for derived misses. The change is safe because positive lookups still use the existing scope table, flat-class loading still runs before recording negatives, and package mutations invalidate the miss cache.

Expected changes:
- MutableScope.lookupEntry self% and tot% should improve: repeated derived package misses can return before materializing name.mangled and probing the scope table.
- Denotation.info tot% could regress: a cache branch on package lookups can shift completion attribution if the derived-miss hit rate is too low.
- SymDenotation.completeFrom tot% could regress: package lookup work sits under completion, so an added miss-cache branch can widen completion totals if it does not produce enough hits.
- No other regressions expected: cached negatives are identity-based, recorded only after flat-name fallback has run or been ruled out, and cleared on every package-scope insertion.

JFR profile deltas (5 repeats × 10 runs, mean ± stddev, iter-23/run-0 → iter-23/run-68):
1. MutableScope.lookupEntry self%: 0.76 ± 0.13 → 0.77 ± 0.06 (+0.01, 0.1σ)
2. MutableScope.lookupEntry tot%: 0.81 ± 0.10 → 0.84 ± 0.06 (+0.03, 0.3σ)
3. Denotation.info tot%: 19.21 ± 0.32 → 19.84 ± 0.29 (+0.63, 2.0σ)
4. SymDenotation.completeFrom tot%: 18.95 ± 0.30 → 19.60 ± 0.26 (+0.65, 2.2σ)

Estimated total speedup: -0.01 ± 0.14 (from row 1 above; rows 2-4 are overlapping total-time confirmations)

Rejected. MutableScope.lookupEntry does not improve in either self or total time, while Denotation.info and SymDenotation.completeFrom show significant completion-total regressions. The derived miss cache therefore does not pay for its extra package-lookup branch on this workload.
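
An identity-based negative cache with clear-on-insert invalidation can be sketched as below. `CachedScope` is a hypothetical class, and the sketch collapses the mangled lookup and flat-class fallback described above into a single table probe.

```scala
object MissCacheSketch {
  final class CachedScope {
    private val entries = scala.collection.mutable.HashMap.empty[String, Int]
    // Keys are compared by reference, so only the exact same name object hits.
    private val misses = new java.util.IdentityHashMap[String, java.lang.Boolean]()
    // Counts real table probes; present only to make the effect visible.
    var tableProbes = 0

    def enter(name: String, value: Int): Unit = {
      entries(name) = value
      misses.clear() // any insertion may turn a recorded miss into a hit
    }

    def lookup(name: String): Option[Int] =
      if (misses.containsKey(name)) None
      else {
        tableProbes += 1
        val res = entries.get(name)
        if (res.isEmpty) misses.put(name, java.lang.Boolean.TRUE)
        res
      }
  }
}
```

The cache only helps when the same name object misses repeatedly between insertions; if insertions are frequent or misses rarely repeat, every lookup simply pays the extra `containsKey` branch.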

lihaoyi mentioned this pull request May 19, 2026

[WIP] Speed up the Scala compiler on the mill-libs-javalib codebase by ~50% #26025

Draft

lihaoyi added 29 commits May 22, 2026 08:54

lihaoyi added 30 commits May 25, 2026 00:52

lihaoyi wants to merge 287 commits into scala:main from lihaoyi:rejected
