CPU Speedups 2 #255

Draft
GNiendorf wants to merge 1 commit into master from cpu_speedups_merge_2

Conversation

@GNiendorf GNiendorf commented Apr 20, 2026

I will add more changes tomorrow, along with a breakdown of what was done. So far this gives about a 15% improvement in the "Short" time.

Master Timing
Screenshot 2026-04-21 at 12 37 26 AM

This PR Timing
Screenshot 2026-04-21 at 12 31 36 AM

@GNiendorf

run-ci: [all, hlt]

@github-actions

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

Comparison plots: efficiency, fake rate, and duplicate rate, each vs pT and vs eta.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     30.9     91.2    121.3    126.2     44.5    701.0      9.8     39.7     68.9    206.2      0.1    1439.9     707.9+/- 178.7     470.7   explicit[s=4] (target branch)
   avg     30.8     90.5    102.6    108.1     44.9    683.8     10.6     36.7     59.2    157.1      0.1    1324.5     609.9+/- 151.5     444.7   explicit[s=4] (this PR)
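
For a quick cross-check of the headline number, the relative change can be computed directly from the two rows above. This is only a small arithmetic sketch; the helper name `percentReduction` is made up for illustration and is not part of the repository.

```cpp
// Hypothetical helper: relative reduction of a timing, in percent.
// A positive result means "after" is faster than "before".
constexpr double percentReduction(double before, double after) {
  return 100.0 * (before - after) / before;
}

// Values copied from the Short and Event columns of the table above
// (target branch first, this PR second).
constexpr double kShortReduction = percentReduction(707.9, 609.9);
constexpr double kEventReduction = percentReduction(1439.9, 1324.5);
```

With the CI numbers this gives roughly a 13.8% reduction in the Short time and about 8% in the Event time, somewhat below the figure quoted from the local screenshots, which were presumably taken on a different machine.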

@github-actions

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@github-actions

The PR was built and ran successfully with HLT setup running on CPU (procModifiers = ). Here are some plots.

HLT General Plots
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

Comment on lines +246 to +251
  // Module-level eta/phi pre-check: skip module pairs that are too far apart.
  if (alpaka::math::abs(acc, modules.eta()[lowmod1] - modules.eta()[lowmod2]) > 0.3f)
    continue;
  if (alpaka::math::abs(acc, cms::alpakatools::deltaPhi(acc, modules.phi()[lowmod1], modules.phi()[lowmod2])) >
      0.5f)
    continue;

Was it tested on the µcube? 50 cm should be safe enough.

How was this check derived: from printouts, or some other way?

Would it make a difference to loop over the connected modules instead of over all nEligibleT5Modules?
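
For reference while discussing the thresholds, the pre-check in this thread can be sketched outside of alpaka as a plain C++ predicate. This is a minimal sketch under stated assumptions, not the code in the PR: `deltaPhiWrapped` is a hypothetical stand-in for `cms::alpakatools::deltaPhi`, `passesModulePreCheck` is an invented name, and only the 0.3 and 0.5 thresholds are taken from the diff.

```cpp
#include <cmath>

// Hypothetical stand-in for cms::alpakatools::deltaPhi:
// returns phi1 - phi2 wrapped into [-pi, pi].
inline float deltaPhiWrapped(float phi1, float phi2) {
  constexpr float kPi = 3.14159265358979f;
  float d = std::fmod(phi1 - phi2, 2.f * kPi);
  if (d > kPi)
    d -= 2.f * kPi;
  else if (d < -kPi)
    d += 2.f * kPi;
  return d;
}

// Thresholds copied from the diff under review.
constexpr float kMaxModuleDEta = 0.3f;
constexpr float kMaxModuleDPhi = 0.5f;

// Returns true when the module pair is close enough in eta and phi to be
// worth processing; false corresponds to the `continue` in the diff.
inline bool passesModulePreCheck(float eta1, float phi1, float eta2, float phi2) {
  if (std::fabs(eta1 - eta2) > kMaxModuleDEta)
    return false;
  if (std::fabs(deltaPhiWrapped(phi1, phi2)) > kMaxModuleDPhi)
    return false;
  return true;
}
```

A predicate like this could be run over module pairs from a validation sample to count how many true combinations it would reject, which is one possible way to answer the question of how the cut values were derived. Note the wrap-around: two modules at phi = 3.1 and phi = -3.1 are only about 0.08 rad apart and pass the check.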

Comment on lines +62 to +71
  template <alpaka::concepts::Acc TAcc>
  ALPAKA_FN_ACC ALPAKA_FN_INLINE float clampedApproxSin(TAcc const& acc, float x) {
    return alpaka::math::min(acc, x, kSinAlphaMax);
  }

  // Small-angle sin approximation: sin(x) ~ x, valid for the small x that remain after tight angular cuts.
  ALPAKA_FN_ACC ALPAKA_FN_INLINE float fastSin(float x) { return x; }

  // Small-angle approximation of tan(x)/x (second-order Taylor expansion).
  ALPAKA_FN_ACC ALPAKA_FN_INLINE float fastTanOverX(float x) { return 1.f + x * x / 3.f; }

Can these be switched to the full-precision versions with an #ifdef, so that we can recompile and test?
"Approx" should probably be in all of the names (perhaps instead of "fast").
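
One possible shape for such a switch is sketched below in plain C++, without the alpaka decorators. The macro name `LST_USE_FULL_PRECISION_TRIG` and the function names `approxSin` and `approxTanOverX` are assumptions made for illustration; they follow the naming suggestion above and are not taken from the PR.

```cpp
#include <cmath>

// Hypothetical macro: compile with -DLST_USE_FULL_PRECISION_TRIG to fall back
// to the standard-library functions and compare physics output and timing.
#ifdef LST_USE_FULL_PRECISION_TRIG

inline float approxSin(float x) { return std::sin(x); }

// tan(x)/x tends to 1 as x goes to 0; guard the division explicitly.
inline float approxTanOverX(float x) { return x == 0.f ? 1.f : std::tan(x) / x; }

#else

// Small-angle approximation: sin(x) ~ x.
inline float approxSin(float x) { return x; }

// Small-angle approximation: tan(x)/x ~ 1 + x^2/3.
inline float approxTanOverX(float x) { return 1.f + x * x / 3.f; }

#endif
```

Keeping a single set of names for both branches means the call sites stay unchanged when the macro is toggled. For x = 0.1 the two variants differ by about 2e-4 for the sine and about 1e-5 for tan(x)/x, so a recompiled comparison would mainly probe the larger angles that survive the cuts.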
