JIT: Make BB_UNITY_WEIGHT 1.0 #112151
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
/azp run runtime-coreclr libraries-pgo, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 2 pipeline(s).
These typically mean two different things, e.g. unweighted means "how many" and weighted means "how much" or "how frequent". There really should be some kind of explicit scale factor when they're combined. I recall being annoyed that CSE mixes the two somewhat willy-nilly (2d in #92915 (comment))
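To make the "how many" versus "how frequent" distinction concrete, here is a minimal sketch. The `RefCounts` struct, `addRef`, and `combinedCost` are hypothetical names invented for illustration; they are not the JIT's `LclVarDsc` or CSE heuristics. The sketch only assumes that block weights are normalized so that 1.0 means "once per method call".

```cpp
#include <cstdint>

// Hypothetical stand-in for per-variable reference data (not the JIT's LclVarDsc).
struct RefCounts
{
    uint32_t count;    // unweighted: how many references appear in the code
    double   weighted; // weighted: sum of normalized block weights over those references
};

// Record one reference located in a block with the given normalized weight
// (assumption: 1.0 means the block runs once per method invocation).
inline void addRef(RefCounts& refs, double blockWeight)
{
    refs.count += 1;
    refs.weighted += blockWeight;
}

// Combine the two quantities only through explicit scale factors, so that a
// size-like cost (per occurrence) and a speed-like cost (per execution) are
// never added together implicitly.
inline double combinedCost(const RefCounts& refs, double sizeCostPerRef, double speedCostPerExec)
{
    return refs.count * sizeCostPerRef + refs.weighted * speedCostPerExec;
}
```

With references in blocks of weight 1.0, 0.25, and 8.0, `count` is 3 while `weighted` is 9.25; the two only become comparable once the caller states how much one occurrence costs relative to one execution.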
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are quite a bit larger on win-x64 and win-x86, with the former's concentrated in
Minuscule changes in block weights churning LSRA/layout,
; V00 arg0 [V00,T02] ( 3, 6 ) byref -> rbx single-def
; V01 arg1 [V01,T06] ( 7, 6 ) double -> mm6 ld-addr-op single-def
-; V02 arg2 [V02,T07] ( 7, 5.00) double -> [rsp+0x60] ld-addr-op single-def
+; V02 arg2 [V02,T07] ( 7, 5 ) double -> mm7 ld-addr-op single-def
or more/fewer CSEs.
;* V07 tmp5 [V07 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
; V08 tmp6 [V08,T03] ( 2, 1.33) byref -> edx single-def "argument with side effect"
+; V09 cse0 [V09,T04] ( 3, 1 ) int -> eax "CSE #01: moderate"
Thanks!
I see, I suppose the removal of |
Part of #107749. Change `BB_UNITY_WEIGHT` and friends from 100 to 1 to avoid scaling normalized block weights up unnecessarily. I wanted this to be a no-diff change, but I could only get so far; I implemented the following quirks to keep diffs down:
- `BB_COLD_WEIGHT` has also been reduced by 100x, as our layout optimizations were using this threshold with `BB_UNITY_WEIGHT` factored in. In a follow-up PR, I think we should increase this back to 0.01: I think a block is sufficiently cold if its normalized weight suggests it executes only 1% of the time per method invocation. Increasing the amount of code we consider cold should also lessen the amount of work 3-opt needs to do.
- The `BB_UNITY_WEIGHT` change churned CSE significantly, and I think it's because we aren't quite careful about not mixing normalized weights and counts during cost/benefit analysis. I've added some scaling quirks to avoid these diffs for now. I might be misunderstanding the utility of this distinction, but I wonder if we can unify weighted and non-weighted counts (`LclVarDsc::m_lvRefCnt` and `LclVarDsc::m_lvRefCntWtd`, `CSEdsc::csdDefCount` and `CSEdsc::csdDefWtCnt`, etc.) after this goes in.
- `BasicBlock::getBBWeight` already normalizes with `fgCalledCount`, and wrong in cases where the entry block is reachable via loop backedges. I'll try fixing these in a follow-up PR.

With these quirks in place, I'm seeing sporadic diffs where floating-point imprecision in block weights leads to different decisions. For example, we sometimes CSE more or less aggressively because the score is close to the aggressive threshold, or we sometimes churn layout because blocks that were almost cold are now considered cold, etc. This churn seems unavoidable unless/until we expand our usage of profile helpers (`Compiler::fgProfileWeightsEqual`) to compare weights.
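The threshold sensitivity described above can be sketched as follows. Everything in this block is hypothetical and for illustration only: `kUnityWeight`, `kColdWeight`, `normalizedWeight`, `weightsEqual`, `isColdExact`, and `isColdTolerant` are invented names, the 0.01 cold threshold is the value proposed in the description rather than the one in this PR, and the absolute tolerance of 1e-9 is an arbitrary assumption. This is not the implementation of `Compiler::fgProfileWeightsEqual`.

```cpp
#include <cmath>

// Hypothetical constants (assumed values, not taken from the JIT sources).
constexpr double kUnityWeight = 1.0;  // weight of a block that runs once per call
constexpr double kColdWeight  = 0.01; // block runs on about 1% of invocations

// Normalized weight: expected executions of the block per method invocation.
inline double normalizedWeight(double blockCount, double calledCount)
{
    return (blockCount / calledCount) * kUnityWeight;
}

// Tolerance-based equality; the epsilon is an illustrative choice.
inline bool weightsEqual(double a, double b, double epsilon = 1e-9)
{
    return std::fabs(a - b) <= epsilon;
}

// Exact comparison: a rounding error of any size can flip the answer.
inline bool isColdExact(double weight)
{
    return weight <= kColdWeight;
}

// Tolerant comparison: weights within epsilon of the threshold count as cold.
inline bool isColdTolerant(double weight, double epsilon = 1e-9)
{
    return (weight < kColdWeight) || weightsEqual(weight, kColdWeight, epsilon);
}
```

A weight of `0.01 + 1e-12` is not cold under `isColdExact` but is cold under `isColdTolerant`, which is the kind of near-threshold flip that a change of scale can expose. Whether an absolute or relative tolerance is the better fit for profile weights is a design choice this sketch does not settle.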