This repository was archived by the owner on Jan 31, 2024. It is now read-only.

Commit 51a24e4

Don't move cold code out of loop by checking bb count
v8 changes:
1. Use hotter_than_inner_loop instead of colder_than_inner_loop to store the nearest hotter loop for each loop.
2. Update the logic in fill_coldest_and_hotter_out_loop and get_coldest_out_loop to make the common case O(1).
3. Update the arguments of bb_colder_than_loop_preheader.
4. Change the cached arrays to vec<class loop *> for index checking.

v7 changes:
1. Refine get_coldest_out_loop: replace the loop with a check of the pre-computed coldest_outermost_loop and colder_than_inner_loop.
2. Add function fill_cold_out_loop, which computes coldest_outermost_loop and colder_than_inner_loop recursively instead of iterating.

v6 changes:
1. Add function fill_coldest_out_loop to pre-compute the coldest outermost loop for each loop.
2. Rename find_coldest_out_loop to get_coldest_out_loop.
3. Add testcase ssa-lim-22.c to differentiate it from ssa-lim-19.c.

v5 changes:
1. Refine comments for new functions.
2. Use basic_block instead of count in bb_colder_than_loop_preheader to align with the function name.
3. Simplify the implementation of get_coldest_out_loop and ref_in_loop_hot_body::operator () for better readability.

v4 changes:
1. Move the profile_count comparison into function bb_cold_than_loop_preheader.
2. Update ref_in_loop_hot_body::operator () to find cold_loop before comparing.
3. Split out the RTL invariant motion part.
4. Remove aux changes.

v3 changes:
1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop.
2. Remove unnecessary changes.
3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
4. "gsi_next (&bsi);" in move_computations_worker is kept: omitting it caused an infinite loop when implementing v1, because the iterator was never advanced.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
...
v8: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586209.html

There was an earlier patch that tried to avoid moving cold blocks out of loops:

https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html

Richard suggested to "never hoist anything from a bb with lower execution frequency to a bb with higher one in LIM invariantness_dom_walker before_dom_children".

In gimple LIM analysis, add get_coldest_out_loop to move invariants to the expected target loop: if the profile count of the loop bb is colder than the target loop preheader, the statement won't be hoisted out of the loop. Likewise for store motion: if all locations of the REF in the loop are cold, don't do store motion of it.

SPEC2017 performance evaluation shows a 1% performance improvement for intrate GEOMEAN and no obvious regression for the others. Notably, 500.perlbench_r improves by 7.52% (perf shows that function S_regtry of perlbench is largely improved), 548.exchange2_r by 1.98%, and 526.blender_r by 1.00% on P8LE.

gcc/ChangeLog:

2021-12-21  Xionghu Luo  <[email protected]>

    * tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New function.
    (get_coldest_out_loop): New function.
    (determine_max_movement): Use get_coldest_out_loop.
    (move_computations_worker): Adjust and fix iteration update.
    (class ref_in_loop_hot_body): New functor.
    (ref_in_loop_hot_body::operator): New.
    (can_sm_ref_p): Use for_all_locs_in_loop.
    (fill_coldest_and_hotter_out_loop): New.
    (tree_ssa_lim_finalize): Free coldest_outermost_loop and hotter_than_inner_loop.
    (loop_invariant_motion_in_fun): Call fill_coldest_and_hotter_out_loop.

gcc/testsuite/ChangeLog:

2021-12-21  Xionghu Luo  <[email protected]>

    * gcc.dg/tree-ssa/recip-3.c: Adjust.
    * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
    * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
    * gcc.dg/tree-ssa/ssa-lim-21.c: New test.
    * gcc.dg/tree-ssa/ssa-lim-22.c: New test.
    * gcc.dg/tree-ssa/ssa-lim-23.c: New test.
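
The hoisting rule in the commit message can be illustrated with a small, self-contained C model. This is only a sketch under stated assumptions, not GCC code: `toy_count`, `toy_count_less`, and `toy_may_hoist` are hypothetical names, and the `known` flag is a simplified stand-in for the three-state comparison that the patch attributes to profile-count.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a profile count: a value plus a flag saying
   whether the count is known at all.  */
struct toy_count
{
  bool known;
  uint64_t value;
};

/* Three-state "less than": true only when both counts are known and A is
   strictly smaller.  An undecidable comparison yields false.  */
static bool
toy_count_less (struct toy_count a, struct toy_count b)
{
  if (!a.known || !b.known)
    return false;
  return a.value < b.value;
}

/* Hypothetical helper mirroring the rule: a statement in a block with
   count BB may be hoisted to a preheader with count PREHEADER unless the
   block is known to be colder than that preheader.  */
static bool
toy_may_hoist (struct toy_count bb, struct toy_count preheader)
{
  return !toy_count_less (bb, preheader);
}
```

In this model an unknown count never blocks hoisting, which corresponds to the patch comment saying that FALSE is returned when the inequality cannot be decided, so the previous behavior is preserved.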
1 parent cd5ae14 commit 51a24e4

7 files changed: +293 -3 lines changed
gcc/testsuite/gcc.dg/tree-ssa/recip-3.c

+1-1
@@ -23,4 +23,4 @@ float h ()
   F[0] += E / d;
 }
 
-/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
+/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */

gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c

+29
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int m, int s, int t)
+{
+  int i;
+  int j;
+  int k;
+
+  for (i = 0; i < m; i++) // Loop 1
+    {
+      if (__builtin_expect (x, 0))
+        for (j = 0; j < n; j++) // Loop 2
+          for (k = 0; k < n; k++) // Loop 3
+            {
+              bar (s / 5, "one", "two");
+              a[t] = s;
+            }
+      a[t] = t;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */
+/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
+

gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c

+25
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+/* Test that `count' is not hoisted out of loop when bb is cold. */
+
+int count;
+volatile int x;
+
+struct obj {
+  int data;
+  struct obj *next;
+
+} *q;
+
+void
+func (int m)
+{
+  struct obj *p;
+  for (int i = 0; i < m; i++)
+    if (__builtin_expect (x, 0))
+      count++;
+
+}
+
+/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2" } } */
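
To make the intent of this test easier to see, the following sketch contrasts the loop as written with roughly what store motion of `count` would look like if it were applied. It is an illustration under assumptions, not compiler output: `func_in_loop` and `func_store_motion` are hypothetical names, the `flags` array stands in for the reads of the volatile `x`, and the `stored` flag is one plausible way to avoid writing `count` when the cold branch never runs.

```c
#include <stddef.h>

int count;

/* The loop as written in the test: COUNT is only touched on the cold
   path.  FLAGS is an assumption that replaces the volatile read.  */
void
func_in_loop (const int *flags, size_t m)
{
  for (size_t i = 0; i < m; i++)
    if (__builtin_expect (flags[i], 0))
      count++;
}

/* Roughly what store motion would produce: the load of COUNT moves in
   front of the loop and the store moves behind it, so both run on every
   call even when the cold branch is never taken.  */
void
func_store_motion (const int *flags, size_t m)
{
  int tmp = count;      /* load hoisted to the loop preheader */
  int stored = 0;       /* remembers whether the cold path ran */

  for (size_t i = 0; i < m; i++)
    if (__builtin_expect (flags[i], 0))
      {
        tmp++;
        stored = 1;
      }

  if (stored)
    count = tmp;        /* store sunk to the loop exit */
}
```

Both functions compute the same result, but the second one pays for the extra load and the flag check outside the loop on every call. When the block containing `count++` is colder than the loop preheader, that trade is unlikely to pay off, which is why the test expects no "Executing store motion of" line in the lim2 dump.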

gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c

+35
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+/* Test that `data' and `data1' are not hoisted out of inner loop and outer loop
+   when they are in a cold loop. */
+
+int count;
+volatile int x;
+
+struct obj {
+  int data;
+  int data1;
+  struct obj *next;
+};
+
+void
+func (int m, int n, int k, struct obj *a)
+{
+  struct obj *q = a;
+  for (int j = 0; j < m; j++)
+    if (__builtin_expect (m, 0))
+      for (int i = 0; i < m; i++)
+        {
+          if (__builtin_expect (x, 0))
+            {
+              count++;
+              q->data += 3; /* Not hoisted out to inner loop. */
+            }
+          count += n;
+          q->data1 += k; /* Not hoisted out to outer loop. */
+        }
+}
+
+/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2" } } */
+

gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c

+32
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+volatile int y;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int m, int s, int t)
+{
+  int i;
+  int j;
+  int k;
+
+  for (i = 0; i < m; i++) // Loop 1
+    {
+      if (__builtin_expect (x, 0))
+        for (j = 0; j < n; j++) // Loop 2
+          if (__builtin_expect (y, 0))
+            for (k = 0; k < n; k++) // Loop 3
+              {
+                bar (s / 5, "one", "two");
+                a[t] = s;
+              }
+      a[t] = t;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "out of loop 3" 4 "lim2" } } */
+/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
+
+

gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-23.c

+21
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int k)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    {
+      if (__builtin_expect (x, 0))
+        bar (k / 5, "one", "two");
+      a[i] = k;
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "out of loop 1" "lim2" } } */
+

gcc/tree-ssa-loop-im.c

+150-2
@@ -146,6 +146,11 @@ class im_mem_ref
 enum dep_kind { lim_raw, sm_war, sm_waw };
 enum dep_state { dep_unknown, dep_independent, dep_dependent };
 
+/* coldest outermost loop for given loop. */
+vec<class loop *> coldest_outermost_loop;
+/* hotter outer loop nearest to given loop. */
+vec<class loop *> hotter_than_inner_loop;
+
 /* Populate the loop dependence cache of REF for LOOP, KIND with STATE. */
 
 static void
@@ -417,6 +422,63 @@ movement_possibility (gimple *stmt)
   return ret;
 }
 
+/* Compare the profile count inequality of bb and loop's preheader, it is
+   three-state as stated in profile-count.h, FALSE is returned if inequality
+   cannot be decided. */
+bool
+bb_colder_than_loop_preheader (basic_block bb, class loop *loop)
+{
+  gcc_assert (bb && loop);
+  return bb->count < loop_preheader_edge (loop)->src->count;
+}
+
+/* Check coldest loop between OUTERMOST_LOOP and LOOP by comparing profile
+   count.
+   It does three steps check:
+   1) Check whether CURR_BB is cold in its own loop_father, if it is cold, just
+   return NULL which means it should not be moved out at all;
+   2) CURR_BB is NOT cold, check if pre-computed COLDEST_LOOP is outside of
+   OUTERMOST_LOOP, if it is inside of OUTERMOST_LOOP, return the COLDEST_LOOP;
+   3) If COLDEST_LOOP is outside of OUTERMOST_LOOP, check whether there is a
+   hotter loop between OUTERMOST_LOOP and loop in pre-computed
+   HOTTER_THAN_INNER_LOOP, return its nested inner loop, otherwise return
+   OUTERMOST_LOOP.
+   At last, the coldest_loop is inside of OUTERMOST_LOOP, just return it as
+   the hoist target. */
+
+static class loop *
+get_coldest_out_loop (class loop *outermost_loop, class loop *loop,
+                      basic_block curr_bb)
+{
+  gcc_assert (outermost_loop == loop
+              || flow_loop_nested_p (outermost_loop, loop));
+
+  /* If bb_colder_than_loop_preheader returns false due to three-state
+     comparison, OUTERMOST_LOOP is returned finally to preserve the behavior.
+     Otherwise, return the coldest loop between OUTERMOST_LOOP and LOOP. */
+  if (curr_bb && bb_colder_than_loop_preheader (curr_bb, loop))
+    return NULL;
+
+  class loop *coldest_loop = coldest_outermost_loop[loop->num];
+  if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
+    {
+      class loop *hotter_loop = hotter_than_inner_loop[loop->num];
+      if (!hotter_loop
+          || loop_depth (hotter_loop) < loop_depth (outermost_loop))
+        return outermost_loop;
+
+      /* hotter_loop is between OUTERMOST_LOOP and LOOP like:
+         [loop tree root, ..., coldest_loop, ..., outermost_loop, ...,
+         hotter_loop, second_coldest_loop, ..., loop]
+         return second_coldest_loop to be the hoist target. */
+      class loop *aloop;
+      for (aloop = hotter_loop->inner; aloop; aloop = aloop->next)
+        if (aloop == loop || flow_loop_nested_p (aloop, loop))
+          return aloop;
+    }
+  return coldest_loop;
+}
+
 /* Suppose that operand DEF is used inside the LOOP. Returns the outermost
    loop to that we could move the expression using DEF if it did not have
    other operands, i.e. the outermost loop enclosing LOOP in that the value
@@ -685,7 +747,9 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec)
     level = ALWAYS_EXECUTED_IN (bb);
   else
     level = superloop_at_depth (loop, 1);
-  lim_data->max_loop = level;
+  lim_data->max_loop = get_coldest_out_loop (level, loop, bb);
+  if (!lim_data->max_loop)
+    return false;
 
   if (gphi *phi = dyn_cast <gphi *> (stmt))
     {
@@ -1217,7 +1281,10 @@ move_computations_worker (basic_block bb)
       /* We do not really want to move conditionals out of the loop; we just
          placed it here to force its operands to be moved if necessary. */
       if (gimple_code (stmt) == GIMPLE_COND)
-        continue;
+        {
+          gsi_next (&bsi);
+          continue;
+        }
 
       if (dump_file && (dump_flags & TDF_DETAILS))
         {
@@ -3023,6 +3090,26 @@ ref_indep_loop_p (class loop *loop, im_mem_ref *ref, dep_kind kind)
   return indep_p;
 }
 
+class ref_in_loop_hot_body
+{
+public:
+  ref_in_loop_hot_body (class loop *loop_) : l (loop_) {}
+  bool operator () (mem_ref_loc *loc);
+  class loop *l;
+};
+
+/* Check the coldest loop between loop L and innermost loop. If there is one
+   cold loop between L and INNER_LOOP, store motion can be performed, otherwise
+   no cold loop means no store motion. get_coldest_out_loop also handles cases
+   when l is inner_loop. */
+bool
+ref_in_loop_hot_body::operator () (mem_ref_loc *loc)
+{
+  basic_block curr_bb = gimple_bb (loc->stmt);
+  class loop *inner_loop = curr_bb->loop_father;
+  return get_coldest_out_loop (l, inner_loop, curr_bb);
+}
+
 
 /* Returns true if we can perform store motion of REF from LOOP. */
 
@@ -3077,6 +3164,12 @@ can_sm_ref_p (class loop *loop, im_mem_ref *ref)
   if (!ref_indep_loop_p (loop, ref, sm_war))
     return false;
 
+  /* Verify whether the candidate is hot for LOOP. Only do store motion if the
+     candidate's profile count is hot. Statement in cold BB shouldn't be moved
+     out of its loop_father. */
+  if (!for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body (loop)))
+    return false;
+
   return true;
 }
 
@@ -3289,6 +3382,48 @@ fill_always_executed_in (void)
     fill_always_executed_in_1 (loop, contains_call);
 }
 
+/* Find the coldest loop preheader for LOOP, also find the nearest hotter loop
+   to LOOP. Then recursively iterate each inner loop. */
+
+void
+fill_coldest_and_hotter_out_loop (class loop *coldest_loop,
+                                  class loop *hotter_loop, class loop *loop)
+{
+  if (bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
+                                     coldest_loop))
+    coldest_loop = loop;
+
+  coldest_outermost_loop[loop->num] = coldest_loop;
+
+  hotter_than_inner_loop[loop->num] = NULL;
+  class loop *outer_loop = loop_outer (loop);
+  if (hotter_loop
+      && bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
+                                        hotter_loop))
+    hotter_than_inner_loop[loop->num] = hotter_loop;
+
+  if (outer_loop && outer_loop != current_loops->tree_root
+      && bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
+                                        outer_loop))
+    hotter_than_inner_loop[loop->num] = outer_loop;
+
+  if (dump_enabled_p ())
+    {
+      dump_printf (MSG_NOTE, "loop %d's coldest_outermost_loop is %d, ",
+                   loop->num, coldest_loop->num);
+      if (hotter_than_inner_loop[loop->num])
+        dump_printf (MSG_NOTE, "hotter_than_inner_loop is %d\n",
+                     hotter_than_inner_loop[loop->num]->num);
+      else
+        dump_printf (MSG_NOTE, "hotter_than_inner_loop is NULL\n");
+    }
+
+  class loop *inner_loop;
+  for (inner_loop = loop->inner; inner_loop; inner_loop = inner_loop->next)
+    fill_coldest_and_hotter_out_loop (coldest_loop,
+                                      hotter_than_inner_loop[loop->num],
+                                      inner_loop);
+}
 
 /* Compute the global information needed by the loop invariant motion pass. */
 
@@ -3373,6 +3508,9 @@ tree_ssa_lim_finalize (void)
   free_affine_expand_cache (&memory_accesses.ttae_cache);
 
   free (bb_loop_postorder);
+
+  coldest_outermost_loop.release ();
+  hotter_than_inner_loop.release ();
 }
 
 /* Moves invariants from loops. Only "expensive" invariants are moved out --
@@ -3392,6 +3530,16 @@ loop_invariant_motion_in_fun (function *fun, bool store_motion)
   /* Fills ALWAYS_EXECUTED_IN information for basic blocks. */
   fill_always_executed_in ();
 
+  /* Pre-compute coldest outermost loop and nearest hotter loop of each loop.
+   */
+  class loop *loop;
+  coldest_outermost_loop.create (number_of_loops (cfun));
+  coldest_outermost_loop.safe_grow_cleared (number_of_loops (cfun));
+  hotter_than_inner_loop.create (number_of_loops (cfun));
+  hotter_than_inner_loop.safe_grow_cleared (number_of_loops (cfun));
+  for (loop = current_loops->tree_root->inner; loop != NULL; loop = loop->next)
+    fill_coldest_and_hotter_out_loop (loop, NULL, loop);
+
   int *rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
   int n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);
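
The interplay between the pre-computation and the lookup in this file can be tried out on a toy loop tree. The C sketch below is a simplified model, not the GCC implementation: every `toy_` name is hypothetical, plain integers replace profile counts (so the undecidable comparison case is not modeled), top-level loops have a NULL `outer` instead of hanging off a tree root, and the current block is passed as a bare count.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LOOPS 16

/* Hypothetical, minimal loop-tree node.  */
struct toy_loop
{
  int num;                      /* index into the cache arrays */
  int depth;                    /* 1 for top-level loops */
  uint64_t preheader_count;     /* execution count of the preheader */
  struct toy_loop *outer, *inner, *next;
};

static struct toy_loop *toy_coldest_outermost[TOY_MAX_LOOPS];
static struct toy_loop *toy_hotter_than_inner[TOY_MAX_LOOPS];

/* Return nonzero if LOOP is strictly nested inside OUTER.  */
static int
toy_nested_p (const struct toy_loop *outer, const struct toy_loop *loop)
{
  for (loop = loop->outer; loop; loop = loop->outer)
    if (loop == outer)
      return 1;
  return 0;
}

/* Record, for LOOP and everything nested in it, the enclosing loop with
   the coldest preheader and the nearest enclosing loop with a hotter
   preheader.  */
static void
toy_fill (struct toy_loop *coldest, struct toy_loop *hotter,
          struct toy_loop *loop)
{
  if (loop->preheader_count < coldest->preheader_count)
    coldest = loop;
  toy_coldest_outermost[loop->num] = coldest;

  toy_hotter_than_inner[loop->num] = NULL;
  if (hotter && loop->preheader_count < hotter->preheader_count)
    toy_hotter_than_inner[loop->num] = hotter;
  if (loop->outer && loop->preheader_count < loop->outer->preheader_count)
    toy_hotter_than_inner[loop->num] = loop->outer;

  for (struct toy_loop *in = loop->inner; in; in = in->next)
    toy_fill (coldest, toy_hotter_than_inner[loop->num], in);
}

/* Pick the hoist target between OUTERMOST and LOOP for a block executed
   BB_COUNT times, or NULL if the block is colder than LOOP's preheader.  */
static struct toy_loop *
toy_get_coldest_out_loop (struct toy_loop *outermost, struct toy_loop *loop,
                          uint64_t bb_count)
{
  if (bb_count < loop->preheader_count)
    return NULL;

  struct toy_loop *coldest = toy_coldest_outermost[loop->num];
  if (coldest->depth < outermost->depth)
    {
      struct toy_loop *hotter = toy_hotter_than_inner[loop->num];
      if (!hotter || hotter->depth < outermost->depth)
        return outermost;
      /* Return the child of HOTTER that contains (or is) LOOP.  */
      for (struct toy_loop *a = hotter->inner; a; a = a->next)
        if (a == loop || toy_nested_p (a, loop))
          return a;
    }
  return coldest;
}
```

With three nested loops whose preheader counts are, say, 10, 1, and 50 (outer to inner), a hot statement in the innermost loop that is allowed to move as far as the outermost loop gets the middle loop as its target in this model, since that loop has the coldest preheader. That is the same shape as the "out of loop 2" expectation in the first new test, although the counts used here are made up for illustration.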
