Skip to content

[SystemZ][NewPM] huge loop expansion with opt -O1 #51488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
cuviper opened this issue Oct 12, 2021 · 2 comments
Open

[SystemZ][NewPM] huge loop expansion with opt -O1 #51488

cuviper opened this issue Oct 12, 2021 · 2 comments
Labels
backend:SystemZ bugzilla Issues migrated from bugzilla

Comments

@cuviper
Copy link
Member

cuviper commented Oct 12, 2021

Bugzilla Link 52146
Version trunk
OS Linux
Attachments bitcode from rust#89609
CC @alex

Extended Description

xref: rust-lang/rust#89609

After enabling the new pass manager in rustc, one particular crate showed a large increase in compile time, but only when targeting s390x. I attached the bitcode from rustc, and I can reproduce the problem with opt -O1 on main.

$ llvm-dis <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc | wc -lL
157863 2306

$ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S | wc -lL
794414 4515800

real 2m4.580s
user 2m4.675s
sys 0m0.221s

There are two instances of that longest 4.5M line length, and both are huge phi nodes just before invoking a function ending in 17hde8a472161ebd31bE (so you can search for that). The predecessors look like a mass of loopexit.split-lp expansions gone rogue.

The same input behaves reasonably with the old PM, both optimizing faster and resulting in a smaller bitcode.

$ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S --enable-new-pm=0 | wc -lL
148220 2306

real 0m2.266s
user 0m2.322s
sys 0m0.024s

If you edit out the "target-cpu"="z10" attributes in the IR, this reproducer can also be forced to -mtriple x86_64, and that runs fine with new PM.

@cuviper
Copy link
Member Author

cuviper commented Oct 13, 2021

As mentioned in bug 45253 comment 11, https://reviews.llvm.org/D98481 seems to solve the problem by restricting the inliner.

$ time opt <rustc_ast_lowering-cgu.0.rcgu.thin-lto-after-patch.bc -O1 -S | wc -lL
216583 2306

real 0m4.597s
user 0m4.663s
sys 0m0.037s

Still slower than old-pm, but it's acceptable.

@cuviper
Copy link
Member Author

cuviper commented Jan 7, 2022

@nikic pointed out that SystemZ has an inlining threshold multiplier of 3, which would explain why we didn't see this on other targets. rust-lang/rust#89609 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:SystemZ bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

1 participant