Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyze and improve the performance of inline compilation #9237

Closed
Akirathan opened this issue Mar 1, 2024 · 9 comments · Fixed by #9520
Closed

Analyze and improve the performance of inline compilation #9237

Akirathan opened this issue Mar 1, 2024 · 9 comments · Fixed by #9520

Comments

@Akirathan
Copy link
Member

Related to:

Benchmark added in #9158.

The benchmark org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression (run with sbt:runtime-benchmarks> benchOnly org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression) measures the performance of Compiler.runInline. As this method is used frequently by various instruments in the IDE, we would like to focus on it.

This benchmark is a bit suspicious (unstable), as the very first sets of warmup iterations run faster than all the subsequent ones. Output of running the benchmark locally:

[info] running (fork) org.enso.interpreter.bench.benchmarks.RuntimeBenchmarksRunner org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression
[info] # JMH version: 1.36
[info] # VM version: JDK 21.0.2, OpenJDK 64-Bit Server VM, 21.0.2+13-jvmci-23.1-b30
[info] # VM invoker: /home/pavel/.sdkman/candidates/java/21.0.2-graalce/bin/java
[info] # VM options: -XX:ThreadPriorityPolicy=1 -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCIProduct -XX:-UnlockExperimentalVMOptions -Dslf4j.provider=org.slf4j.nop.NOPServiceProvider -Dbench.compileOnly=false --module-path=/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/nativeimage/23.1.2/nativeimage-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/word/23.1.2/word-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/jniutils/23.1.2/jniutils-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/collections/23.1.2/collections-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/polyglot/polyglot/23.1.2/polyglot-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-api/23.1.2/truffle-api-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-runtime/23.1.2/truffle-runtime-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-compiler/23.1.2/truffle-compiler-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/js/js-language/23.1.2/js-language-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/regex/regex/23.1.2/regex-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/shadowed/icu4j/23.1.2/icu4j-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/python/python-language/23.1.2/python-language-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/python/python-resources/23.1.2/python-resources-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/bouncycastle/bcutil-jdk18on/1.76/bcutil-jdk18on-1.76.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/bouncycastle/bcpkix-jdk18on/1.76/bcpkix-jdk18on-1.76.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk18on/1.76/bcprov-jdk18on-1.76.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/llvm/llvm-api/23.1.2/llvm-api-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-nfi/23.1.2/truffle-nfi-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-nfi-libffi/23.1.2/truffle-nfi-libffi-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/tools/profiler-tool/23.1.2/profiler-tool-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/shadowed/json/23.1.2/json-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/tukaani/xz/1.9/xz-1.9.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/slf4j/slf4j-nop/2.0.9/slf4j-nop-2.0.9.jar:/home/pavel/dev/enso/runtime.jar --add-modules=org.enso.runtime --add-exports=org.slf4j.nop/org.slf4j.nop=org.slf4j
[info] # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
[info] # Warmup: 6 iterations, 10 s each
[info] # Measurement: 4 iterations, 10 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread, will synchronize iterations
[info] # Benchmark mode: Average time, time/op
[info] # Benchmark: org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression
[info] # Run progress: 0.00% complete, ETA 00:01:40
[info] # Fork: 1 of 1
[info] OpenJDK 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
[error] SLF4J: Attempting to load provider "org.slf4j.nop.NOPServiceProvider" specified via "slf4j.provider" system property
[info] # Warmup Iteration   1: 42.978 ms/op
[info] # Warmup Iteration   2: 99.519 ms/op
[info] # Warmup Iteration   3: 134.974 ms/op
[info] # Warmup Iteration   4: 161.004 ms/op
[info] # Warmup Iteration   5: 189.944 ms/op
[info] # Warmup Iteration   6: 218.136 ms/op
[info] Iteration   1: 225.969 ms/op
[info] Iteration   2: 257.387 ms/op
[info] Iteration   3: 315.345 ms/op
[info] Iteration   4: 329.106 ms/op
[info] Result "org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression":
[info]   281.952 ±(99.9%) 313.836 ms/op [Average]
[info]   (min, avg, max) = (225.969, 281.952, 329.106), stdev = 48.566
[info]   CI (99.9%): [≈ 0, 595.788] (assumes normal distribution)
[info] # Run complete. Total time: 00:01:43
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
[info] extra caution when trusting the results, look into the generated code to check the benchmark still
[info] works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
[info] different JVMs are already problematic, the performance difference caused by different Blackhole
[info] modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.
[info] Benchmark                               Mode  Cnt    Score     Error  Units
[info] InlineCompilerBenchmark.longExpression  avgt    4  281.952 ± 313.836  ms/op
[info] Benchmark results reported into /home/pavel/dev/enso/engine/runtime-benchmarks/./bench-report.xml
[success] Total time: 105 s (01:45), completed Mar 1, 2024, 1:50:39 PM

Attaching VisualVM and CPU sampler shows that most CPU time is consumed by AliasAnalysis, as expected:
image

Looking into memory usage of this benchmark, there does not seem to be any major leak.

@hubertp
Copy link
Collaborator

hubertp commented Mar 23, 2024

The reason for increasing timings - looks like we never reset scopes and (in short) inlineContext.localScope.get.scope.scopeCount just continues to increase. Probably due to the whole deepCopy logic which I don't fully grasp yet. This I believe affects alias analysis because we iterate over all(?) scopes every time. In other words I think it's a wrongly constructed benchmark setup.

@enso-bot
Copy link

enso-bot bot commented Mar 25, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-21):

Progress: Juggling between tasks. investigating inline execution problems in benchmarks. Tried to reproduce #9475, no success so far. Merged fix for #9278. It should be finished by 2024-03-22.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

@enso-bot
Copy link

enso-bot bot commented Mar 25, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-21):

Progress: Juggling between tasks. investigating inline execution problems in benchmarks. Tried to reproduce #9475, no success so far. Merged fix for #9278. It should be finished by 2024-03-22.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

@enso-bot
Copy link

enso-bot bot commented Mar 25, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-22):

Progress: PR that reduces one of the significant hotspots. Number of scopes continues to rise rather drastically, potentially leading to performance issues. CI build setup meeting. It should be finished by 2024-03-22.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

@enso-bot
Copy link

enso-bot bot commented Mar 26, 2024

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-03-25):

Summary: There is 5 days delay in implementation of the Analyze and improve the performance of inline compilation (#9237) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Benchmark performance problems appear to be in the benchmark itself, hence the delay. Also juggling between different tasks.

@enso-bot
Copy link

enso-bot bot commented Mar 26, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-25):

Progress: No luck with finding the solution so far, will need further work while juggling between tasks. Able to reproduce #9475 locally. It should be finished by 2024-03-27.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

@hubertp hubertp moved this from 🔧 Implementation to 👁️ Code review in Issues Board Mar 27, 2024
@enso-bot
Copy link

enso-bot bot commented Mar 28, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-27):

Progress: Reported on problems with patching for #9475. Would need more investigation if we want to go that path. Provided a fix for benchmark performance problems. It should be finished by 2024-03-27.

Next Day: Next day I will be working on the #9237 task. Address review, pick up next item on the list.

@mergify mergify bot closed this as completed in #9520 Mar 28, 2024
mergify bot pushed a commit that referenced this issue Mar 28, 2024
…rformance (#9520)

As benchmarks show, a significant amount of time is spent traversing `Set` of `Graph.Link`s. That's unfortunate and unnecessary. We can equally keep helper maps that make search constant time.

Fixed inline compilation benchmarks by properly cleaning up scopes after runs.

Closes #9237.

# Important Notes
Things like
![Screenshot from 2024-03-22 11-23-01](https://github.com/enso-org/enso/assets/292128/7c1e220a-6e33-4396-a9b2-0e788f615323)
![Screenshot from 2024-03-22 11-13-19](https://github.com/enso-org/enso/assets/292128/0272b1cb-252e-4662-b539-174844941c8e)
are all gone. There is plenty of it those are just samples.

Benchmarks are back in order:
```
[info] # Warmup Iteration   1: 2.702 ms/op
[info] # Warmup Iteration   2: 3.080 ms/op
[info] # Warmup Iteration   3: 2.818 ms/op
[info] # Warmup Iteration   4: 3.334 ms/op
[info] # Warmup Iteration   5: 2.448 ms/op
[info] # Warmup Iteration   6: 2.583 ms/op
[info] Iteration   1: 2.908 ms/op
[info] Iteration   2: 2.915 ms/op
[info] Iteration   3: 2.774 ms/op
[info] Iteration   4: 2.601 ms/op
[info] Result "org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression":
[info]   2.799 ±(99.9%) 0.953 ms/op [Average]
[info]   (min, avg, max) = (2.601, 2.799, 2.915), stdev = 0.148
[info]   CI (99.9%): [1.846, 3.753] (assumes normal distribution)
```
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Mar 28, 2024
@enso-bot
Copy link

enso-bot bot commented Apr 2, 2024

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-04-01):

Summary: There is 4 days delay in implementation of the Analyze and improve the performance of inline compilation (#9237) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Got stuck debugging google analytics demo.

@hubertp
Copy link
Collaborator

hubertp commented Apr 2, 2024

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-04-01):

Typo in bug number. Ignore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants