Analyze and improve the performance of inline compilation #9237

Akirathan · 2024-03-01T12:52:41Z

Related to:

Benchmark added in #9158.

The benchmark org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression (run with sbt:runtime-benchmarks> benchOnly org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression) measures the performance of Compiler.runInline. As this method is used frequently by various instruments in the IDE, we would like to focus on it.

This benchmark is a bit suspicious (unstable), as the very first sets of warmup iterations run faster than all the subsequent ones. Output of running the benchmark locally:

[info] running (fork) org.enso.interpreter.bench.benchmarks.RuntimeBenchmarksRunner org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression
[info] # JMH version: 1.36
[info] # VM version: JDK 21.0.2, OpenJDK 64-Bit Server VM, 21.0.2+13-jvmci-23.1-b30
[info] # VM invoker: /home/pavel/.sdkman/candidates/java/21.0.2-graalce/bin/java
[info] # VM options: -XX:ThreadPriorityPolicy=1 -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCIProduct -XX:-UnlockExperimentalVMOptions -Dslf4j.provider=org.slf4j.nop.NOPServiceProvider -Dbench.compileOnly=false --module-path=/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/nativeimage/23.1.2/nativeimage-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/word/23.1.2/word-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/jniutils/23.1.2/jniutils-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/sdk/collections/23.1.2/collections-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/polyglot/polyglot/23.1.2/polyglot-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-api/23.1.2/truffle-api-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-runtime/23.1.2/truffle-runtime-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-compiler/23.1.2/truffle-compiler-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/js/js-language/23.1.2/js-language-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/regex/regex/23.1.2/regex-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/shadowed/icu4j/23.1.2/icu4j-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/python/python-language/23.1.2/python-language-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/python/python-resources/23.1.2/python-resources-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/bouncycastle/bcutil-jdk18on/1.76/bcutil-jdk18on-1.76.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/bouncycastle/bcpkix-jdk18on/1.76/bcpkix-jdk18on-1.76.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk18on/1.76/bcprov-jdk18on-1.76.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/llvm/llvm-api/23.1.2/llvm-api-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-nfi/23.1.2/truffle-nfi-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/truffle/truffle-nfi-libffi/23.1.2/truffle-nfi-libffi-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/tools/profiler-tool/23.1.2/profiler-tool-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/graalvm/shadowed/json/23.1.2/json-23.1.2.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/tukaani/xz/1.9/xz-1.9.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.jar:/home/pavel/.cache/coursier/v1/https/repo1.maven.org/maven2/org/slf4j/slf4j-nop/2.0.9/slf4j-nop-2.0.9.jar:/home/pavel/dev/enso/runtime.jar --add-modules=org.enso.runtime --add-exports=org.slf4j.nop/org.slf4j.nop=org.slf4j
[info] # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
[info] # Warmup: 6 iterations, 10 s each
[info] # Measurement: 4 iterations, 10 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread, will synchronize iterations
[info] # Benchmark mode: Average time, time/op
[info] # Benchmark: org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression
[info] # Run progress: 0.00% complete, ETA 00:01:40
[info] # Fork: 1 of 1
[info] OpenJDK 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
[error] SLF4J: Attempting to load provider "org.slf4j.nop.NOPServiceProvider" specified via "slf4j.provider" system property
[info] # Warmup Iteration   1: 42.978 ms/op
[info] # Warmup Iteration   2: 99.519 ms/op
[info] # Warmup Iteration   3: 134.974 ms/op
[info] # Warmup Iteration   4: 161.004 ms/op
[info] # Warmup Iteration   5: 189.944 ms/op
[info] # Warmup Iteration   6: 218.136 ms/op
[info] Iteration   1: 225.969 ms/op
[info] Iteration   2: 257.387 ms/op
[info] Iteration   3: 315.345 ms/op
[info] Iteration   4: 329.106 ms/op
[info] Result "org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression":
[info]   281.952 ±(99.9%) 313.836 ms/op [Average]
[info]   (min, avg, max) = (225.969, 281.952, 329.106), stdev = 48.566
[info]   CI (99.9%): [≈ 0, 595.788] (assumes normal distribution)
[info] # Run complete. Total time: 00:01:43
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
[info] extra caution when trusting the results, look into the generated code to check the benchmark still
[info] works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
[info] different JVMs are already problematic, the performance difference caused by different Blackhole
[info] modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.
[info] Benchmark                               Mode  Cnt    Score     Error  Units
[info] InlineCompilerBenchmark.longExpression  avgt    4  281.952 ± 313.836  ms/op
[info] Benchmark results reported into /home/pavel/dev/enso/engine/runtime-benchmarks/./bench-report.xml
[success] Total time: 105 s (01:45), completed Mar 1, 2024, 1:50:39 PM

Attaching VisualVM and CPU sampler shows that most CPU time is consumed by AliasAnalysis, as expected:

Looking into memory usage of this benchmark, there does not seem to be any major leak.

The text was updated successfully, but these errors were encountered:

hubertp · 2024-03-23T20:20:53Z

The reason for increasing timings - looks like we never reset scopes and (in short) inlineContext.localScope.get.scope.scopeCount just continues to increase. Probably due to the whole deepCopy logic which I don't fully grasp yet. This I believe affects alias analysis because we iterate over all(?) scopes every time. In other words I think it's a wrongly constructed benchmark setup.

enso-bot · 2024-03-25T08:58:51Z

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-21):

Progress: Juggling between tasks. investigating inline execution problems in benchmarks. Tried to reproduce #9475, no success so far. Merged fix for #9278. It should be finished by 2024-03-22.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

enso-bot · 2024-03-25T14:12:04Z

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-21):

Progress: Juggling between tasks. investigating inline execution problems in benchmarks. Tried to reproduce #9475, no success so far. Merged fix for #9278. It should be finished by 2024-03-22.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

enso-bot · 2024-03-25T14:12:13Z

Hubert Plociniczak reports a new STANDUP for the provided date (2024-03-22):

Progress: PR that reduces one of the significant hotspots. Number of scopes continues to rise rather drastically, potentially leading to performance issues. CI build setup meeting. It should be finished by 2024-03-22.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

enso-bot · 2024-03-26T22:48:26Z

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-03-25):

Summary: There is 5 days delay in implementation of the Analyze and improve the performance of inline compilation (#9237) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Benchmark performance problems appear to be in the benchmark itself, hence the delay. Also juggling between different tasks.

enso-bot · 2024-03-26T22:48:35Z

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-25):

Progress: No luck with finding the solution so far, will need further work while juggling between tasks. Able to reproduce #9475 locally. It should be finished by 2024-03-27.

Next Day: Next day I will be working on the #9237 task. Continue investigating benchmark issues.

enso-bot · 2024-03-28T08:16:46Z

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-27):

Progress: Reported on problems with patching for #9475. Would need more investigation if we want to go that path. Provided a fix for benchmark performance problems. It should be finished by 2024-03-27.

Next Day: Next day I will be working on the #9237 task. Address review, pick up next item on the list.

…rformance (#9520) As benchmarks show, a significant amount of time is spent traversing `Set` of `Graph.Link`s. That's unfortunate and unnecessary. We can equally keep helper maps that make search constant time. Fixed inline compilation benchmarks by properly cleaning up scopes after runs. Closes #9237. # Important Notes Things like ![Screenshot from 2024-03-22 11-23-01](https://github.com/enso-org/enso/assets/292128/7c1e220a-6e33-4396-a9b2-0e788f615323) ![Screenshot from 2024-03-22 11-13-19](https://github.com/enso-org/enso/assets/292128/0272b1cb-252e-4662-b539-174844941c8e) are all gone. There is plenty of it those are just samples. Benchmarks are back in order: ``` [info] # Warmup Iteration 1: 2.702 ms/op [info] # Warmup Iteration 2: 3.080 ms/op [info] # Warmup Iteration 3: 2.818 ms/op [info] # Warmup Iteration 4: 3.334 ms/op [info] # Warmup Iteration 5: 2.448 ms/op [info] # Warmup Iteration 6: 2.583 ms/op [info] Iteration 1: 2.908 ms/op [info] Iteration 2: 2.915 ms/op [info] Iteration 3: 2.774 ms/op [info] Iteration 4: 2.601 ms/op [info] Result "org.enso.compiler.benchmarks.inline.InlineCompilerBenchmark.longExpression": [info] 2.799 ±(99.9%) 0.953 ms/op [Average] [info] (min, avg, max) = (2.601, 2.799, 2.915), stdev = 0.148 [info] CI (99.9%): [1.846, 3.753] (assumes normal distribution) ```

enso-bot · 2024-04-02T21:10:45Z

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-04-01):

Summary: There is 4 days delay in implementation of the Analyze and improve the performance of inline compilation (#9237) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Got stuck debugging google analytics demo.

hubertp · 2024-04-02T21:12:08Z

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-04-01):

Typo in bug number. Ignore

Akirathan added -compiler --low-performance labels Mar 1, 2024

Akirathan assigned hubertp Mar 1, 2024

github-project-automation bot added this to Issues Board Mar 1, 2024

github-project-automation bot moved this to ❓New in Issues Board Mar 1, 2024

hubertp moved this from ❓New to 📤 Backlog in Issues Board Mar 21, 2024

hubertp moved this from 📤 Backlog to 🔧 Implementation in Issues Board Mar 22, 2024

hubertp mentioned this issue Mar 22, 2024

Fix inline compilation benchmarks and Improve searching Graph.Link performance #9520

Merged

2 tasks

hubertp mentioned this issue Mar 25, 2024

Make execution multithreaded to allow for faster visualization responses #9529

Closed

hubertp moved this from 🔧 Implementation to 👁️ Code review in Issues Board Mar 27, 2024

mergify bot closed this as completed in #9520 Mar 28, 2024

github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Mar 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analyze and improve the performance of inline compilation #9237

Analyze and improve the performance of inline compilation #9237

Akirathan commented Mar 1, 2024

hubertp commented Mar 23, 2024 •

edited

Loading

enso-bot bot commented Mar 25, 2024

enso-bot bot commented Mar 25, 2024

enso-bot bot commented Mar 25, 2024

enso-bot bot commented Mar 26, 2024

enso-bot bot commented Mar 26, 2024

enso-bot bot commented Mar 28, 2024

enso-bot bot commented Apr 2, 2024

hubertp commented Apr 2, 2024

Analyze and improve the performance of inline compilation #9237

Analyze and improve the performance of inline compilation #9237

Comments

Akirathan commented Mar 1, 2024

hubertp commented Mar 23, 2024 • edited Loading

enso-bot bot commented Mar 25, 2024

enso-bot bot commented Mar 25, 2024

enso-bot bot commented Mar 25, 2024

enso-bot bot commented Mar 26, 2024

enso-bot bot commented Mar 26, 2024

enso-bot bot commented Mar 28, 2024

enso-bot bot commented Apr 2, 2024

hubertp commented Apr 2, 2024

hubertp commented Mar 23, 2024 •

edited

Loading