Bytecode interpreters commonly employ a variety of optimizations to achieve better performance. This section discusses how to employ these optimizations in Bytecode DSL interpreters.
A major source of overhead in interpreted code (for both Truffle AST and bytecode interpreters) is boxing. By default, values are passed between operations as objects, which forces primitive values to be boxed up. Often, the boxed value is subsequently unboxed when it gets consumed.
Boxing elimination avoids these unnecessary boxing steps. The interpreter can speculatively rewrite bytecode instructions to specialized instructions that pass primitive values whenever possible. Boxing elimination can also improve compiled performance, because Graal is not always able to remove box-unbox sequences during compilation.
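The cost described above can be illustrated with a minimal plain-Java sketch. It is not Bytecode DSL code and does not reflect how the generated interpreter is implemented: the `BoxingSketch` class and its node types are hypothetical names invented for illustration. The generic add node passes its operands as `Object`, so every primitive is boxed and then unboxed again, while the specialized node passes `int` values directly.

```java
/** Toy expression nodes contrasting boxed and primitive value passing (all names hypothetical). */
final class BoxingSketch {
    interface Node {
        Object execute();              // generic path: the result is returned as a boxed object

        default int executeInt() {     // primitive path: by default, unbox the generic result
            return (Integer) execute();
        }
    }

    static final class IntConstant implements Node {
        private final int value;

        IntConstant(int value) {
            this.value = value;
        }

        @Override
        public Object execute() {
            return value;              // autoboxes the int into an Integer
        }

        @Override
        public int executeInt() {
            return value;              // no boxing
        }
    }

    /** Generic add: operands arrive boxed, are unboxed, and the sum is boxed again. */
    static final class AddGeneric implements Node {
        private final Node left;
        private final Node right;

        AddGeneric(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public Object execute() {
            int l = (Integer) left.execute();
            int r = (Integer) right.execute();
            return l + r;
        }
    }

    /** Specialized add: operands are requested as ints, so nothing is boxed on this path. */
    static final class AddInt implements Node {
        private final Node left;
        private final Node right;

        AddInt(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public Object execute() {
            return executeInt();       // boxes only if a caller asks for an Object
        }

        @Override
        public int executeInt() {
            return left.executeInt() + right.executeInt();
        }
    }

    public static void main(String[] args) {
        Node sum = new AddInt(new IntConstant(40), new IntConstant(2));
        System.out.println(sum.executeInt());
    }
}
```

Both node types compute the same result; the difference is only in how many `Integer` objects are created along the way.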
To enable boxing elimination, specify a set of boxingEliminationTypes on the @GenerateBytecode annotation. For example, the following configuration
    @GenerateBytecode(
        ...
        boxingEliminationTypes = {int.class, long.class}
    )
will instruct the interpreter to automatically avoid boxing for int and long values. (Note that boolean boxing elimination is supported, but is generally not worth the overhead of the additional instructions it produces.)
Boxing elimination is implemented using quickening, which is described below.
Quickening is a general technique to rewrite an instruction with a specialized version that (typically) requires less work. The Bytecode DSL supports quickened operations, which handle a subset of the specializations defined by an operation.
Quickened operations can be introduced to reduce the work required to evaluate an operation.
For example, a quickened operation that only accepts int inputs might avoid operand boxing and the additional type checks required by the general operation.
Additionally, a custom operation that has only one active specialization could be quickened to an operation that only supports that single specialization, avoiding extra specialization state checks.
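The rewriting mechanics behind quickening can be sketched with a toy stack interpreter in plain Java. This is an illustration under stated assumptions, not the code the Bytecode DSL generates: the opcodes, the `QuickeningSketch` class, and the revert-on-failure policy are all hypothetical. The generic `ADD` rewrites itself to `ADD_INT` after observing int operands, and `ADD_INT` reverts to `ADD` when its speculation fails. For brevity the sketch keeps a boxed operand stack, so it shows only the instruction rewriting, not the removal of boxing.

```java
/** Toy stack interpreter that quickens ADD to ADD_INT in place (all names hypothetical). */
final class QuickeningSketch {
    static final int PUSH = 0;     // operand: index into the constants array
    static final int ADD = 1;      // generic add: inspects operand types on every execution
    static final int ADD_INT = 2;  // quickened add: speculates that both operands are ints
    static final int RETURN = 3;

    static Object run(int[] code, Object[] constants) {
        Object[] stack = new Object[16];
        int sp = 0;
        int pc = 0;
        while (true) {
            switch (code[pc]) {
                case PUSH:
                    stack[sp++] = constants[code[pc + 1]];
                    pc += 2;
                    break;
                case ADD: {
                    Object right = stack[--sp];
                    Object left = stack[--sp];
                    if (left instanceof Integer && right instanceof Integer) {
                        code[pc] = ADD_INT; // quicken: rewrite this instruction in place
                        int l = (Integer) left;
                        int r = (Integer) right;
                        stack[sp++] = l + r;
                    } else {
                        // Generic fallback: treat both operands as doubles.
                        stack[sp++] = ((Number) left).doubleValue() + ((Number) right).doubleValue();
                    }
                    pc += 1;
                    break;
                }
                case ADD_INT: {
                    Object right = stack[--sp];
                    Object left = stack[--sp];
                    if (left instanceof Integer && right instanceof Integer) {
                        int l = (Integer) left;
                        int r = (Integer) right;
                        stack[sp++] = l + r;
                        pc += 1;
                    } else {
                        // Speculation failed: restore the operands, revert to the
                        // generic instruction, and re-execute it at the same pc.
                        sp += 2;
                        code[pc] = ADD;
                    }
                    break;
                }
                case RETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
    }

    public static void main(String[] args) {
        int[] code = {PUSH, 0, PUSH, 1, ADD, RETURN};
        System.out.println(run(code, new Object[]{2, 3}));
        System.out.println(code[4] == ADD_INT);
    }
}
```

A production interpreter would likely also record that an instruction has already failed its speculation, so that it does not flip between the generic and quickened forms repeatedly; the sketch omits that state.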
At the moment, quickened instructions can only be specified manually using @ForceQuickening.
In the future, tracing will be able to automatically infer useful quickenings.
Note: Superinstructions are not yet supported.
Superinstructions combine common sequences of instructions into single instructions. Using superinstructions can reduce the overhead of instruction dispatch, and it can enable the host compiler to optimize across the combined instructions (e.g., eliding a stack push for a value that is subsequently popped).
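Since the Bytecode DSL does not support superinstructions yet, the following is only a plain-Java sketch of the general idea, with hypothetical opcodes and a hypothetical `SuperinstructionSketch` class. It fuses the sequence "load local, push constant, add" into one instruction and counts dispatches to show the saving; the fused case also never pushes the two intermediate values onto the stack.

```java
/** Toy int-only stack interpreter with one fused instruction (all names hypothetical). */
final class SuperinstructionSketch {
    static final int LOAD_LOCAL = 0;            // operand: local index
    static final int PUSH_CONST = 1;            // operand: immediate int value
    static final int ADD = 2;
    static final int RETURN = 3;
    static final int LOAD_LOCAL_ADD_CONST = 4;  // fused: LOAD_LOCAL i; PUSH_CONST c; ADD

    /** Number of instructions dispatched by the most recent call to run. */
    static int dispatchCount;

    static int run(int[] code, int[] locals) {
        int[] stack = new int[8];
        int sp = 0;
        int pc = 0;
        dispatchCount = 0;
        while (true) {
            dispatchCount++;
            switch (code[pc]) {
                case LOAD_LOCAL:
                    stack[sp++] = locals[code[pc + 1]];
                    pc += 2;
                    break;
                case PUSH_CONST:
                    stack[sp++] = code[pc + 1];
                    pc += 2;
                    break;
                case ADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    pc += 1;
                    break;
                }
                case LOAD_LOCAL_ADD_CONST:
                    // One dispatch and one push instead of three dispatches,
                    // three pushes, and two pops.
                    stack[sp++] = locals[code[pc + 1]] + code[pc + 2];
                    pc += 3;
                    break;
                case RETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
    }

    public static void main(String[] args) {
        int[] locals = {37};
        int[] plain = {LOAD_LOCAL, 0, PUSH_CONST, 5, ADD, RETURN};
        int[] fused = {LOAD_LOCAL_ADD_CONST, 0, 5, RETURN};
        System.out.println(run(plain, locals) + " after " + dispatchCount + " dispatches");
        System.out.println(run(fused, locals) + " after " + dispatchCount + " dispatches");
    }
}
```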
In the future, tracing will be able to automatically infer useful superinstructions.
Note: Tracing is not yet supported.
Determining which instructions are worth optimizing (via quickening or superinstructions) typically requires manual profiling and benchmarking. In the future, the Bytecode DSL will automatically infer optimization opportunities by tracing the execution of a representative corpus of code.
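The text does not say how tracing will work, so the following is only one plausible approach, sketched in plain Java with hypothetical names (`TraceSketch`, `countPairs`, `mostFrequentPair`) and hypothetical opcode names. It counts how often each adjacent pair of opcodes appears in an executed instruction trace; frequent pairs are natural candidates for superinstructions.

```java
import java.util.HashMap;
import java.util.Map;

/** Counts adjacent opcode pairs in an execution trace (an assumed approach, not the DSL's). */
final class TraceSketch {
    /** Maps each adjacent opcode pair, joined by a space, to its number of occurrences. */
    static Map<String, Integer> countPairs(String[] trace) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < trace.length; i++) {
            counts.merge(trace[i] + " " + trace[i + 1], 1, Integer::sum);
        }
        return counts;
    }

    /** Returns a pair with the highest count, or null if the map is empty. */
    static String mostFrequentPair(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] trace = {"LOAD_LOCAL", "PUSH_CONST", "ADD", "LOAD_LOCAL", "PUSH_CONST", "MUL", "RETURN"};
        System.out.println(mostFrequentPair(countPairs(trace)));
    }
}
```

In practice the trace would come from running a representative corpus, and the counts would need to be weighed against the cost of adding more instructions to the interpreter.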