Implement -codegen-fuse-op-and-check true for C codegen
It appears that GCC (and, to a lesser extent, Clang/LLVM) does not always
successfully fuse adjacent `Word<N>_<op>` and
`Word{S,U}<N>_<op>CheckP` primitives. The performance results
reported at MLton#273 and
MLton#292 suggest that this does not
always have significant impact, but a close look at the `md5`
benchmark shows that the native codegen significantly outperforms the
C codegen with gcc-9 due to redundant arithmetic computations (one for
`Word{S,U}<N>_<op>CheckP` and another for `Word<N>_<op>`).
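As a rough illustration of the two shapes, here is a minimal C sketch for the signed 32-bit subtraction case. It assumes the GCC/Clang `__builtin_sub_overflow` builtin. The signatures given to `WordS32_subCheckP` and `Word32_sub` are assumptions, and `sub_and_check_s32` is a hypothetical name for the fused form; none of this is the actual code emitted by the C codegen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unfused shape: the overflow check and the operation are separate
   primitives, so the subtraction appears twice and the C compiler has
   to recognize and merge the two computations on its own.
   (Signatures are assumed for illustration.) */
static inline bool WordS32_subCheckP(int32_t x, int32_t y) {
    int32_t ignored;
    return __builtin_sub_overflow(x, y, &ignored);
}

static inline uint32_t Word32_sub(uint32_t x, uint32_t y) {
    return x - y; /* unsigned arithmetic wraps modulo 2^32 */
}

/* Fused shape (hypothetical helper): one builtin call yields both the
   wrapped result and the overflow flag, so only one subtraction is
   ever expressed in the source. */
static inline bool sub_and_check_s32(int32_t x, int32_t y, uint32_t *res) {
    int32_t r;
    bool overflow = __builtin_sub_overflow(x, y, &r);
    *res = (uint32_t)r; /* the builtin stores the wrapped value even on overflow */
    return overflow;
}
```

In the unfused shape a caller evaluates `WordS32_subCheckP (X, 1)` and then `Word32_sub (X, 1)`, whereas the fused shape obtains the flag and the result from a single call.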
(Note: Because the final md5 state is not used by the `md5` benchmark
program, MLton actually optimizes out most of the md5 computation.
What is left is a lot of arithmetic from `PackWord32Little.subVec` to
check for indices that should raise `Subscript`.)
For example, with `-codegen-fuse-op-and-check false` and gcc-9, the
`transform` function of `md5` has the following assembly:
movl %r9d, %r10d
subl $1, %r10d
jo .L650
leal -1(%r8), %r10d
movl %r10d, %r12d
addl %r10d, %edx
jo .L650
addl %r10d, %r11d
cmpl %eax, %r11d
jnb .L656
movl %ebp, %edx
addl $1, %edx
jo .L659
leal 1(%rcx), %edx
movl %edx, %r11d
imull %r9d, %r11d
jo .L650
imull %r8d, %edx
movl %edx, %r11d
addl %r10d, %r11d
jo .L650
leal (%rdx,%r10), %r11d
cmpl %eax, %r11d
jnb .L665
What seems to have happened is that gcc has arranged for equivalent
values to be in `%r8` and `%r9`. In the first three lines, there is
an implementation of `WordS32_subCheckP (X, 1)` using `subl/jo`, while
in the fourth line, there is an implementation of `Word32_sub (X, 1)`
using `lea` with an offset of `-1`. Notice that `%r10` is used for
the result of both, so the fourth line is redundant (the value is
already in `%r10`).
On the other hand, with `-codegen-fuse-op-and-check true` and gcc-9,
the `transform` function of `md5` has the following assembly:
movl %r8d, %r9d
subl $1, %r9d
jo .L645
addl %r9d, %ecx
jo .L645
cmpl %edx, %ecx
jnb .L651
movl %eax, %ecx
addl $1, %ecx
jo .L654
imull %r8d, %ecx
jo .L645
addl %r9d, %ecx
jo .L645
cmpl %edx, %ecx
jnb .L660