Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement
-codegen-fuse-op-and-check true
for C codegen
It appears that GCC (and, to a lesser extent) Clang/LLVM do not always successfully fuse adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives. The performance results reported at MLton#273 and MLton#292 suggest that this does not always have significant impact, but a close look at the `md5` benchmark shows that the native codegen significantly outperforms the C codegen with gcc-9 due to redundant arithmetic computations (one for `Word{S,U}<N>_<op>CheckP` and another for `Word<N>_<op>`). (Note: Because the final md5 state is not used by the `md5` benchmark program, MLton actually optimizes out most of the md5 computation. What is left is a lot of arithmetic from `PackWord32Little.subVec` to check for indices that should raise `Subscript`.) For example, with `-codegen-fuse-op-and-check false` and gcc-9, the `transform` function of `md5` has the following assembly: movl %r9d, %r10d subl $1, %r10d jo .L650 leal -1(%r8), %r10d movl %r10d, %r12d addl %r10d, %edx jo .L650 addl %r10d, %r11d cmpl %eax, %r11d jnb .L656 movl %ebp, %edx addl $1, %edx jo .L659 leal 1(%rcx), %edx movl %edx, %r11d imull %r9d, %r11d jo .L650 imull %r8d, %edx movl %edx, %r11d addl %r10d, %r11d jo .L650 leal (%rdx,%r10), %r11d cmpl %eax, %r11d jnb .L665 What seems to have happened is that gcc has arranged for equivalent values to be in `%r8` and `%r9`. In the first three lines, there is an implementation of `WordS32_subCheckP (X, 1)` using `subl/jo`, while in the fourth line, there is an implementation of `Word32_sub (X, 1)` using `lea` with an offset of `-1`. Notice that `%r10` is used for the result of both, so the fourth line is redundant (the value is already in `%r10`). On the other hand, with `-codegen-fuse-op-and-check true` and gcc-9, the `transform` function of `md5` has the following assembly: movl %r8d, %r9d subl $1, %r9d jo .L645 addl %r9d, %ecx jo .L645 cmpl %edx, %ecx jnb .L651 movl %eax, %ecx addl $1, %ecx jo .L654 imull %r8d, %ecx jo .L645 addl %r9d, %ecx jo .L645 cmpl %edx, %ecx jnb .L660
- Loading branch information