First discovered in rust-lang/rust#49775: we've found that LLVM's optimization passes will promote unmodified global definitions to constant. On some platforms, however, these constants may actually be written to, so the promotion causes a page fault at runtime (a write to read-only memory).
The specific case we ran into was on our Android configuration, where we pass enough flags to disable atomic instruction generation, so atomics are instead lowered to calls to the libgcc intrinsics. Namely, we have a module like:
@FOO = internal unnamed_addr global <{ [4 x i8] }> zeroinitializer, align 4
define void @main() {
%a = load atomic i32, i32* bitcast (<{ [4 x i8] }>* @FOO to i32*) seq_cst, align 4
ret void
}
When this is optimized with
/opt foo.ll -mtriple=arm-linux-androideabi -mattr=+v5te,+strict-align -o - -S -O2
it generates:
@FOO = internal unnamed_addr constant <{ [4 x i8] }> zeroinitializer, align 4
The assembly, however, generates:
main:
FOO:
.zero 4
.size FOO, 4
My assembly isn't super strong, but nm confirms that FOO is indeed in rodata rather than in bss, where it originally would be. Unfortunately, the assembly also makes use of __sync_val_compare_and_swap_4, an intrinsic in libgcc, which appears to dispatch to __kuser_cmpxchg.
In our tests, which run inside the Android emulator, it looks like the kernel detects that the local "hardware" actually has atomic instructions, so __kuser_cmpxchg uses ldrex and strexeq. The strexeq instruction, however, causes a page fault because it can't store the value back into .rodata.
More information is at the end of the referenced issue, rust-lang/rust#49775 (comment). Is this something that we should be disabling locally, or is this an LLVM misoptimization?
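For readers coming from the Rust side, here is a rough sketch of the kind of source that could plausibly lower to a module like the @FOO one above. It is an assumption, not the code from the referenced issue: the name FOO is taken from the IR, while the AtomicU32 type and the read_foo helper are hypothetical. The relevant property is that the static is only ever loaded, never stored to, which is what makes the optimizer treat it as unmodified.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical stand-in for the global in the IR: four zero bytes,
// with no store to it anywhere in the program.
static FOO: AtomicU32 = AtomicU32::new(0);

// The only access is a sequentially consistent atomic load,
// mirroring the `load atomic i32 ... seq_cst` in the module.
fn read_foo() -> u32 {
    FOO.load(Ordering::SeqCst)
}

fn main() {
    println!("FOO = {}", read_foo());
}
```

On most targets this is a plain load instruction. It only becomes a problem in a configuration like the one described here, where the load is lowered to a libgcc call that may write.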
For the specific case of a 32-bit load on ARM, we might be able to emit some different sequence which doesn't cause a write (I think a simple load instruction followed by a call to __sync_synchronize does the right thing), but in some cases the only legal instruction sequence for an atomic load involves an instruction which causes a "write". For example, a 64-bit load on 32-bit ARM without LPAE, or a 128-bit load on x86-64 or AArch64. So we can't put atomic variables into read-only memory.
Granted, there are better transforms we should do here; specifically, we should be able to constant-fold the load.
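To illustrate why a "load" can end up writing, here is a small Rust sketch of an atomic load built from compare-and-swap alone. It assumes the lowering compares against 0 and swaps in 0; the issue only shows that __sync_val_compare_and_swap_4 is called, not its arguments, and load_via_cas is a hypothetical helper name.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Reads `a` using only a compare-and-swap, the way a target without a
// native atomic load instruction might have to (assumed arguments: 0, 0).
fn load_via_cas(a: &AtomicU32) -> u32 {
    // If the current value is 0, the swap "succeeds" and stores 0 back:
    // the value is unchanged, but the memory is still written.
    // If the current value is not 0, the swap fails and nothing is stored.
    // Either way the value that was observed is returned.
    match a.compare_exchange(0, 0, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(v) | Err(v) => v,
    }
}

fn main() {
    let zero = AtomicU32::new(0);
    let seven = AtomicU32::new(7);
    println!("{} {}", load_via_cas(&zero), load_via_cas(&seven));
}
```

Under that assumption, a zero-initialized global like FOO takes the success path, so the store is attempted, and it faults if the global has been placed in read-only memory.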