Lowered optimization level of omrzfs.c for Open XL z/OS #7653

Open
Deigue opened this issue Feb 12, 2025 · 4 comments

Comments

@Deigue
Contributor

Deigue commented Feb 12, 2025

A run-time crash occurs when omrzfs.c is compiled with Open XL z/OS: execution eventually branches to an invalid memory address.
The disassembled code of omrzfs.c differs from what XLC (the existing z/OS compiler) produces.

https://github.com/eclipse-omr/omr/blob/master/port/zos390/omrzfs.c#L121 (point of crash)

Here is the crash dump produced with the Open XL build, around the point where the code crashes:

//register context
$r5   = 0x000000500884cb70 Ptr Unknown!
$r8   = 0x00000050088800d0 Ptr Unknown!
$r14  = 0x000000001a76aeac {libj9prt29.so}{getZFSUserCacheUsed} +1044
$r15  = 0x0000000000000000 CP - {}
$addr = 0x0000000000000002 Ptr CP - {}
$bea  = 0x000000001a76aeaa {libj9prt29.so}{getZFSUserCacheUsed} +1042

//as the current address is odd, branch back via $bea to see where code came from as shown below:
0x1a76aeaa {libj9prt29.so}{getZFSUserCacheUsed} +1042 05EF         BALR      GPR14,GPR15
//The crash occurs here: BALR branches to the address in GPR15, which is zero.

// The relevant instructions below show how GPR15 arrived at its current value.
0x1a76aab2 {libj9prt29.so}{getZFSUserCacheUsed} +26  B9040085     LGR       GPR8,GPR5
0x1a76ab78 {libj9prt29.so}{getZFSUserCacheUsed} +224 E3F080280004 LG        GPR15,40(,GPR8)
0x1a76ac92 {libj9prt29.so}{getZFSUserCacheUsed} +506 E3F080580004 LG        GPR15,88(,GPR8)
0x1a76acda {libj9prt29.so}{getZFSUserCacheUsed} +578 E38048E80024 STG       GPR8,2280(,GPR4)
0x1a76acea {libj9prt29.so}{getZFSUserCacheUsed} +594 05EF         BALR      GPR14,GPR15

//Using above info, 
(kca) p 0x0000005010afca00+2280 // GPR4+2280 at +578
%7 = 0x0000005010afd2e8   (343877341928)
(kca) what (0x0000005010afd2e8)
0x5010afd2e8: 0x00000050088800d0 Ptr Unknown!
(kca) p 0x00000050088800d0+88 // adding offset 88 from getZFSUserCacheUsed +506
%8 = 0x0000005008880128   (343740514600)
(kca) what (0x0000005008880128)
0x5008880128: 0x000000001a748df8 {libj9prt29.so}{???} +13600
(kca) (0x000000001a748df8)/6i  //instructions from first BALR to GPR15
0x1a748df8 {libj9prt29.so}{???} +13600 E3F000100017 LLGT      GPR15,16
0x1a748dfe {libj9prt29.so}{???} +13606 58FF0220     L         GPR15,544(GPR15,)
0x1a748e02 {libj9prt29.so}{???} +13610 58FF0018     L         GPR15,24(GPR15,)
0x1a748e06 {libj9prt29.so}{???} +13614 58FF0300     L         GPR15,768(GPR15,)
0x1a748e0a {libj9prt29.so}{???} +13618 07FF         BR        GPR15
0x1a748e0c {libj9prt29.so}{???} +13620 0000         Invalid Instruction

The instructions above modify the contents of GPR15, which was supposedly holding the address to branch back from the syscall, so the subsequent BALR at +1042 fails. This happens via the following code:

rc = getZFSClientCacheSize(&clientCacheSize);

Lowering the opt level to -O0 ... or alternatively disabling inlining via something like -fno-inline-functions seems to mitigate the problem.

Creating this item to track the permanent fix for wyvern/ Open XL z/OS so we can revert the conditional lowering of the opt level for this method.

@Deigue
Contributor Author

Deigue commented Feb 12, 2025

test.zip

The crash can be reproduced by compiling the above unit test with Open XL z/OS. The test simply makes two consecutive calls to getZFSUserCacheUsed, which contains the BPX4PCT syscall. Hitting the syscall twice seems to trigger the crash, but it may be a result of inlining done by the optimizer when compiling this specifically with Open XL.

Associated workaround PR: #7639
I will try to limit the scope and manner in which the optimization level is lowered for omrzfs here, to minimize the performance impact while the permanent fix is developed.

An issue has been created internally with the Open XL developers to look at this particular problem, describing the above crash in further detail.

@Deigue
Contributor Author

Deigue commented Feb 12, 2025

@joransiu @babsingh
dbg-bpx4pct.zip

I tried a simpler version of the test that only makes a direct call to BPX4PCT twice in a row, with a plain printf in between to indicate whether execution got past each call or crashed.

compilation command:

/xlc210/usr/lpp/IBM/cnw/v2r1/openxl/bin/ibm-clang64 -O0 -D_ALL_SOURCE -D_XOPEN_SOURCE=600 -I/../omr -I/usr/include -I/usr/lpp/cbclib/include -I/.../omr/include_core -I/include_core/ -I/.../omr/port/zos390/ -m64 -o dbg-bpx4ct.c.o dbg-bpx4ct.c

Results:

-O3 compile crashes.
> ./dbg-bpx4ct.c.o
call #1 BPX4PCT done ...
CEE3201S The system detected an operation exception (System Completion Code=0C1).
         From entry point main at compile unit offset +0000000000000000 at entry offset -000000001970A698 at address 0000000000000000.
         Possible Bad Branch:  Statement:   Offset: +0000017A
Illegal instruction

-O1 compile crashes.
> ./dbg-bpx4ct.c.o
call #1 BPX4PCT done ...
CEE3201S The system detected an operation exception (System Completion Code=0C1).
         From entry point main at compile unit offset +0000000000000000 at entry offset -000000001970A6A0 at address 0000000000000000.
         Possible Bad Branch:  Statement:   Offset: +00000174
Illegal instruction

-O0 compile works.
> ./dbg-bpx4ct.c.o
call #1 BPX4PCT done ...
call #2 BPX4PCT done ...

@joransiu
Contributor

Thanks @Deigue. Please also share this update with the C/C++ team on the issue we have open with them.

@joransiu
Contributor

APAR PH65242 has been opened by the Open XL product team for this issue.
