Running C code on the Java stack causes crashes on Windows 11

We have had multiple recent cases of crashes on Windows 11. It looks like a recent Windows update may have changed behaviour in a way that exposes the problem, but I think the fundamental issue is with our code.

The crashes occur in ntdll!_chkstk, which attempts to read from every 4K page, starting at the stack limit held in the thread's TEB and finishing at the current value of rsp less a value passed in rax. From what I can gather, the purpose of this function is two-fold. If the new value of (rsp-rax) is still within the thread's permitted total stack size, a failed read (from hitting a guard page) triggers an increase in the currently committed stack size. If (rsp-rax) is outside the permitted stack area, the program is terminated with error code c00000fd (STATUS_STACK_OVERFLOW), which can be seen either in the Windows event log or by loading the core dump into windbg. In our case the value in rsp is the Java stack pointer, which is typically very far from the C stack and is thus guaranteed to cause a crash if tested by _chkstk.
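To make the probing behaviour described above concrete, here is a minimal C model of it. This is a sketch under stated assumptions, not the ntdll implementation: the `thread_stack` struct, its field names, the `probe_result` enum and `model_chkstk` are all hypothetical, the code only does address arithmetic (it never touches memory), and it assumes the Java stack lies at a lower address than the thread's C stack.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Hypothetical, simplified view of one thread's C stack (all page aligned). */
typedef struct {
    uint64_t stack_base;   /* highest address of the stack */
    uint64_t stack_limit;  /* lowest currently committed page (the TEB value) */
    uint64_t stack_floor;  /* lowest address the stack is permitted to reach */
} thread_stack;

typedef enum {
    PROBE_NONE,     /* target already inside the committed stack */
    PROBE_GROWS,    /* probes hit guard pages; the stack would be extended */
    PROBE_OVERFLOW  /* probes leave the permitted region: c00000fd */
} probe_result;

/*
 * Model of the probe loop: walk down one page at a time from the stack
 * limit towards (rsp - rax), counting the pages that would be read.
 */
probe_result model_chkstk(const thread_stack *ts, uint64_t rsp, uint64_t rax,
                          uint64_t *pages_probed)
{
    assert(ts->stack_floor <= ts->stack_limit);
    assert(ts->stack_limit <= ts->stack_base);
    assert(ts->stack_limit % PAGE_SIZE == 0 && ts->stack_floor % PAGE_SIZE == 0);

    /* Clamp at zero if the requested size is larger than rsp itself. */
    uint64_t target = (rax > rsp) ? 0 : rsp - rax;
    target &= ~(uint64_t)(PAGE_SIZE - 1);   /* round down to a page boundary */

    *pages_probed = 0;
    if (target >= ts->stack_limit) {
        return PROBE_NONE;
    }

    uint64_t page = ts->stack_limit;
    while (page > target) {
        page -= PAGE_SIZE;
        (*pages_probed)++;
        if (page < ts->stack_floor) {
            /* The read would land outside the thread's stack region. */
            return PROBE_OVERFLOW;
        }
    }
    return PROBE_GROWS;
}
```

With a made-up C stack occupying 0x7ff000000000 to 0x7ff000100000 and an rsp of 0x00fac508 (the Child-SP of the first trace), the model walks every page between the stack limit and the floor and then reports `PROBE_OVERFLOW`. An rsp inside the committed part of the C stack returns `PROBE_NONE` without probing anything.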

I'm currently working on three cases with this symptom. Two are now well characterized and the third is waiting for data.

For the first case the call stack is:
```
Child-SP          RetAddr               Call Site
00 00000000`00fac508 00007fff`84f162dd     ntdll!_chkstk+0x37
01 00000000`00fac520 00007fff`84ec2eda     ntdll!RtlpWalkFrameChain+0x13d
02 00000000`00fac670 00007fff`84ec2e52     ntdll!RtlWalkFrameChain+0x2a
03 00000000`00fac6a0 00007fff`84ee49ea     ntdll!RtlCaptureStackBackTrace+0x42
04 00000000`00fac6d0 00007fff`84ed9c22     ntdll!RtlStdLogStackTrace+0x4a
05 00000000`00fac810 00007fff`84ed7c79     ntdll!RtlpAddDebugInfoToCriticalSection+0x132
06 00000000`00fac880 00007fff`1582485f     ntdll!RtlInitializeCriticalSection+0x69
07 (Inline Function) --------`--------     j9thr29!monitor_allocate+0x6c [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\omr\thread\common\omrthread.c @ 3576] 
08 00000000`00fac8f0 00007fff`15822528     j9thr29!monitor_alloc_and_init+0x9f [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\omr\thread\common\omrthread.c @ 3789] 
09 00000000`00fac930 00007ffe`ee88b787     j9thr29!omrthread_monitor_init_with_name+0x18 [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\omr\thread\common\omrthread.c @ 3413] 
0a 00000000`00fac970 00007ffe`ee8372e6     j9vm29!monitorTableAt+0x247 [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\vm\montable.c @ 289] 
0b (Inline Function) --------`--------     j9vm29!VM_ObjectMonitor::inlineGetLockAddress+0x2a [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\oti\ObjectMonitor.hpp @ 82] 
0c 00000000`00faca90 00007ffe`e560ec0d     j9vm29!objectMonitorEnterNonBlocking+0x46 [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\vm\ObjectMonitor.cpp @ 334] 
0d (Inline Function) --------`--------     j9jit29!fast_jitMonitorEnterImpl+0x13 [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\codert_vm\cnathelp.cpp @ 1696] 
0e 00000000`00facb00 00007ffe`e55f063b     j9jit29!fast_jitMonitorEntry+0x1d [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\codert_vm\cnathelp.cpp @ 3742] 
0f 00000000`00facb30 00000000`ffe71178     j9jit29!jitMonitorEntry+0xb [c:\temp\bld_100666\bld_win_x86-64_cmprssptrs\codert_vm\xnathelp.asm @ 2209] 

```
I confirmed that jitMonitorEntry was called from {java/util/jar/JarVerifier.processEntry} +4517 and the entire call stack is operating on the Java stack.

For the second case the call stack is:
```
 # Child-SP          RetAddr               Call Site
00 00000000`004ebd28 00007ffc`3d61811d     ntdll!_chkstk+0x37
01 00000000`004ebd40 00007ffc`3d6301ba     ntdll!RtlpWalkFrameChain+0x13d
02 00000000`004ebe90 00007ffc`3a28fe87     ntdll!RtlWalkFrameChain+0x2a     
03 00000000`004ebec0 00007ffc`3a2f4179     InProcessClient64+0x8fe87       C:\Program Files\SentinelOne...
04 00000000`004ec450 00007ffc`3a290cfc     InProcessClient64+0xf4179
05 00000000`004ec4b0 00007ffc`3a290d39     InProcessClient64+0x90cfc
06 00000000`004ec550 00007ffc`3a290bc2     InProcessClient64+0x90d39
07 00000000`004ec580 00007ffc`3d61c2f4     InProcessClient64+0x90bc2
08 00000000`004ec670 00007ffc`03b353cc     ntdll!RtlLeaveCriticalSection+0x2f4
09 00000000`004ec6e0 00007ffb`b7c6b569     j9thr29!monitor_exit+0x73c [c:\workspace\openjdk-build\workspace\build\src\omr\thread\common\omrthread.c @ 4366] 
0a 00000000`004ec760 00007ffb`b787b8fc     j9vm29!objectMonitorExit+0x819 [c:\workspace\openjdk-build\workspace\build\src\openj9\runtime\vm\monhelpers.c @ 241] 
0b (Inline Function) --------`--------     j9jit29!fast_jitMonitorExitImpl+0x32 [c:\workspace\openjdk-build\workspace\build\src\openj9\runtime\codert_vm\cnathelp.cpp @ 1837] 
0c 00000000`004ec7e0 00007ffb`b785c3fb     j9jit29!fast_jitMethodMonitorExit+0x3c [c:\workspace\openjdk-build\workspace\build\src\openj9\runtime\codert_vm\cnathelp.cpp @ 3741] 
0d 00000000`004ec810 00000000`0037ea00     j9jit29!jitMethodMonitorExit+0xb [c:\workspace\openjdk-build\workspace\build\src\build\windows-x86_64-server-release\vm\runtime\codert_vm\xnathelp.s @ 2270] 

```
I confirmed that jitMethodMonitorExit was called from {org/eclipse/osgi/framework/eventmgr/EventManager$EventThread.getNextEvent} +840 and the call stack is running on the Java SP. However, the stack region for this thread has been overrun during this call sequence, presumably risking data corruption even if we had not crashed in chkstk.

Of note, the crashes in _chkstk seem to depend on an additional trigger. In the first case the problem was demonstrated by enabling user mode stack trace via gflags. In the second case the presence of the SentinelOne security software triggered the failure.

For the first case there is also a demonstrated interaction with Windows Control Flow Guard - it is possible to prevent the crashes by explicitly forcing CFG off for the java executable. It is not clear why this should be the case, and it's possible that the underlying Windows change was not intentional.

These last points notwithstanding, it seems untenable to allow calls to Windows system functions to occur on the Java stack when we can neither tell how much stack space they will require nor guarantee that _chkstk won't be called.

Update: I now have data for the third case. It is another crash triggered by SentinelOne during a call to jitMethodMonitorExit. The calling JIT-compiled method was different in this case, being {com/ibm/mq/pcf/event/PCFMonitorAgent.refresh}:
```
 # Child-SP          RetAddr               Call Site
00 00000000`01731788 00007ffc`ec5962ad     ntdll!_chkstk+0x37
01 00000000`017317a0 00007ffc`ec542eda     ntdll!RtlpWalkFrameChain+0x13d
02 00000000`017318f0 00007ffc`e91a39a6     ntdll!RtlWalkFrameChain+0x2a
03 00000000`01731920 00007ffc`e9215a49     InProcessClient64+0x939a6        from C:\Program Files\SentinelOne...
04 00000000`01731f20 00007ffc`e91a353b     InProcessClient64+0x105a49
05 00000000`01731f80 00007ffc`e91a48a3     InProcessClient64+0x9353b
06 00000000`01731ff0 00007ffc`e91a48d8     InProcessClient64+0x948a3
07 00000000`01732030 00007ffc`e91a47c3     InProcessClient64+0x948d8
08 00000000`01732070 00007ffc`ec59a484     InProcessClient64+0x947c3
09 00000000`01732130 00007ffc`c00c5307     ntdll!RtlLeaveCriticalSection+0x2f4
0a 00000000`017321a0 00007ffc`3846a7d9     j9thr29!monitor_exit+0x707 [c:\workspace\openjdk-build\workspace\build\src\omr\thread\common\omrthread.c @ 4366] 
0b 00000000`01732210 00007ffc`36ef4c88     j9vm29!objectMonitorExit+0x7b9 [c:\workspace\openjdk-build\workspace\build\src\openj9\runtime\vm\monhelpers.c @ 223] 
0c (Inline Function) --------`--------     j9jit29!fast_jitMonitorExitImpl+0x2e [c:\workspace\openjdk-build\workspace\build\src\openj9\runtime\codert_vm\cnathelp.cpp @ 1817] 
0d 00000000`01732290 00007ffc`36ed5e6b     j9jit29!fast_jitMethodMonitorExit+0x38 [c:\workspace\openjdk-build\workspace\build\src\openj9\runtime\codert_vm\cnathelp.cpp @ 3711] 
0e 00000000`017322c0 00000000`01239e00     j9jit29!jitMethodMonitorExit+0xb [c:\workspace\openjdk-build\workspace\build\src\build\windows-x86_64-server-release\vm\runtime\codert_vm\xnathelp.s @ 2270] 
```

Running C code on the Java stack causes crashes on Windows 11 #22687
