Skip to content

Conversation

hzongaro
Copy link
Member

@hzongaro hzongaro commented Oct 16, 2025

The heapifyForColdBlocks method of Escape Analysis creates a local temporary that it uses to hold a reference to a stack allocated object at the point of the stack allocation. If the object might need to be heapified, it generates IL that tests whether that temporary still holds a reference to the stack allocated object, and if it does, it performs the heapification and updates the temporary. However, the temporary was not being initialized on entry to the method.

Fixed this by generating IL that zero initializes the temporary variable at method entry.

During Code Generation J9::CodeGenerator::createStackAtlas will ensure that any autos that might contain references on entry to the method are zero initialized. Despite that, it's better practice to ensure that such initialization is explicitly performed explicitly in the IL, as it allows, for instance, other analyses to rely on all the definitions seen in the IL for a particular auto as enumerating the possible values the auto can hold at particular points without having to consider the possibility that the auto might hold some as yet to be determined default value that it will acquire during Code Generation.

In the spirit of upholding that better practice, this changes heapifyForColdBlocks to generate IL at method entry that zero initializes the temporary variable.

@hzongaro
Copy link
Member Author

@vijaysun-omr, may I ask you to review this pull request?

The problem can be observed with a test like the following:

public class Foo {
    public static class MyCl {
        int i;
        MyCl(int i) { this.i = i; }
    }

    public static MyCl esc;
    public static class MyException extends RuntimeException {
    }
    public static MyException excp = new MyException();

    public static final void sub2(MyCl obj) {
        Foo.esc = obj;
    }

    public static final void sub1(MyCl obj2, int i) {
        MyCl obj = null;
        try {
            if (i < 10) {
                obj = new MyCl(i);
            } else {
                obj = obj2;
            }

            if (i > 100) throw excp;
        } catch (MyException me) {
            sub2(obj);
        }
    }

    public static final void main(String[] args) {
        MyCl obj = new MyCl(1);
        for (int i = 0; i < 1_000_000_000; i++) {
            sub1(obj, i);
        }
    }
}

Compiled with the following options, Escape Analysis stack allocates the instance of MyCl in sub2, generating heapfication code for the cold block escape in the catch block:

java -XX:-EnableHCR -Xjit:{Foo.sub*}\(traceEscapeAnalysis,optDetails,log=foo.sub.ea.log,optLevel=scorching,dontInline={Foo.sub*}\) Foo

In the IL that's generated by Escape Analysis, #444 is used to hold a reference to the stack allocated object that is tested at the point at which the object might escape. But if the potential escape point is reached without the stack allocation having happened, #444 could still contain a reference to a stack allocated instance from a previous invocation of the method, which would result in an unnecessary heapification.

Entry into method following Escape Analysis
n13n      BBStart <block_2> (freq 9971)                                                       [0x7fb3380a1b90] bci=[-1,2,19] rc=0 vc=156 vn=30 li=- udi=- nc=0
n180n     astore  <temp slot 6>[#439  Auto] [flags 0x7 0x0 ]                                  [0x7fb33814f800] bci=[-1,8,20] rc=0 vc=156 vn=3 li=5 udi=1 nc=1
n179n       aconst NULL (X==0 X>=0 X<=0 )                                                     [0x7fb33814f7b0] bci=[-1,8,20] rc=1 vc=156 vn=3 li=- udi=- nc=0 flg=0x302
n170n     astore  <temp slot 3>[#438  Auto] [flags 0x20000007 0x0 ]                           [0x7fb33814f4e0] bci=[-1,8,20] rc=0 vc=156 vn=3 li=6 udi=2 nc=1
n169n       aconst NULL (X==0 X>=0 X<=0 )                                                     [0x7fb33814f490] bci=[-1,8,20] rc=1 vc=156 vn=3 li=- udi=- nc=0 flg=0x302
n133n     astorei  <vft-symbol>[#342  Shadow] [flags 0x18607 0x0 ]                            [0x7fb33814e950] bci=[-1,8,20] rc=0 vc=156 vn=6 li=- udi=- nc=2
n129n       loadaddr  <temp slot 7>[#440  Auto] [flags 0x60010008 0x0 ] (highWordZero Unsigned X!=0 X>=0 cannotOverflow nodePointsToNonNull cannotTrackLocalUses escapesInColdBlock )  [0x7fb33814e810] bci=[-1,8,20] rc=2 vc=156 vn=7 li=- udi=28 nc=0 flg=0xd104
n132n       aiadd                                                                             [0x7fb33814e900] bci=[-1,8,20] rc=1 vc=156 vn=6 li=- udi=- nc=2
n130n         loadaddr  Foo$MyCl[#422  Static] [flags 0x18307 0x0 ]                           [0x7fb33814e860] bci=[-1,8,20] rc=1 vc=156 vn=5 li=- udi=- nc=0
n131n         iconst 0 (X==0 X>=0 X<=0 )                                                      [0x7fb33814e8b0] bci=[-1,8,20] rc=1 vc=156 vn=4 li=- udi=- nc=0 flg=0x302
...

Stack allocation code
n19n      BBStart <block_4> (freq 117)                                                        [0x7fb3380a1d70] bci=[1,0,39] rc=0 vc=156 vn=33 li=- udi=- nc=0
n24n      treetop                                                                             [0x7fb3380a1f00] bci=[-1,8,20] rc=0 vc=156 vn=34 li=- udi=- nc=1
n23n        loadaddr  <temp slot 7>[#440  Auto] [flags 0x60010008 0x0 ] (highWordZero Unsigned X!=0 X>=0 cannotOverflow nodePointsToNonNull cannotTrackLocalUses escapesInColdBlock )  [0x7fb3380a1eb0] bci=[-1,8,20] rc=7 vc=156 vn=7 li=- udi=29 nc=0 flg=0xd104
n144n     astore  <temp slot 8>[#444  Auto] [flags 0x7 0x0 ]                                  [0x7fb33814ecc0] bci=[-1,8,20] rc=0 vc=156 vn=7 li=8 udi=4 nc=1
n23n        ==>loadaddr
n139n     istorei  <generic int shadow>[#442  Shadow +4] [flags 0x603 0x0 ]                   [0x7fb33814eb30] bci=[-1,8,20] rc=0 vc=156 vn=4 li=- udi=- nc=2
n23n        ==>loadaddr
n138n       iconst 0                                                                          [0x7fb33814eae0] bci=[-1,8,20] rc=1 vc=156 vn=4 li=- udi=- nc=0
n141n     istorei  Foo$MyCl.i I[#437  Shadow +8] [flags 0x603 0x0 ]                           [0x7fb33814ebd0] bci=[-1,8,20] rc=0 vc=156 vn=4 li=- udi=- nc=2
n23n        ==>loadaddr
n140n       iconst 0                                                                          [0x7fb33814eb80] bci=[-1,8,20] rc=1 vc=156 vn=4 li=- udi=- nc=0
n143n     istorei  <generic int shadow>[#443  Shadow +12] [flags 0x603 0x0 ]                  [0x7fb33814ec70] bci=[-1,8,20] rc=0 vc=156 vn=4 li=- udi=- nc=2
n23n        ==>loadaddr
n142n       iconst 0                                                                          [0x7fb33814ec20] bci=[-1,8,20] rc=1 vc=156 vn=4 li=- udi=- nc=0
n99n      astore  <temp slot 3>[#438  Auto] [flags 0x20000007 0x0 ] (X!=0 privatizedInlinerArg )  [0x7fb33814deb0] bci=[-1,8,20] rc=0 vc=156 vn=7 li=9 udi=5 nc=1 flg=0x2004
n23n        ==>loadaddr
n103n     astore  <temp slot 6>[#439  Auto] [flags 0x7 0x0 ] (X!=0 )                          [0x7fb33814dff0] bci=[-1,8,20] rc=0 vc=156 vn=7 li=10 udi=6 nc=1 flg=0x4

Cold block heapification code
n154n     BBStart <block_16> (freq 6)                                                         [0x7fb33814efe0] bci=[-1,8,20] rc=0 vc=156 vn=48 li=- udi=- nc=0
n153n     ifacmpne --> block_14 BBStart at n119n ()                                           [0x7fb33814ef90] bci=[-1,8,20] rc=0 vc=156 vn=49 li=- udi=- nc=2 flg=0x20
n151n       aload  <temp slot 8>[#444  Auto] [flags 0x7 0x0 ]                                 [0x7fb33814eef0] bci=[-1,8,20] rc=1 vc=156 vn=13 li=20 udi=21 nc=0
n152n       loadaddr  <temp slot 7>[#440  Auto] [flags 0x60010008 0x0 ] (highWordZero Unsigned X!=0 X>=0 cannotOverflow nodePointsToNonNull cannotTrackLocalUses escapesInColdBlock )  [0x7fb33814ef40] bci=[-1,8,20] rc=1 vc=156 vn=7 li=- udi=30 nc=0 flg=0xd104
n155n     BBEnd </block_16> =====                                                             [0x7fb33814f030] bci=[-1,8,20] rc=0 vc=156 vn=50 li=- udi=- nc=0

n149n     BBStart <block_15> (freq 6)                                                         [0x7fb33814ee50] bci=[-1,8,20] rc=0 vc=156 vn=51 li=- udi=- nc=0
n147n     treetop                                                                             [0x7fb33814edb0] bci=[-1,8,20] rc=0 vc=156 vn=53 li=- udi=- nc=1
n145n       new  jitNewObject[#92  helper Method] [flags 0x400 0x0 ] (highWordZero Unsigned X!=0 X>=0 cannotOverflow allocationCanBeRemoved HeapificationAlloc )  [0x7fb33814ed10] bci=[-1,8,20] rc=5 vc=156 vn=14 li=- udi=- nc=1 flg=0x5104
n146n         loadaddr  Foo$MyCl[#422  Static] [flags 0x18307 0x0 ]                           [0x7fb33814ed60] bci=[-1,8,20] rc=1 vc=156 vn=5 li=- udi=- nc=0
n148n     astore  <temp slot 8>[#444  Auto] [flags 0x7 0x0 ]                                  [0x7fb33814ee00] bci=[-1,8,20] rc=0 vc=156 vn=14 li=13 udi=9 nc=1
n145n       ==>new

The fix ensures #444 would be initialized with a null reference at method entry, preventing any unnecessary heapification.

@jdmpapin, FYI. Thank you for discovering this problem!

@hzongaro hzongaro requested a review from vijaysun-omr October 16, 2025 14:30
@vijaysun-omr
Copy link
Contributor

I thought live variables for GC would track uninitialized reference local variable use (load) and zero initialize such local variables in the prologue. Is this not happening here OR did we feel that we should handle it explicitly in the IL trees via a store anyway ?

@jdmpapin
Copy link
Contributor

jdmpapin commented Oct 16, 2025

I thought live variables for GC would track uninitialized reference local variable use (load) and zero initialize such local variables in the prologue.

I thought so too, but I recently looked for that zero-initialization in another case where I saw an uninitialized auto (which led me to open #22754), and I didn't find it there, though somehow I did always seem to get an NPE from the interpreter, which suggests that at least with my setup it was always zero anyway. Maybe I just missed where it happens.

My contention though is that even if we have that kind of a safety net, we should never load from an auto that has not been (explicitly) defined. I'm not sure that we should require that fact to be syntactically "obvious," e.g. demonstrable by a standard liveness analysis, but at least dynamically when we load at runtime, I think the auto should have to have been defined by that point. It's just harder to analyze code that might implicitly use some default value from the entry, and I don't think there's any strong reason to generate IL that does so, or to transform IL in a way that makes it do so. If we want the auto to have a particular value on entry to the method, then it's very straightforward to simply generate a store.

As an example of the difficulty in analysis, soon I should be contributing a new opt pass to OMR that reasons flow-insensitively about the possible values of autos, and it does this simply by considering all definitions. If we're allowed to load from an uninitialized (or... not explicitly initialized) auto, then any use of an auto might be doing that, and my new pass would need to do a dataflow analysis just to rule it out. (It was an assertion in this new pass that led to the discovery that the temp created by EA could be used uninitialized.)

Another example is that if an auto has exactly one definition, then (IMO) we should be able to reason that all uses see the value from that definition. This might be implicitly assumed in some parts of the compiler - at least, I don't know that it isn't, and I doubt that anybody else does either.

Finally, liveness for GC only takes / would take care of address-typed autos, since that's all we have to worry about for GC purposes. But I think we shouldn't load from uninitialized autos of other types either, and IMO it would be inconsistent to say that autos must be defined before use, but the address-typed ones are OK because we happen to have a GC interaction.

@jdmpapin
Copy link
Contributor

I don't think there's any strong reason to generate IL that does so, or to transform IL in a way that makes it do so

I do think a case could be made that the load should be allowed but that the value must be ultimately irrelevant (in some sufficiently precise way). I believe that would still allow for the types of analysis that I've argued for, but it would require us to introduce a notion something like LLVM's poison.

@vijaysun-omr
Copy link
Contributor

I'll think about the arguments that Devin is making about how things ought to be, but I wanted to pop out the place that I believe the initialization would get driven by :

// See if any collected reference locals are live at the start of the block.

sets the property on autos and that gets consulted in the code generator.

@vijaysun-omr
Copy link
Contributor

I agree with all of Devin's reasons, especially the ones about complicating analyses (including the one he's about to add) and consistency.

I don't believe there was a deep reasoning behind not having the store in this particular case. It was just something that was going to be implicitly handled by the codegen anyway, but it should be (mostly) equivalent from a performance standpoint to expose the store in the IL trees and it has several other benefits.

I will review this change with that understanding

@vijaysun-omr
Copy link
Contributor

Jenkins test sanity all jdk21

@vijaysun-omr
Copy link
Contributor

@hzongaro I wonder if the commit message and PR message needs to be changed to omit talking about uninitialized value being seen by GC, and mentioning the motivation being more of a cleanup as discussed for the above reasons.

@jdmpapin
Copy link
Contributor

jdmpapin commented Oct 17, 2025

I wanted to pop out the place that I believe the initialization would get driven by

To elaborate a little bit, J9::CodeGenerator::createStackAtlas() consults isInitializedReference() to identify uninitialized reference-typed autos here:

// Map uninitialized reference locals that are not stack allocated objects

It calls setNumberOfSlotsToBeInitialized() on the stack atlas:

atlas->setNumberOfSlotsToBeInitialized(numberOfSlotsToBeInitialized);

Then the number of slots to be initialized is checked in J9::$ARCH::PrivateLinkage::createPrologue(), e.g.

int32_t numReferenceLocalSlotsToInitialize = atlas->getNumberOfSlotsToBeInitialized();

I was able to find the zero-initialization in a log from Henry's test case with disableGRA. In the other case I was looking at, I must have just missed that part of prologue somehow

@hzongaro
Copy link
Member Author

I wonder if the commit message and PR message needs to be changed to omit talking about uninitialized value being seen by GC, and mentioning the motivation being more of a cleanup as discussed for the above reasons.

Yes, I will adjust that.

The heapifyForColdBlocks method of Escape Analysis creates a local
temporary that it uses to hold a reference to a stack allocated object
at the point of the stack allocation.  If the object might need to be
heapified, it generates IL that tests whether that temporary still holds
a reference to the stack allocated object, and if it does, it performs
the heapification and updates the temporary.  However, the temporary was
not explicitly initialized on entry to the method.

During Code Generation J9::CodeGenerator::createStackAtlas will ensure
that any autos that might contain references on entry to the method are
zero initialized.  Despite that, it's better practice to ensure that
such initialization is explicitly performed explicitly in the IL, as it
allows, for instance, other analyses to rely on all the definitions seen
in the IL for a particular auto as enumerating the possible values the
auto can hold at particular points without having to consider the
possibilty that the auto might hold some as yet to be determined default
value that it will acquire during Code Generation.

In the spirit of upholding that better practice, this changes
heapifyForColdBlocks to generate IL at method entry that zero
initializes the temporary variable.

Signed-off-by:  Henry Zongaro <[email protected]>
@hzongaro hzongaro force-pushed the zero-init-heapify-symref branch from 0ec1cf2 to fa0f72d Compare October 17, 2025 16:46
@hzongaro
Copy link
Member Author

I wonder if the commit message and PR message needs to be changed to omit talking about uninitialized value being seen by GC, and mentioning the motivation being more of a cleanup as discussed for the above reasons.

Done!

@hzongaro
Copy link
Member Author

Jenkins test sanity all jdk21

@vijaysun-omr
Copy link
Contributor

Thanks for the commit message change. Could you please also edit the opening comment in this PR that still refers to the original problem description ?

@hzongaro
Copy link
Member Author

Could you please also edit the opening comment in this PR that still refers to the original problem description ?

Sorry for missing that! Done.

@vijaysun-omr
Copy link
Contributor

Jenkins test sanity zlinux jdk21

@vijaysun-omr
Copy link
Contributor

I am going to merge this since tests have passed and this change is fine on its own.

However, I was wondering if we could have a similar issue with zero initialization happening in the code generator for some other auto that "may" point to the stack allocated object, and is part of the series of tests we lay down before heapification happens, i.e. to update all the autos that are in fact pointing at the stack allocated object to point instead at the heapified object. I thought of this potentially being an issue because we don't check for whether a store "covers" the cold escape point or not, before adding a test for whether the auto is pointing at the stack allocated object.

@vijaysun-omr vijaysun-omr merged commit 0e7cb49 into eclipse-openj9:master Oct 19, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants