Change return type of getStringUTF8Length #21005

hzongaro · 2025-01-24T14:58:03Z

The return type of the JVM's getStringUTF8Length method has changed from IDATA to UDATA. This change adjusts the JIT compiler's uses of that method. In particular, the return type of TR_FrontEnd::getStringUTF8Length and its overriding methods changes from intptr_t to int32_t. Similarly, the bufferSize argument of TR_FrontEnd::getStringUTF8 becomes uintptr_t where it was int32_t.

The motivation was that the length of a UTF-8 encoded String could be greater than 2³² bytes, so the length could overflow intptr_t on a 32-bit platform. All uses of getStringUTF8Length in the JIT involve signatures, descriptors, method names and class names, which will never be large enough to exceed the range of the int32_t type. @jdmpapin was good enough to verify this. Just to be cautious, however, this change includes an assertion that the computed length is in range for the int32_t type.
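As a rough illustration of the range check described above, here is a minimal sketch. The names `kMaxAbbreviatedUtf8Length`, `fitsAbbreviatedLength` and `toAbbreviatedLength` are hypothetical and do not appear in this pull request, and plain `assert` stands in for whatever assertion facility the JIT actually uses. The limit of 2^31-2 follows the commit message, which leaves room for a NUL terminator.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical limit: INT32_MAX - 1, so that (length + 1) for a NUL
// terminator still fits in a signed 32-bit integer.
constexpr uint64_t kMaxAbbreviatedUtf8Length =
    static_cast<uint64_t>(std::numeric_limits<int32_t>::max()) - 1;

// Returns true if the full UTF-8 length can be reported as an int32_t.
bool fitsAbbreviatedLength(uint64_t fullLength)
{
    return fullLength <= kMaxAbbreviatedUtf8Length;
}

// Narrows the full length to int32_t, asserting that it is in range.
// (Illustrative only; the real code may use a fatal JIT assertion.)
int32_t toAbbreviatedLength(uint64_t fullLength)
{
    assert(fitsAbbreviatedLength(fullLength) && "UTF-8 length out of int32_t range");
    return static_cast<int32_t>(fullLength);
}
```

With this shape, a caller that allocates `length + 1` bytes for the encoded string cannot overflow a signed 32-bit size.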

This change also introduces a method, getStringUTF8UnabbreviatedLength, that returns a length value of type uint64_t if the JIT compiler ever needs to determine the UTF-8 encoded length of an arbitrary String. The method is currently unused.
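To show why a 64-bit result is the safe choice for an arbitrary String, here is a sketch of computing an encoded length from UTF-16 code units. It assumes the JNI-style modified UTF-8 encoding (1 to 3 bytes per code unit, with U+0000 taking 2 bytes); this description does not state which encoding the JVM method uses, and `modifiedUtf8Length` is a hypothetical name, not the implementation of getStringUTF8UnabbreviatedLength.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Computes the modified UTF-8 length of `count` UTF-16 code units.
// Accumulating in uint64_t means the sum cannot wrap, even when
// every code unit needs 3 bytes.
uint64_t modifiedUtf8Length(const char16_t *chars, size_t count)
{
    assert(chars != nullptr || count == 0);
    uint64_t length = 0;
    for (size_t i = 0; i < count; ++i)
    {
        char16_t c = chars[i];
        if (c >= 0x0001 && c <= 0x007F)
            length += 1; // ASCII other than NUL
        else if (c <= 0x07FF)
            length += 2; // U+0000 and U+0080..U+07FF
        else
            length += 3; // U+0800..U+FFFF, including each surrogate half
    }
    return length;
}
```

A Java String can hold up to 2^31-1 chars, so under this encoding the worst case is roughly 3 × 2^31 bytes, which does not fit in a signed or unsigned 32-bit integer.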

In addition, String Builder Transformer uses the result of getStringUTF8Length to estimate the StringBuilder buffer size needed to accommodate appending a constant String to a StringBuilder. That could overestimate the space required. This has been changed to use getStringLength instead, to use the actual lengths of constant String objects. A test has also been added to detect integer overflow of the capacity estimate, aborting the transformation, as StringBuilder.<init> will throw a NegativeArraySizeException if the specified capacity is negative.
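The overflow test on the capacity estimate could look something like the sketch below. The function name `addToCapacityEstimate` and its signature are hypothetical, chosen only for illustration; the actual logic in StringBuilderTransformer.cpp may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Adds the length of one constant String to a running capacity estimate.
// Returns false, leaving the estimate untouched, if the sum would exceed
// INT32_MAX; the caller would then abandon the transformation rather than
// pass a wrapped (negative) capacity to the StringBuilder constructor.
bool addToCapacityEstimate(int32_t &estimate, int32_t stringLength)
{
    assert(estimate >= 0 && stringLength >= 0);
    if (stringLength > std::numeric_limits<int32_t>::max() - estimate)
        return false; // would overflow a signed 32-bit capacity
    estimate += stringLength;
    return true;
}
```

The comparison is written as a subtraction from the maximum so that the check itself never performs the overflowing addition.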

This pull request requires a coordinated merge with OMR pull request eclipse-omr/omr#7620

hzongaro · 2025-01-24T14:59:03Z

@jdmpapin, thank you for your off-line review of an earlier version of these changes. May I ask you to review this updated version?

runtime/compiler/optimizer/StringBuilderTransformer.cpp

runtime/compiler/net/MessageTypes.hpp

runtime/compiler/env/VMJ9.cpp

runtime/compiler/env/VMJ9.h

runtime/compiler/env/j9method.cpp

hzongaro · 2025-02-03T14:11:33Z

Thanks for your review, @jdmpapin. I have addressed your review comments. May I ask you to rereview? Once you are OK with the changes, I will squash the extra commits.

jdmpapin

LGTM, please squash

String Builder Transformer uses the result of getStringUTF8Length to estimate the StringBuilder buffer size needed to accommodate appending a constant String to a StringBuilder. That could overestimate the space required. This has been changed to use getStringLength instead, to use the actual lengths of constant String objects. A test has also been added to detect integer overflow of the capacity estimate, aborting the transformation, as StringBuilder.<init> will throw a NegativeArraySizeException if the specified capacity is negative.

Signed-off-by: Henry Zongaro <[email protected]>

The return type of the JVM's getStringUTF8Length method has changed from IDATA to UDATA. This change adjusts the JIT compiler's uses of that method. In particular, the return type of TR_FrontEnd::getStringUTF8Length and its overriding methods changes from intptr_t to int32_t. Similarly, the bufferSize argument of TR_FrontEnd::getStringUTF8 becomes uintptr_t where it was int32_t.

The motivation was that the length of a UTF-8 encoded String could be greater than 2^32 bytes, so the length could overflow on a 32-bit platform. All uses of getStringUTF8Length in the JIT involve signatures, descriptors, method names and class names, which will never be large enough to exceed the range of the int32_t type. Just to be cautious, however, this change includes an assertion test that the computed length is in range for the int32_t type, allowing for a maximum length of 2^31-2. That ensures that any code that then uses that length to allocate a buffer to contain the encoded String with a NUL terminator will not overflow a 32-bit signed integer representation for the length plus the NUL byte.

This change also introduces a method, getStringUTF8UnabbreviatedLength, that returns a length value of type uint64_t if the JIT compiler ever needs to determine the UTF-8 encoded length of an arbitrary String. The method is currently unused.

Finally, this change removes the VM_getStringUTF8Length JITServer message type, which is never used.

Signed-off-by: Henry Zongaro <[email protected]>

hzongaro · 2025-02-05T14:41:24Z

Thanks, @jdmpapin! I have squashed the commits.

jdmpapin · 2025-02-05T20:20:36Z

Jenkins test sanity.functional,sanity.openjdk all jdk8,jdk11,jdk17,jdk21 depends eclipse-omr/omr#7620

hzongaro · 2025-02-07T15:39:23Z

It looks like there is a known problem — issue #21066 — affecting JITServer tests. I will rerun PR testing once that problem has been resolved.

hzongaro added comp:jit depends:omr labels Jan 24, 2025

hzongaro requested a review from jdmpapin January 24, 2025 14:58

hzongaro requested a review from dsouzai as a code owner January 24, 2025 14:58

hzongaro mentioned this pull request Jan 24, 2025

Change return type of TR_FrontEnd::getStringUTF8Length to int32_t eclipse-omr/omr#7620

Open

jdmpapin reviewed Jan 24, 2025


jdmpapin changed the title ~~Change return type of~~ Change return type of getStringUTF8Length Jan 24, 2025

jdmpapin approved these changes Feb 3, 2025


hzongaro added 2 commits February 5, 2025 06:33

hzongaro force-pushed the long-utf8-string-length branch from a4974d4 to fb3e0f2 Compare February 5, 2025 14:40
