8352536: Add overloads to parse and build class files from/to MemorySegment #24139

Open · wants to merge 6 commits into master

Conversation

@dmlloyd (Contributor) commented Mar 20, 2025

Provide method overloads on the ClassFile interface of the java.lang.classfile API that allow class files to be parsed from memory segments and allow built class files to be written out to memory segments.
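As a rough illustration of the parse side (a sketch, not code from this PR): the overload lets a class file be parsed straight from a memory-mapped file, with no hand-written read-into-byte[] step. The file path, class name, and method name below are placeholders; only the parse(MemorySegment) overload is what this PR proposes.

import java.io.IOException;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedParseExample {
    // Map a class file and parse it directly from the mapped segment.
    static ClassModel parseMapped(Path classFile) throws IOException {
        try (FileChannel ch = FileChannel.open(classFile, StandardOpenOption.READ);
             Arena arena = Arena.ofConfined()) {
            MemorySegment mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
            // Overload proposed by this PR; the equivalent today requires
            // copying the segment contents into a byte[] first.
            return ClassFile.of().parse(mapped);
        }
    }
}

A corresponding overload for writing built class files out to a segment is also added; its exact shape is defined by the PR and not sketched here.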


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8352562 to be approved

Issues

  • JDK-8352536: Add overloads to parse and build class files from/to MemorySegment (Enhancement - P4)
  • JDK-8352562: Add overloads to parse and build class files from/to MemorySegment (CSR)

Reviewing

Using git

Check out this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24139/head:pull/24139
$ git checkout pull/24139

Update a local copy of the PR:
$ git checkout pull/24139
$ git pull https://git.openjdk.org/jdk.git pull/24139/head

Using Skara CLI tools

Check out this PR locally:
$ git pr checkout 24139

View PR using the GUI difftool:
$ git pr show -t 24139

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24139.diff

Using Webrev

Link to Webrev Comment

@dmlloyd (Contributor, Author) commented Mar 20, 2025

/csr

@bridgekeeper (bot) commented Mar 20, 2025

👋 Welcome back dmlloyd! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk (bot) commented Mar 20, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk bot added the csr (Pull request needs approved CSR before integration) and rfr (Pull request is ready for review) labels Mar 20, 2025
@openjdk (bot) commented Mar 20, 2025

@dmlloyd an approved CSR request is already required for this pull request.

@openjdk (bot) commented Mar 20, 2025

@dmlloyd The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge (bot) commented Mar 20, 2025

Webrevs

@Override
public ClassModel parse(final MemorySegment bytes) {
    AbstractMemorySegmentImpl amsi = (AbstractMemorySegmentImpl) bytes;
    if (amsi.unsafeGetBase() instanceof byte[] ba) {
A reviewer (Contributor) commented on the snippet above:

There's also MemorySegment::heapObject if you don't want to drop down to Unsafe. That method returns an empty optional if the segment is read-only, but that might be good enough for a fast path.
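A rough sketch of that alternative fast path (assuming MemorySegment::heapObject behaves as described in the comment above; this is not the code in this PR, and slicing/offset handling is simplified):

import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class SegmentFastPath {
    // heapObject() is assumed per the comment above: it returns the backing
    // heap object, or an empty Optional for native or read-only segments.
    static ClassModel parseSegment(ClassFile cf, MemorySegment bytes) {
        if (bytes.heapObject().orElse(null) instanceof byte[] ba
                && bytes.byteSize() == ba.length) {
            return cf.parse(ba);                               // whole heap array: no copy
        }
        return cf.parse(bytes.toArray(ValueLayout.JAVA_BYTE)); // otherwise copy, then parse
    }
}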

@dmlloyd changed the title from "8352536: Add overloads to parse and build class files from/to ByteBuffer and MemorySegment" to "8352536: Add overloads to parse and build class files from/to MemorySegment" on Mar 21, 2025
@asotona (Member) commented Mar 27, 2025

The Class-File API has gone through several significant weight reductions in its history (on the API surface as well as under the hood). Some of those steps were critical to avoiding a JDK bootstrap performance regression.

Before deciding on this non-trivial addition to the Class-File API, I would like to see some measurable benefit, as well as its performance impact on JDK bootstrap.

@dmlloyd (Contributor, Author) commented Mar 27, 2025

A simple test that runs a program containing a single lambda did not cause the MemorySegment class to be loaded (as verified with -verbose:class), so there should be no bootstrap regression in that regard.

Since the JDK code paths do not run through the new APIs, I would expect the only possible performance impact to come from the non-functional refactoring of common code (extracting shared methods) on the write side. So I ran ClassfileBenchmark to confirm that no regression happens there.
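For reference, one way to run that microbenchmark from an OpenJDK checkout (the exact test filter name is an assumption here):

$ make test TEST="micro:ClassfileBenchmark"

The -prof gc numbers further down in this thread are gathered by additionally passing that profiler option through to the JMH runner.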

With patch:

Benchmark                                  Mode  Cnt      Score     Error  Units
ClassfileBenchmark.parse                  thrpt    5  52340.143 ± 492.543  ops/s
ClassfileBenchmark.transformWithAddedNOP  thrpt    5  11550.745 ± 146.057  ops/s
ClassfileBenchmark.transformWithNewCP     thrpt    5   7625.322 ±  50.366  ops/s
ClassfileBenchmark.transformWithSharedCP  thrpt    5  23755.083 ± 310.914  ops/s

Without patch:

Benchmark                                  Mode  Cnt      Score     Error  Units
ClassfileBenchmark.parse                  thrpt    5  52365.455 ± 649.145  ops/s
ClassfileBenchmark.transformWithAddedNOP  thrpt    5  11546.367 ± 105.941  ops/s
ClassfileBenchmark.transformWithNewCP     thrpt    5   7589.033 ±  66.221  ops/s
ClassfileBenchmark.transformWithSharedCP  thrpt    5  23729.577 ± 159.890  ops/s

They all look solidly within the margin of error. If there are any other tests you'd like me to run (or different parameters, etc.), please let me know.

If this all looks OK, @asotona, could you please review the CSR?

@asotona (Member) commented Mar 27, 2025

I'm sorry, but I'm still missing the benefits part of this change.

The costs are high. The API grows, and the implementation splits into multiple parallel branches in ClassFileImpl and DirectClassBuilder.
A new implementation class, BuildAndParseBuffersAndSegments, appears.
Parallel implementations add a new dimension to the test coverage matrix.
There is also still a lot of code related to ByteBuffer.

I'm not convinced the API should be extended with the MemorySegment overloads.

@dmlloyd (Contributor, Author) commented Mar 27, 2025

> I'm sorry, but I'm still missing the benefits part of this change.

The benefits are that the user can parse and generate class files directly from and to mapped files, and can parse class files directly from buffers handed to a class loader (for example). On the parsing side there is an extra copy (no worse than today), but it is theoretically possible that we could optimize this someday; at absolute worst, it saves the user from having to do that copy manually. On the generation side, there is no extra copy compared to the byte array implementation.
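To make the write side concrete, here is the pattern the new overload removes (a sketch; the generated class name, method name, and target segment are placeholders, and the exact shape of the build-to-segment overload is defined by the PR itself):

import java.lang.classfile.ClassFile;
import java.lang.constant.ClassDesc;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class EmitExample {
    // Status quo: build into an intermediate byte[] and copy it into the
    // target (for example memory-mapped) segment. With this PR the class
    // file can be emitted into the segment directly, skipping the byte[].
    static long emitWithCopy(MemorySegment target) {
        byte[] built = ClassFile.of().build(ClassDesc.of("com.example.Generated"),
                clb -> { /* add fields, methods, attributes here */ });
        MemorySegment.copy(built, 0, target, ValueLayout.JAVA_BYTE, 0, built.length);
        return built.length;
    }
}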

> The costs are high. The API grows, and the implementation splits into multiple parallel branches in ClassFileImpl and DirectClassBuilder.

I'm not sure I follow you here. The implementation is only minimally changed, and half of the 419 lines added are javadoc. I don't see where the new costs are coming from.

> A new implementation class, BuildAndParseBuffersAndSegments, appears.

This is just a test class. It proves the functionality of the new API in terms of the existing API; this means that it is not necessary to retest all functionality for both input types.
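Roughly speaking, a check of the following shape is all that is needed (an illustrative sketch, not the actual BuildAndParseBuffersAndSegments code): feed the same bytes through both entry points and verify the results agree.

import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassTransform;
import java.lang.foreign.MemorySegment;
import java.util.Arrays;

class SegmentParityCheck {
    // The MemorySegment overload should behave exactly like the byte[] entry
    // point, so re-emitting both parsed models should yield identical bytes.
    static boolean parseAgrees(ClassFile cf, byte[] classBytes) {
        byte[] viaArray = cf.transformClass(cf.parse(classBytes), ClassTransform.ACCEPT_ALL);
        byte[] viaSegment = cf.transformClass(cf.parse(MemorySegment.ofArray(classBytes)),
                ClassTransform.ACCEPT_ALL);
        return Arrays.equals(viaArray, viaSegment);
    }
}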

> Parallel implementations add a new dimension to the test coverage matrix.

It shouldn't: there is no parallel implementation, and I think it's reasonable to argue that there should not be one in the future either. Nothing here forces that issue one way or the other. As I said above, because there is still only one implementation, one test class is sufficient to show that parsing from and generating to memory segments works correctly.

> There is also still a lot of code related to ByteBuffer.

All the ByteBuffer code is dead and was intended to be deleted; that was an error on my part. I've fixed the patch now if you would like to take another look.

> I'm not convinced the API should be extended with the MemorySegment overloads.

I hope you reconsider in light of the above explanations. Thanks!

@asotona (Member) commented Mar 27, 2025

On the parsing side there is no benefit, due to the memory copy.
May I at least see the benefit of writing directly to memory-mapped files in some benchmark?

@dmlloyd (Contributor, Author) commented Mar 27, 2025

> On the parsing side there is no benefit, due to the memory copy.

The main benefit on the parsing side is twofold: it reduces the user's boilerplate, and it leaves open the possibility of optimizing this case further in the future. At present the performance is the same as having the user do the copy.

> May I at least see the benefit of writing directly to memory-mapped files in some benchmark?

Sure, I can put together a benchmark which compares the throughput with and without the copy, and perhaps the allocation rate as well. I hope that will be sufficient to capture whatever there is to capture.

@dmlloyd (Contributor, Author) commented Mar 27, 2025

Here are the raw benchmark results against AbstractMap and TreeMap:

Benchmark                                 Mode  Cnt       Score      Error  Units
MemorySegmentBenchmark.emitWithCopy0     thrpt    5  198061.082 ± 2300.146  ops/s
MemorySegmentBenchmark.emitWithCopy1     thrpt    5   35352.167 ±  320.823  ops/s
MemorySegmentBenchmark.emitWithoutCopy0  thrpt    5  265208.111 ± 1416.120  ops/s
MemorySegmentBenchmark.emitWithoutCopy1  thrpt    5   53215.327 ±  354.228  ops/s

Case 0 uses the bytes of the smaller AbstractMap class and case 1 uses the bytes of the larger TreeMap class. For case 0 we see an improvement of around 34% overall, and case 1 shows an improvement closer to 50% (which is expected, since larger classes mean copying more bytes as well as putting more pressure on the GC).

Here is the same benchmark with -prof gc enabled:

Benchmark                                                    Mode  Cnt       Score      Error   Units
MemorySegmentBenchmark.emitWithCopy0                        thrpt    5  197728.066 ± 3107.524   ops/s
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate          thrpt    5    3900.963 ±   61.292  MB/sec
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate.norm     thrpt    5   20688.004 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy0:gc.count               thrpt    5     680.000             counts
MemorySegmentBenchmark.emitWithCopy0:gc.time                thrpt    5     415.000                 ms
MemorySegmentBenchmark.emitWithCopy1                        thrpt    5   35504.531 ±  260.423   ops/s
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate          thrpt    5    3512.621 ±   25.778  MB/sec
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate.norm     thrpt    5  103744.020 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy1:gc.count               thrpt    5     673.000             counts
MemorySegmentBenchmark.emitWithCopy1:gc.time                thrpt    5     413.000                 ms
MemorySegmentBenchmark.emitWithoutCopy0                     thrpt    5  265533.600 ± 1707.914   ops/s
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate       thrpt    5    3547.167 ±   22.811  MB/sec
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate.norm  thrpt    5   14008.003 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy0:gc.count            thrpt    5     651.000             counts
MemorySegmentBenchmark.emitWithoutCopy0:gc.time             thrpt    5     392.000                 ms
MemorySegmentBenchmark.emitWithoutCopy1                     thrpt    5   52727.917 ±  624.059   ops/s
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate       thrpt    5    3531.104 ±   42.004  MB/sec
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate.norm  thrpt    5   70224.013 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy1:gc.count            thrpt    5     683.000             counts
MemorySegmentBenchmark.emitWithoutCopy1:gc.time             thrpt    5     412.000                 ms

You can see that, in addition to the overhead of the copying itself, the copying variant also puts a bit more pressure on the GC: roughly the same number of objects is allocated in either case, but the extra large array per operation fills the allocation regions more quickly, so a little more time is spent in GC on average.

@asotona (Member) commented Mar 27, 2025

I have two problems with the numbers you measured:

  1. The benchmarked transformation runs on a pre-parsed class and does nothing, so it mainly measures the memory copy itself.
  2. In the discussions, memory-mapped files were mentioned as the use case for targeting classes off-heap.

When I modify ClassfileBenchmark to replicate your scenario, I get these numbers:

Benchmark                                             Mode  Cnt       Score     Error  Units
ClassfileBenchmark.transformWithAddedNOP             thrpt    5   31542.151 ±  63.838  ops/s
ClassfileBenchmark.transformWithAddedNOPWithoutCopy  thrpt    5   31738.045 ± 105.725  ops/s
ClassfileBenchmark.transformWithNewCP                thrpt    5   23514.061 ±  91.453  ops/s
ClassfileBenchmark.transformWithNewCPWithoutCopy     thrpt    5   23824.561 ± 532.565  ops/s
ClassfileBenchmark.transformWithSharedCP             thrpt    5   66083.564 ± 231.388  ops/s
ClassfileBenchmark.transformWithSharedCPWithoutCopy  thrpt    5   66780.329 ± 298.292  ops/s

So my measured performance benefit is around 1%, and even that will evaporate when writing to physical files as intended.

Unfortunately, I cannot recommend this PR.

@dmlloyd (Contributor, Author) commented Mar 28, 2025

> I have two problems with the numbers you measured:
>
>   1. The benchmarked transformation runs on a pre-parsed class and does nothing, so it mainly measures the memory copy itself.
>   2. In the discussions, memory-mapped files were mentioned as the use case for targeting classes off-heap.
>
> When I modify ClassfileBenchmark to replicate your scenario, I get these numbers:

Can you share your changes?

@asotona (Member) commented Mar 28, 2025

@openjdk bot mentioned this pull request Mar 28, 2025
@liach (Member) commented Mar 28, 2025

If we want to reduce allocation at writing time, we may want to look at the new allocation facility from #24232; that is a general-purpose tool for reducing allocation pressure from large byte arrays, especially when an actual byte[] view is not strictly required. BufWriter and allocation-free copying of data to a memory segment may both benefit.

Labels

  • core-libs
  • csr: Pull request needs approved CSR before integration
  • rfr: Pull request is ready for review
4 participants