8352536: Add overloads to parse and build class files from/to MemorySegment #24139

Open · wants to merge 6 commits into master

Conversation

@dmlloyd (Contributor) commented Mar 20, 2025

Provide method overloads on the ClassFile interface of the java.lang.classfile API that allow class files to be parsed from memory segments and allow built class files to be written out to memory segments.
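As a rough illustration of the parse side (a sketch, not code from this PR): the overload lets a class file be parsed straight from a memory-mapped file, with no hand-written read-into-byte[] step. The file path, class name, and method name below are placeholders; only the parse(MemorySegment) overload is what this PR proposes.

import java.io.IOException;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedParseExample {
    // Map a class file and parse it directly from the mapped segment.
    static ClassModel parseMapped(Path classFile) throws IOException {
        try (FileChannel ch = FileChannel.open(classFile, StandardOpenOption.READ);
             Arena arena = Arena.ofConfined()) {
            MemorySegment mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
            // Overload proposed by this PR; the equivalent today requires
            // copying the segment contents into a byte[] first.
            return ClassFile.of().parse(mapped);
        }
    }
}

A corresponding overload for writing built class files out to a segment is also added; its exact shape is defined by the PR and not sketched here.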


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8352562 to be approved

Issues

  • JDK-8352536: Add overloads to parse and build class files from/to MemorySegment (Enhancement - P4)
  • JDK-8352562: Add overloads to parse and build class files from/to MemorySegment (CSR)

Reviewing

Using git

Check out this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24139/head:pull/24139
$ git checkout pull/24139

Update a local copy of the PR:
$ git checkout pull/24139
$ git pull https://git.openjdk.org/jdk.git pull/24139/head

Using Skara CLI tools

Check out this PR locally:
$ git pr checkout 24139

View PR using the GUI difftool:
$ git pr show -t 24139

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24139.diff

Using Webrev

Link to Webrev Comment

@dmlloyd (Contributor, Author) commented Mar 20, 2025

/csr

@bridgekeeper (bot) commented Mar 20, 2025

👋 Welcome back dmlloyd! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk (bot) commented Mar 20, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk bot added the csr (Pull request needs approved CSR before integration) and rfr (Pull request is ready for review) labels Mar 20, 2025
@openjdk (bot) commented Mar 20, 2025

@dmlloyd an approved CSR request is already required for this pull request.

@openjdk (bot) commented Mar 20, 2025

@dmlloyd The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge (bot) commented Mar 20, 2025

Webrevs

@Override
public ClassModel parse(final MemorySegment bytes) {
    AbstractMemorySegmentImpl amsi = (AbstractMemorySegmentImpl) bytes;
    if (amsi.unsafeGetBase() instanceof byte[] ba) {
A reviewer (Contributor) commented on the snippet above:

There's also MemorySegment::heapObject if you don't want to drop down to Unsafe. That method returns an empty optional if the segment is read-only, but that might be good enough for a fast path.
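A rough sketch of that alternative fast path (assuming MemorySegment::heapObject behaves as described in the comment above; this is not the code in this PR, and slicing/offset handling is simplified):

import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class SegmentFastPath {
    // heapObject() is assumed per the comment above: it returns the backing
    // heap object, or an empty Optional for native or read-only segments.
    static ClassModel parseSegment(ClassFile cf, MemorySegment bytes) {
        if (bytes.heapObject().orElse(null) instanceof byte[] ba
                && bytes.byteSize() == ba.length) {
            return cf.parse(ba);                               // whole heap array: no copy
        }
        return cf.parse(bytes.toArray(ValueLayout.JAVA_BYTE)); // otherwise copy, then parse
    }
}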

@dmlloyd changed the title from "8352536: Add overloads to parse and build class files from/to ByteBuffer and MemorySegment" to "8352536: Add overloads to parse and build class files from/to MemorySegment" on Mar 21, 2025
@asotona (Member) commented Mar 27, 2025

The Class-File API has gone through several significant weight reductions in its history (on the API surface as well as under the hood). Some of those steps were critical to avoiding a JDK bootstrap performance regression.

Before deciding on this non-trivial addition to the Class-File API, I would like to see some measurable benefit, as well as its performance impact on JDK bootstrap.

@dmlloyd (Contributor, Author) commented Mar 27, 2025

A simple test that runs a program containing a single lambda did not cause the MemorySegment class to be loaded (as verified with -verbose:class), so there should be no bootstrap regression in that regard.

Since the JDK code paths do not run through the new APIs, I would expect the only possible performance impact to come from the non-functional refactoring of common code (extracting shared methods) on the write side. So I ran ClassfileBenchmark to confirm that no regression happens there.
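For reference, one way to run that microbenchmark from an OpenJDK checkout (the exact test filter name is an assumption here):

$ make test TEST="micro:ClassfileBenchmark"

The -prof gc numbers further down in this thread are gathered by additionally passing that profiler option through to the JMH runner.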

With patch:

Benchmark                                  Mode  Cnt      Score     Error  Units
ClassfileBenchmark.parse                  thrpt    5  52340.143 ± 492.543  ops/s
ClassfileBenchmark.transformWithAddedNOP  thrpt    5  11550.745 ± 146.057  ops/s
ClassfileBenchmark.transformWithNewCP     thrpt    5   7625.322 ±  50.366  ops/s
ClassfileBenchmark.transformWithSharedCP  thrpt    5  23755.083 ± 310.914  ops/s

Without patch:

Benchmark                                  Mode  Cnt      Score     Error  Units
ClassfileBenchmark.parse                  thrpt    5  52365.455 ± 649.145  ops/s
ClassfileBenchmark.transformWithAddedNOP  thrpt    5  11546.367 ± 105.941  ops/s
ClassfileBenchmark.transformWithNewCP     thrpt    5   7589.033 ±  66.221  ops/s
ClassfileBenchmark.transformWithSharedCP  thrpt    5  23729.577 ± 159.890  ops/s

They all look solidly within the margin of error. If there are any other tests you'd like me to run (or different parameters, etc.), please let me know.

If this all looks OK, @asotona, could you please review the CSR?

@asotona (Member) commented Mar 27, 2025

I'm sorry, but I'm still missing the benefits part of this change.

The costs are high. The API grows, and the implementation splits into multiple parallel branches in ClassFileImpl and DirectClassBuilder.
A new implementation class, BuildAndParseBuffersAndSegments, appears.
Parallel implementations add a new dimension to the test coverage matrix.
There is also still a lot of code related to ByteBuffer.

I'm not convinced the API should be extended with the MemorySegment overloads.

@dmlloyd (Contributor, Author) commented Mar 27, 2025

> I'm sorry, but I'm still missing the benefits part of this change.

The benefits are that the user can parse and generate class files directly from and to mapped files, and can parse class files directly from buffers handed to a class loader (for example). On the parsing side there is an extra copy (no worse than today), but it is theoretically possible that we could optimize this someday; at absolute worst, it saves the user from having to do that copy manually. On the generation side, there is no extra copy compared to the byte array implementation.
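To make the write side concrete, here is the pattern the new overload removes (a sketch; the generated class name, method name, and target segment are placeholders, and the exact shape of the build-to-segment overload is defined by the PR itself):

import java.lang.classfile.ClassFile;
import java.lang.constant.ClassDesc;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class EmitExample {
    // Status quo: build into an intermediate byte[] and copy it into the
    // target (for example memory-mapped) segment. With this PR the class
    // file can be emitted into the segment directly, skipping the byte[].
    static long emitWithCopy(MemorySegment target) {
        byte[] built = ClassFile.of().build(ClassDesc.of("com.example.Generated"),
                clb -> { /* add fields, methods, attributes here */ });
        MemorySegment.copy(built, 0, target, ValueLayout.JAVA_BYTE, 0, built.length);
        return built.length;
    }
}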

> The costs are high. The API grows, and the implementation splits into multiple parallel branches in ClassFileImpl and DirectClassBuilder.

I'm not sure I follow you here. The implementation is only minimally changed, and half of the 419 lines added are javadoc. I don't see where the new costs are coming from.

> A new implementation class, BuildAndParseBuffersAndSegments, appears.

This is just a test class. It proves the functionality of the new API in terms of the existing API; this means that it is not necessary to retest all functionality for both input types.
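Roughly speaking, a check of the following shape is all that is needed (an illustrative sketch, not the actual BuildAndParseBuffersAndSegments code): feed the same bytes through both entry points and verify the results agree.

import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassTransform;
import java.lang.foreign.MemorySegment;
import java.util.Arrays;

class SegmentParityCheck {
    // The MemorySegment overload should behave exactly like the byte[] entry
    // point, so re-emitting both parsed models should yield identical bytes.
    static boolean parseAgrees(ClassFile cf, byte[] classBytes) {
        byte[] viaArray = cf.transformClass(cf.parse(classBytes), ClassTransform.ACCEPT_ALL);
        byte[] viaSegment = cf.transformClass(cf.parse(MemorySegment.ofArray(classBytes)),
                ClassTransform.ACCEPT_ALL);
        return Arrays.equals(viaArray, viaSegment);
    }
}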

> Parallel implementations add a new dimension to the test coverage matrix.

It shouldn't: there is no parallel implementation, and I think it's reasonable to argue that there should not be one in the future either. Nothing here forces that issue one way or the other. As I said above, because there is still only one implementation, one test class is sufficient to show that parsing from and generating to memory segments works correctly.

> There is also still a lot of code related to ByteBuffer.

All the ByteBuffer code is dead and was intended to be deleted; that was an error on my part. I've fixed the patch now if you would like to take another look.

> I'm not convinced the API should be extended with the MemorySegment overloads.

I hope you reconsider in light of the above explanations. Thanks!

@asotona (Member) commented Mar 27, 2025

On the parsing side there is no benefit, due to the memory copy.
May I at least see the benefit of writing directly to memory-mapped files in some benchmark?

@dmlloyd (Contributor, Author) commented Mar 27, 2025

> On the parsing side there is no benefit, due to the memory copy.

The main benefit on the parsing side is twofold: it reduces the user's boilerplate, and it leaves open the possibility of optimizing this case further in the future. At present the performance is the same as having the user do the copy.

> May I at least see the benefit of writing directly to memory-mapped files in some benchmark?

Sure, I can put together a benchmark which compares the throughput with and without the copy, and perhaps the allocation rate as well. I hope that will be sufficient to capture whatever there is to capture.

@dmlloyd (Contributor, Author) commented Mar 27, 2025

Here are the raw benchmark results against AbstractMap and TreeMap:

Benchmark                                 Mode  Cnt       Score      Error  Units
MemorySegmentBenchmark.emitWithCopy0     thrpt    5  198061.082 ± 2300.146  ops/s
MemorySegmentBenchmark.emitWithCopy1     thrpt    5   35352.167 ±  320.823  ops/s
MemorySegmentBenchmark.emitWithoutCopy0  thrpt    5  265208.111 ± 1416.120  ops/s
MemorySegmentBenchmark.emitWithoutCopy1  thrpt    5   53215.327 ±  354.228  ops/s

Case 0 uses the bytes of the smaller AbstractMap class and case 1 uses the bytes of the larger TreeMap class. For case 0 we see an improvement of around 34% overall, and case 1 shows an improvement closer to 50% (which is expected, since larger classes mean copying more bytes as well as putting more pressure on the GC).

Here is the same benchmark with -prof gc enabled:

Benchmark                                                    Mode  Cnt       Score      Error   Units
MemorySegmentBenchmark.emitWithCopy0                        thrpt    5  197728.066 ± 3107.524   ops/s
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate          thrpt    5    3900.963 ±   61.292  MB/sec
MemorySegmentBenchmark.emitWithCopy0:gc.alloc.rate.norm     thrpt    5   20688.004 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy0:gc.count               thrpt    5     680.000             counts
MemorySegmentBenchmark.emitWithCopy0:gc.time                thrpt    5     415.000                 ms
MemorySegmentBenchmark.emitWithCopy1                        thrpt    5   35504.531 ±  260.423   ops/s
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate          thrpt    5    3512.621 ±   25.778  MB/sec
MemorySegmentBenchmark.emitWithCopy1:gc.alloc.rate.norm     thrpt    5  103744.020 ±    0.001    B/op
MemorySegmentBenchmark.emitWithCopy1:gc.count               thrpt    5     673.000             counts
MemorySegmentBenchmark.emitWithCopy1:gc.time                thrpt    5     413.000                 ms
MemorySegmentBenchmark.emitWithoutCopy0                     thrpt    5  265533.600 ± 1707.914   ops/s
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate       thrpt    5    3547.167 ±   22.811  MB/sec
MemorySegmentBenchmark.emitWithoutCopy0:gc.alloc.rate.norm  thrpt    5   14008.003 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy0:gc.count            thrpt    5     651.000             counts
MemorySegmentBenchmark.emitWithoutCopy0:gc.time             thrpt    5     392.000                 ms
MemorySegmentBenchmark.emitWithoutCopy1                     thrpt    5   52727.917 ±  624.059   ops/s
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate       thrpt    5    3531.104 ±   42.004  MB/sec
MemorySegmentBenchmark.emitWithoutCopy1:gc.alloc.rate.norm  thrpt    5   70224.013 ±    0.001    B/op
MemorySegmentBenchmark.emitWithoutCopy1:gc.count            thrpt    5     683.000             counts
MemorySegmentBenchmark.emitWithoutCopy1:gc.time             thrpt    5     412.000                 ms

You can see that, in addition to the overhead of the copying itself, the copying variant also puts a bit more pressure on the GC: roughly the same number of objects is allocated in either case, but the extra large array per operation fills the allocation regions more quickly, so a little more time is spent in GC on average.

@asotona (Member) commented Mar 27, 2025

I have two problems with the numbers you measured:

  1. The benchmarked transformation runs on a pre-parsed class and does nothing, so it mainly measures the memory copy itself.
  2. In the discussions, memory-mapped files were mentioned as the use case for targeting classes off-heap.

When I modify ClassfileBenchmark to replicate your scenario, I get these numbers:

Benchmark                                             Mode  Cnt       Score     Error  Units
ClassfileBenchmark.transformWithAddedNOP             thrpt    5   31542.151 ±  63.838  ops/s
ClassfileBenchmark.transformWithAddedNOPWithoutCopy  thrpt    5   31738.045 ± 105.725  ops/s
ClassfileBenchmark.transformWithNewCP                thrpt    5   23514.061 ±  91.453  ops/s
ClassfileBenchmark.transformWithNewCPWithoutCopy     thrpt    5   23824.561 ± 532.565  ops/s
ClassfileBenchmark.transformWithSharedCP             thrpt    5   66083.564 ± 231.388  ops/s
ClassfileBenchmark.transformWithSharedCPWithoutCopy  thrpt    5   66780.329 ± 298.292  ops/s

So my measured performance benefit is around 1%, and even that will evaporate when writing to physical files as intended.

Unfortunately, I cannot recommend this PR.

@dmlloyd (Contributor, Author) commented Mar 28, 2025

> I have two problems with the numbers you measured:
>
>   1. The benchmarked transformation runs on a pre-parsed class and does nothing, so it mainly measures the memory copy itself.
>   2. In the discussions, memory-mapped files were mentioned as the use case for targeting classes off-heap.
>
> When I modify ClassfileBenchmark to replicate your scenario, I get these numbers:

Can you share your changes?

@asotona (Member) commented Mar 28, 2025

@openjdk bot mentioned this pull request Mar 28, 2025
@liach (Member) commented Mar 28, 2025

If we want to reduce allocation at writing time, we may want to look at the new allocation facility from #24232; that is a general-purpose tool for reducing allocation pressure from large byte arrays, especially when an actual byte[] view is not strictly required. BufWriter and allocation-free copying of data to a memory segment may both benefit.

Labels

  • core-libs
  • csr: Pull request needs approved CSR before integration
  • rfr: Pull request is ready for review
4 participants