Refactor per-draw render data allocations with a binary opcode stream #21366

Open
ZehMatt wants to merge 13 commits into AvaloniaUI:master from ZehMatt:refactor/render-data-binary-stream
ZehMatt commented May 14, 2026

This work stems from a comment on #20885 (comment). It is not exactly how WPF does it: the unsafe keyword, for example, has been avoided. I also wasn't a huge fan of where #20885 was going.

What does the pull request do?

Replaces the render data representation. Until now, every DrawingContext draw or push call recorded a heap-allocated node object (RenderDataLineNode, RenderDataRectangleNode, the push nodes, …) into a PooledInlineList<IRenderDataItem>.
This PR replaces those objects with a flat binary opcode stream plus a resource table. This is broadly the approach WPF uses for its MILCMD render data.

It is an internal change: no public API, rendering output, or behaviour changes.

What is the current behavior?

Every recorded draw/push allocates a node object. For a visual whose content changes each frame (animations, custom-drawn controls), that is one GC-tracked allocation per draw call, every frame, so gen0 churn is proportional to the draw-call count.

What is the updated/expected behavior with this PR?

Render data is recorded as a byte[] opcode stream plus a resource table. Recording, server-side replay, hit-testing and bounds calculation all walk the byte stream directly, with no per-draw object allocation.
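As a rough illustration of the idea above, here is a minimal, language-neutral sketch in Python of a flat opcode stream with a resource table. It is not the PR's C# code: the opcode values, the record layout (`<Bi4d`) and the `Recorder` / `replay` names are all assumptions made for this example. Python still allocates objects internally, so the sketch shows only the data layout, not the allocation behaviour.

```python
import struct

# Hypothetical opcode values; the real RenderDataOpcode members are not listed in this description.
OP_DRAW_LINE = 1
OP_DRAW_RECT = 2

# Assumed record layout: 1-byte opcode, 4-byte resource handle, four doubles (little-endian, unpadded).
RECORD = struct.Struct("<Bi4d")


class Recorder:
    """Records draw calls into one growing byte buffer instead of one object per call."""

    def __init__(self):
        self.stream = bytearray()  # flat opcode stream
        self.resources = []        # resource table: handle -> object
        self._handles = {}         # id(object) -> handle, so repeated resources share a handle

    def intern(self, obj):
        handle = self._handles.get(id(obj))
        if handle is None:
            handle = len(self.resources)
            self.resources.append(obj)  # keeps obj alive, so its id() stays unique
            self._handles[id(obj)] = handle
        return handle

    def draw_line(self, pen, x1, y1, x2, y2):
        self.stream += RECORD.pack(OP_DRAW_LINE, self.intern(pen), x1, y1, x2, y2)

    def draw_rect(self, brush, x, y, width, height):
        self.stream += RECORD.pack(OP_DRAW_RECT, self.intern(brush), x, y, width, height)


def replay(stream, resources):
    """Walk the byte stream and yield (opcode, resource, payload) for each record."""
    offset = 0
    while offset < len(stream):
        opcode, handle, a, b, c, d = RECORD.unpack_from(stream, offset)
        offset += RECORD.size
        yield opcode, resources[handle], (a, b, c, d)
```

A replay, hit-test or bounds pass would each be a loop like `replay` over the same bytes; only what is done per record differs.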

Rendering, hit-testing and visual bounds are unchanged. This is covered by the render-data contract tests merged in #21341 and the existing render tests, plus new unit tests added here for the resource table, the three walkers, the serialization round-trip and the deep-nesting fallback.

The changes were measured locally with a stress test of ~9k draw calls per frame using mixed primitives.
master:

Draw calls/frame:     9,271
Measured window:      10.0s
Frames rendered:      404
Average FPS:          40.4
Total allocated:      498.74 MB
Allocated per frame:  1264.1 KB
Allocated per second: 49.85 MB/s
GC gen0 / gen1 / gen2: 83 / 83 / 0
Managed heap (end):   9.7 MB

PR:

Draw calls/frame:     9,271
Measured window:      10.0s
Frames rendered:      403
Average FPS:          40.3
Total allocated:      22.99 MB
Allocated per frame:  58.4 KB
Allocated per second: 2.30 MB/s
GC gen0 / gen1 / gen2: 4 / 0 / 0
Managed heap (end):   9.7 MB

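The code for the test: https://gist.github.com/ZehMatt/6d033ecd016335b8b02f649a6666286c. It's quite evident that this saves quite a bit of memory: over the 10 s window, total allocations dropped from 498.74 MB to 22.99 MB and gen0 collections from 83 to 4, at an unchanged frame rate. In my testing there was no GC pressure left in the rendering path; the major contributor to GC cycles is now elsewhere.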
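To put the two runs side by side, the short Python snippet below derives the relative reduction and the per-draw-call allocation from the figures reported above. It assumes that "KB" in the report means 1024 bytes, which the benchmark output does not state.

```python
# Figures copied from the master and PR benchmark runs reported above.
DRAW_CALLS_PER_FRAME = 9271
master = {"total_mb": 498.74, "per_frame_kb": 1264.1, "gen0": 83}
pr = {"total_mb": 22.99, "per_frame_kb": 58.4, "gen0": 4}


def bytes_per_draw(run):
    # Assumption: 1 KB = 1024 bytes.
    return run["per_frame_kb"] * 1024 / DRAW_CALLS_PER_FRAME


reduction = 1 - pr["total_mb"] / master["total_mb"]

print(f"total allocation reduction: {reduction:.1%}")
print(f"master: {bytes_per_draw(master):.2f} bytes per draw call")
print(f"PR:     {bytes_per_draw(pr):.2f} bytes per draw call")
```

Under that assumption, master allocates roughly 140 bytes per draw call and the PR branch fewer than 7, a reduction of about 95% in total allocations.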

How was the solution implemented (if it's not obvious)?

New types under Rendering/Composition/Drawing/:

  • RenderDataOpcode - one value per draw/push operation, plus Pop.
  • RenderDataWriter / RenderDataReader - the byte codec. Blittable payload structs (Point, Rect, RoundedRect, Matrix, BoxShadow, RenderOptions, …) are bulk-copied through a where T : unmanaged generic helper; the constraint is a compile-time guard against a payload type silently becoming non-blittable.
  • RenderDataResources - interns the non-blittable operands (brushes, pens, geometries, bitmaps, custom ops) to int handles referenced from the payloads.
  • RenderDataStream - owns the opcode stream and resource table, with the recording API and the replay / hit-test / bounds walkers. Push/Pop are inline opcodes; the walkers stackalloc their scope stack, sized from a max-push-depth tracked while recording.
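The scope-stack idea from the last bullet can be sketched as follows, again in Python rather than the PR's C#. The opcodes, payload layouts and the `Stream` / `bounds` names are hypothetical. The point is only that push and pop live inline in the byte stream and that the walker sizes its stack once, from the maximum depth tracked during recording. The deep-nesting fallback mentioned earlier is not modelled here.

```python
import struct

# Hypothetical opcodes and payload layouts, for illustration only.
OP_PUSH_OFFSET, OP_DRAW_RECT, OP_POP = 1, 2, 3
PUSH = struct.Struct("<2d")  # dx, dy
RECT = struct.Struct("<4d")  # x, y, width, height


class Stream:
    def __init__(self):
        self.data = bytearray()
        self.depth = 0
        self.max_depth = 0  # tracked while recording; sizes the walker's scope stack

    def push_offset(self, dx, dy):
        self.data.append(OP_PUSH_OFFSET)
        self.data += PUSH.pack(dx, dy)
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def draw_rect(self, x, y, width, height):
        self.data.append(OP_DRAW_RECT)
        self.data += RECT.pack(x, y, width, height)

    def pop(self):
        self.data.append(OP_POP)
        self.depth -= 1


def bounds(stream):
    """Return (left, top, right, bottom) covering all rects, or None if nothing was drawn."""
    # One preallocated stack of accumulated offsets; slot 0 is the root scope.
    stack = [(0.0, 0.0)] * (stream.max_depth + 1)
    top = 0
    result = None
    data, pos = stream.data, 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op == OP_PUSH_OFFSET:
            dx, dy = PUSH.unpack_from(data, pos)
            pos += PUSH.size
            ox, oy = stack[top]
            top += 1
            stack[top] = (ox + dx, oy + dy)
        elif op == OP_POP:
            top -= 1
        elif op == OP_DRAW_RECT:
            x, y, w, h = RECT.unpack_from(data, pos)
            pos += RECT.size
            ox, oy = stack[top]
            l, t, r, b = x + ox, y + oy, x + ox + w, y + oy + h
            if result is None:
                result = (l, t, r, b)
            else:
                result = (min(result[0], l), min(result[1], t),
                          max(result[2], r), max(result[3], b))
        else:
            raise ValueError(f"unknown opcode {op}")
    return result
```

Because the stack size is known before the walk starts, the walker never has to grow it mid-stream, which is what makes a fixed-size, stack-allocated buffer possible in the C# version.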

The four consumers were switched over: RenderDataDrawingContext (recording), CompositionRenderData (client + hit-testing + serialization), ServerCompositionRenderData (server replay + bounds) and ImmediateRenderDataSceneBrushContent. The old node classes and IRenderDataItem were deleted.

The branch is structured as small standalone commits, so it should be easy to review the changes commit by commit.

Checklist

Fixed issues

Addresses some points of #19363 with regard to rendering.

Final Note

I'm glad that #20885 wasn't merged; this is definitely the cleaner solution to the problem, and it's backed up by the data.

@avaloniaui-bot

You can test this PR using the following package version: 12.1.999-cibuild0065403-alpha (feed URL: https://nuget-feed-all.avaloniaui.net/v3/index.json).

@avaloniaui-bot

You can test this PR using the following package version: 12.1.999-cibuild0065435-alpha (feed URL: https://nuget-feed-all.avaloniaui.net/v3/index.json).
