
Improve composition hit testing performance with per-visual AABBs#21310

Open
hez2010 wants to merge 13 commits into AvaloniaUI:master from hez2010:dynamic-aabb-tree

hez2010 (Contributor) commented May 4, 2026

What does the pull request do?

Adds a dynamic AABB tree for composition hit testing so containers with many composition children can avoid linearly scanning every child for each hit test.
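The PR text does not include the tree itself. To make the idea concrete, here is a minimal, hypothetical C# sketch of a dynamic AABB tree (not Avalonia's implementation; `Aabb` and `DynamicAabbTree<T>` are made-up names). Leaves hold items, internal nodes hold the union of their two children's boxes, and a point query skips every subtree whose box misses the point. For brevity it inserts greedily by least perimeter growth and never rebalances; production trees such as Box2D's `b2DynamicTree` also rotate nodes to keep the tree balanced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical axis-aligned bounding box (not Avalonia's Rect).
public readonly record struct Aabb(double MinX, double MinY, double MaxX, double MaxY)
{
    public bool Contains(double x, double y) => x >= MinX && x <= MaxX && y >= MinY && y <= MaxY;
    public Aabb Union(Aabb o) => new(Math.Min(MinX, o.MinX), Math.Min(MinY, o.MinY),
                                     Math.Max(MaxX, o.MaxX), Math.Max(MaxY, o.MaxY));
    public double Perimeter => 2 * ((MaxX - MinX) + (MaxY - MinY));
}

// Minimal dynamic AABB tree: leaves hold items, internal nodes hold the union of their two children.
// Simplified on purpose: greedy insertion by least perimeter growth and no rebalancing.
public sealed class DynamicAabbTree<T>
{
    private sealed class Node
    {
        public Aabb Box;
        public Node? Parent, Left, Right;
        public T? Item;
        public bool IsLeaf => Left == null;
    }

    private Node? _root;

    // Returns an opaque handle for Remove/Update.
    public object Insert(T item, Aabb box)
    {
        var leaf = new Node { Box = box, Item = item };
        if (_root == null) { _root = leaf; return leaf; }

        // Descend toward the child whose box grows least when it has to cover the new box.
        var sibling = _root;
        while (!sibling.IsLeaf)
        {
            var left = sibling.Left!;
            var right = sibling.Right!;
            var growLeft = left.Box.Union(box).Perimeter - left.Box.Perimeter;
            var growRight = right.Box.Union(box).Perimeter - right.Box.Perimeter;
            sibling = growLeft <= growRight ? left : right;
        }

        // Replace that leaf with a new internal node owning both leaves.
        var oldParent = sibling.Parent;
        var parent = new Node { Parent = oldParent, Left = sibling, Right = leaf, Box = sibling.Box.Union(box) };
        sibling.Parent = parent;
        leaf.Parent = parent;
        if (oldParent == null) _root = parent;
        else if (oldParent.Left == sibling) oldParent.Left = parent;
        else oldParent.Right = parent;
        Refit(oldParent);
        return leaf;
    }

    public void Remove(object handle)
    {
        var leaf = (Node)handle;
        var parent = leaf.Parent;
        if (parent == null) { _root = null; return; }

        // The removed leaf's sibling takes over the parent's slot.
        var sibling = parent.Left == leaf ? parent.Right! : parent.Left!;
        var grand = parent.Parent;
        sibling.Parent = grand;
        if (grand == null) _root = sibling;
        else if (grand.Left == parent) grand.Left = sibling;
        else grand.Right = sibling;
        Refit(grand);
    }

    // Moving an item = remove + reinsert with the new box; returns the new handle.
    public object Update(object handle, Aabb box)
    {
        var item = ((Node)handle).Item!;
        Remove(handle);
        return Insert(item, box);
    }

    // Collects every item whose box contains the point; subtrees whose box misses it are skipped.
    public void Query(double x, double y, List<T> results)
    {
        if (_root == null) return;
        var stack = new Stack<Node>();
        stack.Push(_root);
        while (stack.Count > 0)
        {
            var node = stack.Pop();
            if (!node.Box.Contains(x, y)) continue;
            if (node.IsLeaf) results.Add(node.Item!);
            else { stack.Push(node.Left!); stack.Push(node.Right!); }
        }
    }

    // Recomputes internal-node boxes from the given node up to the root.
    private static void Refit(Node? node)
    {
        for (; node != null; node = node.Parent)
            node.Box = node.Left!.Box.Union(node.Right!.Box);
    }
}
```

In this sketch, a moving visual is handled by `Update`, i.e. a remove followed by a reinsert.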

The hit-test index is partitioned into buckets by child z-order. Each bucket owns its own AABB tree, and buckets are queried from the top-most to the bottom-most, so HitTestFirst can stop as soon as a real hit is found while preserving visual order.
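A hypothetical C# sketch of the bucketed, top-down query: child `i` (0 = bottom-most) goes to bucket `i / BucketSize`, buckets are walked from highest to lowest, and the walk stops at the first confirmed hit. The bucket size of 4, the `Child` and `BucketedHitTest` names, and the `preciseHit` callback are assumptions for illustration only; in the PR each bucket is queried through its own AABB tree rather than scanned as a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical child entry: ZIndex is the position among siblings (0 = bottom-most)
// and the box is the child's precomputed hit-test bounds.
public readonly record struct Child(int ZIndex, double MinX, double MinY, double MaxX, double MaxY)
{
    public bool Contains(double x, double y) => x >= MinX && x <= MaxX && y >= MinY && y <= MaxY;
}

public static class BucketedHitTest
{
    // Assumed bucket size, kept tiny for the example; the PR does not state its value here.
    public const int BucketSize = 4;

    public static int BucketOf(int zIndex) => zIndex / BucketSize;

    // Expects children in ascending z-order, so each bucket list ends with its top-most child.
    public static List<List<Child>> Build(IReadOnlyList<Child> childrenInZOrder)
    {
        var buckets = new List<List<Child>>();
        foreach (var child in childrenInZOrder)
        {
            var b = BucketOf(child.ZIndex);
            while (buckets.Count <= b)
                buckets.Add(new List<Child>());
            buckets[b].Add(child);
        }
        return buckets;
    }

    // Walks buckets from top-most to bottom-most and returns the first confirmed hit.
    // The PR queries each bucket's AABB tree; this sketch scans the bucket list instead.
    public static Child? HitTestFirst(List<List<Child>> buckets, double x, double y, Func<Child, bool> preciseHit)
    {
        for (var b = buckets.Count - 1; b >= 0; b--)
        {
            var bucket = buckets[b];
            for (var i = bucket.Count - 1; i >= 0; i--)
            {
                var child = bucket[i];
                if (child.Contains(x, y) && preciseHit(child))
                    return child; // everything in lower buckets is below this hit, so stop here
            }
        }
        return null;
    }
}
```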

It also adds a Hit Testing page in RenderDemo for manually stressing composition hit testing with many static and animated visuals.

What is the current behavior?

Composition hit testing scans child visuals linearly. For containers with many children, HitTestFirst can scale with the number of siblings even when only a small subset can contain the query point.
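For comparison, a simplified, hypothetical model of the linear path (not Avalonia's code; `Box` and `LinearHitTest` are made-up names). Assuming siblings are stored bottom-most first and scanned from the top-most down, every bounds check before the hit is wasted work. The static benchmark hits the first child added to the leaf canvas, i.e. the bottom-most one, so under this assumption the scan touches every sibling, which is consistent with the Linear times growing with VisualCount.

```csharp
using System.Collections.Generic;

// Hypothetical bounds type; not Avalonia's Rect.
public readonly record struct Box(double MinX, double MinY, double MaxX, double MaxY)
{
    public bool Contains(double x, double y) => x >= MinX && x <= MaxX && y >= MinY && y <= MaxY;
}

public static class LinearHitTest
{
    // Siblings are stored bottom-most first, so the scan runs backwards (top-most first).
    // Returns the index of the first child whose bounds contain the point, or -1,
    // and reports how many bounds checks were needed.
    public static int HitTestFirst(IReadOnlyList<Box> children, double x, double y, out int boundsChecks)
    {
        boundsChecks = 0;
        for (var i = children.Count - 1; i >= 0; i--)
        {
            boundsChecks++;
            if (children[i].Contains(x, y))
                return i;
        }
        return -1;
    }
}
```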

What is the updated/expected behavior with this PR?

Composition containers switch to a bucketed dynamic AABB tree once the child count is large enough. Hit testing first searches likely candidates by bounds, then preserves existing z-order semantics by processing buckets from top-most to bottom-most and sorting candidates only within each bucket.

The threshold is based on benchmark tradeoffs: below it, the linear path avoids AABB-tree overhead; at and above it, the indexed path can significantly improve large static or sparse-hit sibling scenarios while keeping the overhead for animated updates bounded.

| Method | Job | VisualCount | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|
| HitTestFirst | AabbTree | 1 | 217.9 ns | 1.75 ns | 1.64 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 1 | 168.0 ns | 0.71 ns | 0.66 ns | 0.0014 | 24 B |
| HitTestFirst | AabbTree | 2 | 216.6 ns | 1.29 ns | 1.21 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 2 | 183.1 ns | 0.63 ns | 0.55 ns | 0.0014 | 24 B |
| HitTestFirst | AabbTree | 4 | 216.6 ns | 0.22 ns | 0.20 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 4 | 204.3 ns | 0.21 ns | 0.19 ns | 0.0014 | 24 B |
| HitTestFirst | AabbTree | 8 | 238.5 ns | 0.79 ns | 0.66 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 8 | 262.7 ns | 1.03 ns | 0.92 ns | 0.0014 | 24 B |
| HitTestFirst | AabbTree | 16 | 223.6 ns | 0.71 ns | 0.66 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 16 | 424.3 ns | 2.80 ns | 2.48 ns | 0.0014 | 24 B |
| HitTestFirst | AabbTree | 32 | 227.3 ns | 0.22 ns | 0.20 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 32 | 623.3 ns | 0.78 ns | 0.73 ns | 0.0010 | 24 B |
| HitTestFirst | AabbTree | 64 | 233.4 ns | 2.09 ns | 1.75 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 64 | 1,116.4 ns | 11.13 ns | 10.41 ns | - | 24 B |
| HitTestFirst | AabbTree | 1024 | 260.7 ns | 1.32 ns | 1.23 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 1024 | 14,333.6 ns | 133.63 ns | 124.99 ns | - | 24 B |
| HitTestFirst | AabbTree | 4096 | 273.8 ns | 1.02 ns | 0.96 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 4096 | 74,153.3 ns | 1,159.34 ns | 1,138.63 ns | - | 24 B |
| HitTestFirst | AabbTree | 16384 | 299.0 ns | 2.26 ns | 2.11 ns | 0.0014 | 24 B |
| HitTestFirst | Linear | 16384 | 359,784.4 ns | 2,904.20 ns | 2,716.59 ns | - | 24 B |

| Method | Job | VisualCount | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|
| HitTestAnimatedChild | AabbTree | 1 | 1.781 us | 0.0129 us | 0.0107 us | 0.0229 | 384 B |
| HitTestAnimatedChild | Linear | 1 | 1.380 us | 0.0065 us | 0.0054 us | 0.0210 | 344 B |
| HitTestAnimatedChild | AabbTree | 2 | 1.926 us | 0.0041 us | 0.0034 us | 0.0229 | 384 B |
| HitTestAnimatedChild | Linear | 2 | 1.409 us | 0.0035 us | 0.0029 us | 0.0210 | 344 B |
| HitTestAnimatedChild | AabbTree | 4 | 1.891 us | 0.0128 us | 0.0107 us | 0.0229 | 384 B |
| HitTestAnimatedChild | Linear | 4 | 1.545 us | 0.0057 us | 0.0048 us | 0.0210 | 344 B |
| HitTestAnimatedChild | AabbTree | 8 | 2.107 us | 0.0285 us | 0.0253 us | 0.0267 | 432 B |
| HitTestAnimatedChild | Linear | 8 | 1.675 us | 0.0054 us | 0.0048 us | 0.0210 | 344 B |
| HitTestAnimatedChild | AabbTree | 16 | 2.491 us | 0.0121 us | 0.0101 us | 0.0267 | 425 B |
| HitTestAnimatedChild | Linear | 16 | 2.052 us | 0.0138 us | 0.0115 us | 0.0191 | 344 B |
| HitTestAnimatedChild | AabbTree | 32 | 3.145 us | 0.0181 us | 0.0160 us | 0.0267 | 435 B |
| HitTestAnimatedChild | Linear | 32 | 2.805 us | 0.0233 us | 0.0194 us | 0.0191 | 344 B |
| HitTestAnimatedChild | AabbTree | 64 | 4.635 us | 0.0419 us | 0.0392 us | 0.0229 | 439 B |
| HitTestAnimatedChild | Linear | 64 | 3.990 us | 0.0263 us | 0.0220 us | 0.0153 | 344 B |
| HitTestAnimatedChild | AabbTree | 1024 | 7.360 us | 0.0441 us | 0.0412 us | 0.0229 | 437 B |
| HitTestAnimatedChild | Linear | 1024 | 6.957 us | 0.0613 us | 0.0544 us | 0.0153 | 344 B |
| HitTestAnimatedChild | AabbTree | 4096 | 25.035 us | 0.2684 us | 0.2511 us | - | 433 B |
| HitTestAnimatedChild | Linear | 4096 | 22.758 us | 0.0693 us | 0.0614 us | - | 344 B |
| HitTestAnimatedChild | AabbTree | 16384 | 83.313 us | 0.5890 us | 0.5509 us | - | 441 B |
| HitTestAnimatedChild | Linear | 16384 | 80.693 us | 0.3783 us | 0.3539 us | - | 344 B |

With 16,384 static visuals nested under 4 levels of Canvas containers, the AABB tree speeds up HitTestFirst by ~1,200x (299.0 ns vs 359,784.4 ns).

Source Code

Static benchmark

public class CompositionHitTesting
{
    private const int CellSize = 8;
    private const int CellStride = 12;
    private const int TreeDepth = 4;

    private CompositorTestServices? _services;
    private Point _hitPoint;
    private Border? _expectedHit;

    [Params(1, 2, 4, 8, 16, 32, 64, 1024, 4096, 16384)]
    public int VisualCount { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var canvas = BuildGrid(VisualCount, out var size, out _expectedHit);

        _services = new CompositorTestServices(size);
        _services.TopLevel.Content = canvas;
        _services.RunJobs();

        _hitPoint = new Point(CellSize / 2d, CellSize / 2d);

        if (!ReferenceEquals(HitTestFirst(), _expectedHit))
            throw new InvalidOperationException("Hit test returned an unexpected visual.");
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _services?.Dispose();
        _services = null;
        _expectedHit = null;
    }

    [Benchmark]
    public Visual? HitTestFirst()
    {
        return _services!.Renderer.HitTestFirst(_hitPoint, _services.TopLevel, null);
    }

    internal static Canvas BuildGrid(int visualCount, out Size size, out Border? firstChild)
    {
        var columns = (int)Math.Ceiling(Math.Sqrt(visualCount));
        var rows = (visualCount + columns - 1) / columns;
        size = new Size(columns * CellStride, rows * CellStride);
        firstChild = null;

        var root = new Canvas
        {
            Width = size.Width,
            Height = size.Height
        };

        var leafHost = root;
        for (var depth = 0; depth < TreeDepth; depth++)
        {
            var nested = new Canvas
            {
                Width = size.Width,
                Height = size.Height
            };
            leafHost.Children.Add(nested);
            leafHost = nested;
        }

        for (var i = 0; i < visualCount; i++)
        {
            var child = new Border
            {
                Width = CellSize,
                Height = CellSize,
                Background = Brushes.Red
            };

            Canvas.SetLeft(child, i % columns * CellStride);
            Canvas.SetTop(child, i / columns * CellStride);
            leafHost.Children.Add(child);

            if (i == 0)
                firstChild = child;
        }

        return root;
    }
}

Animated benchmark

public class CompositionHitTestingAnimated
{
    private const int CellSize = 8;
    private const int CellStride = 12;
    private const int TreeDepth = 4;

    private CompositorTestServices? _services;
    private CompositionVisual? _animatedVisual;
    private Border? _expectedHit;
    private Point _hitPoint;

    [Params(1, 2, 4, 8, 16, 32, 64, 1024, 4096, 16384)]
    public int VisualCount { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var canvas = BuildDeepAnimatedGrid(VisualCount, out var size, out _expectedHit);

        _services = new CompositorTestServices(size);
        _services.TopLevel.Content = canvas;
        _services.RunJobs();

        _animatedVisual = _expectedHit!.CompositionVisual;
        StartOffsetAnimation();
        _services.RunJobs();
        UpdateHitPoint();

        if (!ReferenceEquals(HitTestAnimatedChild(), _expectedHit))
            throw new InvalidOperationException("Hit test returned an unexpected visual.");
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _services?.Dispose();
        _services = null;
        _animatedVisual = null;
        _expectedHit = null;
    }

    [Benchmark]
    public Visual? HitTestAnimatedChild()
    {
        _services!.RunJobs();
        UpdateHitPoint();
        return _services.Renderer.HitTestFirst(_hitPoint, _services.TopLevel, null);
    }

    private void StartOffsetAnimation()
    {
        var animation = _animatedVisual!.Compositor.CreateVector3KeyFrameAnimation();
        animation.Target = "Offset";
        animation.InsertKeyFrame(0f, new Vector3(CellStride, CellStride, 0), new LinearEasing());
        animation.InsertKeyFrame(1f, new Vector3(CellStride * 3, CellStride * 3, 0), new LinearEasing());
        animation.Duration = TimeSpan.FromSeconds(1);
        animation.Direction = PlaybackDirection.Alternate;
        animation.IterationBehavior = AnimationIterationBehavior.Forever;
        _animatedVisual.StartAnimation("Offset", animation);
    }

    private void UpdateHitPoint()
    {
        var server = _animatedVisual!.Server;
        var bounds = server.GetReadback(server.Compositor.Readback.LastCompletedWrite)!.TransformedSubtreeBounds!.Value;
        _hitPoint = new Point((bounds.Left + bounds.Right) / 2, (bounds.Top + bounds.Bottom) / 2);
    }

    private static Canvas BuildDeepAnimatedGrid(int visualCount, out Size size, out Border target)
    {
        var branchCount = Math.Min(8, Math.Max(1, visualCount / 64));
        var leavesPerBranch = (visualCount + branchCount - 1) / branchCount;
        var columns = (int)Math.Ceiling(Math.Sqrt(leavesPerBranch + 1));
        var rows = (leavesPerBranch + columns - 1) / columns;
        var branchSize = new Size(columns * CellStride + CellStride * 4, rows * CellStride + CellStride * 4);

        size = new Size(branchSize.Width * branchCount, branchSize.Height);
        target = null!;

        var root = new Canvas
        {
            Width = size.Width,
            Height = size.Height
        };

        var remaining = visualCount;
        for (var branch = 0; branch < branchCount; branch++)
        {
            var branchRoot = new Canvas
            {
                Width = branchSize.Width,
                Height = branchSize.Height
            };
            Canvas.SetLeft(branchRoot, branch * branchSize.Width);
            root.Children.Add(branchRoot);

            var leafHost = branchRoot;
            for (var depth = 0; depth < TreeDepth; depth++)
            {
                var nested = new Canvas
                {
                    Width = branchSize.Width,
                    Height = branchSize.Height
                };
                leafHost.Children.Add(nested);
                leafHost = nested;
            }

            var count = Math.Min(leavesPerBranch, remaining);
            remaining -= count;

            if (branch == 0)
                count--;

            for (var i = 0; i < count; i++)
            {
                var child = new Border
                {
                    Width = CellSize,
                    Height = CellSize,
                    Background = Brushes.Red
                };

                Canvas.SetLeft(child, (i % columns) * CellStride);
                Canvas.SetTop(child, (i / columns) * CellStride);
                leafHost.Children.Add(child);
            }

            if (branch == 0)
            {
                target = new Border
                {
                    Width = CellSize,
                    Height = CellSize,
                    Background = Brushes.Blue
                };

                Canvas.SetLeft(target, CellStride);
                Canvas.SetTop(target, CellStride);
                leafHost.Children.Add(target);
            }
        }

        return root;
    }
}

Measured in an actual app, with a batch of 100,000 hit tests performed per composition update, the number of batches completed per second improves by ~12x:

Linear Hit Testing (existing)


Rate: 1.5 batches/s

AABB Hit Testing (new)


Rate: 17.9 batches/s

How was the solution implemented (if it's not obvious)?

The hit-test index is maintained from composition child add/remove/order changes. Bounds are refreshed lazily against the current composition readback revision during hit testing.

Bounded visuals are stored in fixed-size child-order buckets. Each bucket has its own dynamic AABB tree, and queries walk buckets from highest child order to lowest child order. Before a bucket is queried, it refreshes only the child entries in that bucket whose readback revision changed.
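A minimal sketch of the lazy, per-bucket refresh, assuming each visual exposes a readback revision that increases whenever its hit-test bounds (or anything below it) change. `VisualStub`, `Entry` and `Bucket` are hypothetical names, not Avalonia types. An entry remembers the revision its indexed bounds were captured at; right before a bucket is queried, only entries whose current revision differs are refreshed, and buckets the query never reaches are not touched at all.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a visual's readback data: the revision is bumped
// whenever the hit-test bounds of the visual (or anything below it) change.
public sealed class VisualStub
{
    public long ReadbackRevision { get; private set; }
    public (double MinX, double MinY, double MaxX, double MaxY) Bounds { get; private set; }

    public void SetBounds(double minX, double minY, double maxX, double maxY)
    {
        Bounds = (minX, minY, maxX, maxY);
        ReadbackRevision++;
    }
}

public sealed class Entry
{
    public Entry(VisualStub visual) => Visual = visual;

    public VisualStub Visual { get; }
    public long CapturedRevision = -1; // -1 forces a refresh on first use
    public (double MinX, double MinY, double MaxX, double MaxY) IndexedBounds;
}

public sealed class Bucket
{
    public List<Entry> Entries { get; } = new();
    public int RefreshCount { get; private set; }

    // Called right before this bucket is queried; buckets the query never reaches stay untouched.
    public void RefreshStale()
    {
        foreach (var e in Entries)
        {
            if (e.CapturedRevision == e.Visual.ReadbackRevision)
                continue; // indexed bounds are still current

            // A real index would also move the entry's leaf in the bucket's AABB tree here.
            e.IndexedBounds = e.Visual.Bounds;
            e.CapturedRevision = e.Visual.ReadbackRevision;
            RefreshCount++;
        }
    }
}
```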

Candidates are sorted only inside the current bucket, which keeps hit-test order correct without requiring a global candidate sort.

Visuals that cannot safely use subtree-bounds optimization are kept as unbounded entries in their z-order bucket, so custom hit-test visuals and similar cases preserve existing hit-test semantics.
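A hypothetical sketch of candidate collection for a single bucket, covering the two points above: entries without trustworthy bounds (modelled here as `Bounds == null`) are always kept as candidates, and the bucket's candidates are sorted by child order, top-most first, before precise hit testing. The linear bounds filter stands in for the bucket's AABB-tree query, whose results come back in no particular z-order; `Entry` and `BucketCandidates` are illustrative names only.

```csharp
using System.Collections.Generic;

// Hypothetical index entry. ChildOrder is the position among siblings (higher = on top).
// Bounds == null marks an "unbounded" entry, e.g. a visual with custom hit testing whose
// subtree bounds cannot be used for culling.
public sealed record Entry(int ChildOrder, (double MinX, double MinY, double MaxX, double MaxY)? Bounds);

public static class BucketCandidates
{
    // Collects one bucket's candidates for point (x, y), ordered top-most first.
    public static List<Entry> Collect(IReadOnlyList<Entry> bucketEntries, double x, double y)
    {
        var result = new List<Entry>();
        foreach (var e in bucketEntries)
        {
            // Unbounded entries can never be culled by bounds, so they are always candidates.
            if (e.Bounds is not { } bounds)
            {
                result.Add(e);
                continue;
            }

            // Stand-in for the bucket's AABB-tree query.
            if (x >= bounds.MinX && x <= bounds.MaxX && y >= bounds.MinY && y <= bounds.MaxY)
                result.Add(e);
        }

        // Only this bucket's candidates are sorted, never the full sibling list.
        result.Sort((p, q) => q.ChildOrder.CompareTo(p.ChildOrder));
        return result;
    }
}
```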


Breaking changes

None.

Obsoletions / Deprecations

None.

hez2010 changed the title from "Improve hit testing performance with per-visual AABB" to "Improve composition hit testing performance with per-visual AABBs" on May 4, 2026
avaloniaui-bot commented:

You can test this PR using the following package version: 12.1.999-cibuild0065236-alpha (feed URL: https://nuget-feed-all.avaloniaui.net/v3/index.json)

avaloniaui-bot commented:

You can test this PR using the following package version: 12.1.999-cibuild0065238-alpha (feed URL: https://nuget-feed-all.avaloniaui.net/v3/index.json)

kekekeks (Member) commented:

Why do we need to populate a separate update list? You should be able to detect stale subtree data by comparing the readback revision with the previously captured one.

kekekeks (Member) commented:

i.e. _needsBoundingBoxUpdate is propagated to the parent:

var setIsDirtyForRenderInSubgraph = additionalDirtyRegion || dirtyForRender;
while (parent != null &&
       ((needsBoundingBoxUpdate && !parent._needsBoundingBoxUpdate) ||
        (setIsDirtyForRenderInSubgraph && !parent._isDirtyForRenderInSubgraph)))
{
    parent._needsBoundingBoxUpdate |= needsBoundingBoxUpdate;
    parent._isDirtyForRenderInSubgraph |= setIsDirtyForRenderInSubgraph;
    parent = parent.Parent;
}
_needsBoundingBoxUpdate |= needsBoundingBoxUpdate;
_isDirtyForRender |= dirtyForRender;

which triggers:

if (node._needsBoundingBoxUpdate)
{
    //
    // If pNode's bbox got recomputed it is at this point still in inner
    // space. We need to apply the clip and transform.
    //
    FinalizeSubtreeBounds(node);
}
which calls FinalizeSubtreeBounds, which forces a readback update.

So if any descendant has changed its hit-test data, you'll get a bumped readback revision.
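To illustrate this argument with a simplified, hypothetical model (`Node`, `InvalidateBounds` and `Commit` are made-up names, not Avalonia's ServerCompositionVisual API): marking a node walks up the parent chain like the quoted loop, and the commit pass gives every marked node a new readback revision. A parent that captured a child's revision can therefore tell that something anywhere in that child's subtree changed, without keeping a separate update list.

```csharp
using System.Collections.Generic;

// Hypothetical model of the argument above; not Avalonia's ServerCompositionVisual.
public sealed class Node
{
    public Node? Parent { get; private set; }
    public List<Node> Children { get; } = new();
    public bool NeedsBoundsUpdate { get; private set; }
    public long ReadbackRevision { get; private set; }

    public void AddChild(Node child)
    {
        child.Parent = this;
        Children.Add(child);
    }

    // Like the quoted loop: mark this node, then walk up until an ancestor is already marked
    // (in this model, a marked ancestor implies all of its own ancestors are marked too).
    public void InvalidateBounds()
    {
        NeedsBoundsUpdate = true;
        for (var p = Parent; p != null && !p.NeedsBoundsUpdate; p = p.Parent)
            p.NeedsBoundsUpdate = true;
    }

    // Commit pass, called on the root: each marked node would recompute its subtree bounds here,
    // then publishes a new readback revision. Unmarked nodes have unchanged subtrees and are skipped.
    public static void Commit(Node node, long batchRevision)
    {
        if (!node.NeedsBoundsUpdate)
            return;
        foreach (var child in node.Children)
            Commit(child, batchRevision);
        node.ReadbackRevision = batchRevision;
        node.NeedsBoundsUpdate = false;
    }
}
```

A hit-test index that stores each child's last captured revision can then detect stale subtree data with a single comparison.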

hez2010 (Contributor, Author) commented May 15, 2026

"You should be able to detect stale subtree data by comparing the readback revision with the previously captured one."

Oops, I didn't realize the readback revision could be used for this. Resolved in 252d107.

PTAL.

avaloniaui-bot commented:

You can test this PR using the following package version: 12.1.999-cibuild0065439-alpha (feed URL: https://nuget-feed-all.avaloniaui.net/v3/index.json)

hez2010 (Contributor, Author) commented May 15, 2026

Latest benchmark results:

Static visuals (linear when < 32, AABB when >= 32):

| Method | VisualCount | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| HitTestStaticChild | 1 | 163.5 ns | 1.24 ns | 1.16 ns | 0.0014 | 24 B |
| HitTestStaticChild | 2 | 176.1 ns | 0.46 ns | 0.43 ns | 0.0014 | 24 B |
| HitTestStaticChild | 4 | 200.7 ns | 1.76 ns | 1.65 ns | 0.0014 | 24 B |
| HitTestStaticChild | 8 | 252.1 ns | 0.56 ns | 0.50 ns | 0.0014 | 24 B |
| HitTestStaticChild | 16 | 413.9 ns | 1.64 ns | 1.54 ns | 0.0014 | 24 B |
| HitTestStaticChild | 32 | 200.0 ns | 0.53 ns | 0.47 ns | 0.0014 | 24 B |
| HitTestStaticChild | 64 | 199.8 ns | 0.91 ns | 0.85 ns | 0.0014 | 24 B |
| HitTestStaticChild | 1024 | 413.8 ns | 2.58 ns | 2.41 ns | 0.0014 | 24 B |
| HitTestStaticChild | 4096 | 701.8 ns | 3.20 ns | 2.99 ns | 0.0010 | 24 B |
| HitTestStaticChild | 16384 | 2,489.0 ns | 7.40 ns | 6.92 ns | - | 24 B |

Animated visuals (linear when < 32, AABB when >= 32):

| Method | VisualCount | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| HitTestAnimatedChild | 1 | 1.397 us | 0.0199 us | 0.0156 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 2 | 1.413 us | 0.0067 us | 0.0056 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 4 | 1.503 us | 0.0034 us | 0.0029 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 8 | 1.692 us | 0.0166 us | 0.0138 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 16 | 2.071 us | 0.0164 us | 0.0137 us | 0.0191 | 344 B |
| HitTestAnimatedChild | 32 | 3.346 us | 0.0173 us | 0.0145 us | 0.0229 | 344 B |
| HitTestAnimatedChild | 64 | 4.486 us | 0.0222 us | 0.0186 us | 0.0153 | 344 B |
| HitTestAnimatedChild | 1024 | 7.451 us | 0.0314 us | 0.0262 us | 0.0153 | 344 B |
| HitTestAnimatedChild | 4096 | 22.088 us | 0.0963 us | 0.0853 us | - | 344 B |
| HitTestAnimatedChild | 16384 | 86.832 us | 0.8229 us | 0.7295 us | - | 344 B |

The threshold for switching to the AABB tree is 32 children, which is why the static-visual timings drop at 32.

For reference, the existing linear hit testing:

| Method | VisualCount | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| HitTestStaticChild | 1 | 171.6 ns | 0.52 ns | 0.49 ns | 0.0014 | 24 B |
| HitTestStaticChild | 2 | 174.1 ns | 0.74 ns | 0.69 ns | 0.0014 | 24 B |
| HitTestStaticChild | 4 | 198.6 ns | 0.33 ns | 0.31 ns | 0.0014 | 24 B |
| HitTestStaticChild | 8 | 250.2 ns | 0.31 ns | 0.29 ns | 0.0014 | 24 B |
| HitTestStaticChild | 16 | 415.0 ns | 2.44 ns | 2.28 ns | 0.0014 | 24 B |
| HitTestStaticChild | 32 | 620.1 ns | 4.93 ns | 4.61 ns | 0.0010 | 24 B |
| HitTestStaticChild | 64 | 1,070.8 ns | 5.17 ns | 4.84 ns | - | 24 B |
| HitTestStaticChild | 1024 | 13,863.1 ns | 147.59 ns | 138.05 ns | - | 24 B |
| HitTestStaticChild | 4096 | 66,152.2 ns | 1,092.66 ns | 1,022.07 ns | - | 24 B |
| HitTestStaticChild | 16384 | 306,783.9 ns | 1,771.81 ns | 1,479.54 ns | - | 24 B |

| Method | VisualCount | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| HitTestAnimatedChild | 1 | 1.381 us | 0.0086 us | 0.0072 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 2 | 1.448 us | 0.0220 us | 0.0172 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 4 | 1.568 us | 0.0251 us | 0.0223 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 8 | 1.679 us | 0.0068 us | 0.0057 us | 0.0210 | 344 B |
| HitTestAnimatedChild | 16 | 2.067 us | 0.0111 us | 0.0092 us | 0.0191 | 344 B |
| HitTestAnimatedChild | 32 | 2.685 us | 0.0115 us | 0.0096 us | 0.0191 | 344 B |
| HitTestAnimatedChild | 64 | 4.052 us | 0.0122 us | 0.0102 us | 0.0153 | 344 B |
| HitTestAnimatedChild | 1024 | 7.227 us | 0.0437 us | 0.0387 us | 0.0153 | 344 B |
| HitTestAnimatedChild | 4096 | 22.844 us | 0.1461 us | 0.1295 us | - | 344 B |
| HitTestAnimatedChild | 16384 | 80.186 us | 0.4557 us | 0.4040 us | - | 344 B |
