Skip to content

Comments

Implement SelectMergeTree via the model.Tree#4790

Draft
simonswine wants to merge 10 commits intomainfrom
20260122_generic-tree
Draft

Implement SelectMergeTree via the model.Tree#4790
simonswine wants to merge 10 commits intomainfrom
20260122_generic-tree

Conversation

@simonswine
Copy link
Contributor

@simonswine simonswine commented Jan 26, 2026

We currently use a completely separate query path for SelectMergeStacktraces vs. SelectMergePprof calls. This PR is a first step in unifying the two paths, by implementing a model.Tree based merge.

This refactors model.Tree to be generic over its node name type, supporting both string-based function names (FuntionName) and integer location references (LocationRefName). This enables the query-backend path to build flamegraph trees using location refs directly, deferring string resolution until the final merge step.

Most complexity is in the SymbolMerger in symdb that deduplicates and merges symbols (strings, mappings, functions, locations) across profile blocks using content hashing (xxhash), with linear probing to gracefully handle hash collisions.

After merging this, the old path is still active by default, there is a per-tenant override, gating the tree-based SelectMergeProfile code path. Also adds test coverage for the query backend, frontend query path, and the symbol merger.

Note: The integration tests after this PR will only target the tree based variant.

@simonswine simonswine changed the title 20260122 generic tree Implement SelectMergeTree via the query Jan 29, 2026
@simonswine simonswine changed the title Implement SelectMergeTree via the query Implement SelectMergeTree via the model.Tree Jan 29, 2026
@simonswine simonswine force-pushed the 20260122_generic-tree branch 2 times, most recently from 232757f to 6b0b68b Compare February 11, 2026 14:57
@simonswine simonswine force-pushed the 20260122_generic-tree branch 3 times, most recently from 0aa934b to 7f557ec Compare February 13, 2026 11:06
@@ -0,0 +1,614 @@
package symdb
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most complex part of this PR, worth starting to look at it @aleks-p

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonswine looks good to me overall, I think we can move it forward once we address the few TODOs

h.sl = slices.Grow(h.sl, size)
h.hashes = slices.Grow(h.hashes, size)
// grow map if still empty
if len(h.m) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, this is perhaps never hit, we always call add when creating new mergers

type equalityFn[A any] func(A, A) bool

type hashedSlice[A any] struct {
merger *SymbolMerger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, it doesn't seem like we use this

@@ -0,0 +1,614 @@
package symdb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonswine looks good to me overall, I think we can move it forward once we address the few TODOs

- [x] TODO fix "other"
- [ ] Make sure truncation works on max symbols rather than max nodes
- [x] Consistently number IDs starting from 1
- [ ] Share logic of getting pprof from tree for profilecli
hashedSlice.add() previously panicked on 64-bit xxhash collisions.
Replace the panic with linear probing on the map key so collisions
are handled gracefully: the probe hash is incremented until an empty
slot is found, while the original content hash is always stored in
hashes[] for use by parent-level hash computation.

Add TestSymbolMerger_HashCollision to verify that two distinct values
sharing the same hash receive distinct indices, that original hashes
are preserved, and that deduplication still works for both values.
@simonswine simonswine force-pushed the 20260122_generic-tree branch from bb7c19c to 948c5fe Compare February 20, 2026 15:21
@simonswine
Copy link
Contributor Author

I really should have enabled the integration tests a longer time ago, basically the symbolizer isn't able to work correctly with this feature. I will have to spent some more time on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants