
Conversation


@lucca30 lucca30 commented Nov 5, 2025

After our first simulations we saw a big challenge in pruning and compacting the node's data while keeping it online.

As we can see in the image below, which tracks CPU and memory, we enabled compaction and pruning at the second start of a simulation, which led to a memory spike of 50GB; even after the spike, consumption stayed higher than in the usual scenario.

image

So the consequences were:

  • Memory spike
  • High I/O
  • The node dropping and being kicked from the p2p network

After some research we realized the causes were:

  • Heavy memory allocations in the TxIndexer prune when pruning over big height intervals
    • It was then refactored to avoid multiple iterator loops and restricted to the relevant prefix (see the first sketch after this list)
  • .Compact(nil, nil), which runs through the whole db when it should be split into small prefix ranges (see the sharded compaction sketch below)
    • This demanded a series of new implementations, mainly the 3 new functions CompactSharded256, CompactPrefixHex256 and CompactIntSharded
  • Read operations filling the cache while reading the data to delete
  • batch.Write() instead of batch.WriteSync() plus a short sleep, which does not let the data flush from L0 to the lower levels (see the paced deletion sketch below)
  • And some other nits like:
    • Avoiding allocation of unnecessary vars
    • Stacking defers in a loop
    • Loops starting at 1 instead of initialHeight
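
To illustrate the first point, here is a minimal sketch of a prefix-restricted prune scan. The interfaces and key layout are assumptions for illustration, not the actual TxIndexer code: the idea is that one iterator bounded to the prefix replaces repeated full-keyspace scans, so allocations stay proportional to the range being pruned.

```go
// Illustrative subset of the db and iterator interfaces assumed here.
type DB interface {
	Iterator(start, end []byte) (Iterator, error)
}

type Iterator interface {
	Valid() bool
	Next()
	Key() []byte
	Close() error
}

// prefixEnd returns the smallest key strictly greater than every key with the
// given prefix (nil means "up to the end of the key space").
func prefixEnd(prefix []byte) []byte {
	end := append([]byte{}, prefix...)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

// keysToPrune collects the keys under prefix in a single bounded pass.
func keysToPrune(db DB, prefix []byte) ([][]byte, error) {
	it, err := db.Iterator(prefix, prefixEnd(prefix))
	if err != nil {
		return nil, err
	}
	defer it.Close()

	var keys [][]byte
	for ; it.Valid(); it.Next() {
		// Copy the key: iterator-owned slices are usually invalidated by Next().
		keys = append(keys, append([]byte{}, it.Key()...))
	}
	return keys, nil
}
```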
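
For the compaction point, here is a rough sketch of what a 256-way sharded compaction could look like. The actual CompactSharded256, CompactPrefixHex256 and CompactIntSharded signatures are not shown here, so this is only an assumed shape: compact one single-byte prefix range at a time instead of the whole key space at once.

```go
// Compacter is the assumed subset of the db interface, matching the
// db.Compact(start, end) call mentioned above.
type Compacter interface {
	Compact(start, end []byte) error
}

// compactSharded256 is a hedged sketch of sharded compaction: instead of
// Compact(nil, nil) over the whole database, compact 256 single-byte prefix
// ranges so each call touches a bounded slice of the key space.
func compactSharded256(db Compacter) error {
	for i := 0; i < 256; i++ {
		start := []byte{byte(i)}
		var end []byte // nil end on the last shard means "to the end of the key space"
		if i < 255 {
			end = []byte{byte(i + 1)}
		}
		if err := db.Compact(start, end); err != nil {
			return err
		}
	}
	return nil
}
```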
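
And for the batch.WriteSync() point, a sketch of paced deletion under an assumed batch interface: syncing each batch and sleeping briefly between batches gives the backend room to flush L0 down to the lower levels before the next burst of deletes arrives.

```go
import "time"

// BatchDB and Batch are assumed shapes of the db and batch interfaces used here.
type BatchDB interface {
	NewBatch() Batch
}

type Batch interface {
	Delete(key []byte) error
	WriteSync() error
	Close() error
}

// deletePaced deletes keys in fixed-size batches, using WriteSync instead of
// Write and pausing between batches so compaction can keep up.
func deletePaced(db BatchDB, keys [][]byte, batchSize int, pause time.Duration) error {
	batch := db.NewBatch()
	n := 0
	for _, k := range keys {
		if err := batch.Delete(k); err != nil {
			batch.Close()
			return err
		}
		n++
		if n >= batchSize {
			if err := batch.WriteSync(); err != nil { // synced write, not Write()
				batch.Close()
				return err
			}
			batch.Close()
			time.Sleep(pause) // give the backend time to flush L0 to lower levels
			batch = db.NewBatch()
			n = 0
		}
	}
	if n > 0 {
		if err := batch.WriteSync(); err != nil {
			batch.Close()
			return err
		}
	}
	return batch.Close()
}
```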

With this data we reached a solution where no spikes appear while compacting 4.7 million blocks on a heimdall mainnet node, as we can see in the image below.

image

PS: The only spike we can see is the one at start, which is usual and happens both with pruning enabled and disabled.
PS2: Note that the memory usage remains almost the same as regular heimdall.

And finally, here is what pruning and deletion look like on a heimdall mainnet node:

Before:
417G    /var/lib/heimdall/data/blockstore.db
124G    /var/lib/heimdall/data/tx_index.db

After:
149G    /var/lib/heimdall/data/blockstore.db (~64% reduction)
45G     /var/lib/heimdall/data/tx_index.db (~63% reduction)

PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments

@lucca30 lucca30 requested review from avalkov and marcello33 November 5, 2025 18:46
@marcello33

LGTM but I'd have a look at the tests
Also, is this the only change to the cometdb repo? If that's finalized, maybe we can raise a PR to main and release a version (although I see the tag is created already)

@lucca30
Author

lucca30 commented Nov 6, 2025

Thanks @marcello33

Yes, the changes are only these ones. I already created the tag since it was a minor change, so I assumed this review would cover that one too.

Some of the tests were failing before, but now I see some new ones are failing. I'll check it today.


@lucca30 lucca30 merged commit 001c154 into develop Nov 10, 2025
24 of 25 checks passed
