---
title: Performance & Tracing Update
slug: 2025-02-03-performance-and-tracing
authors: mgmeier
tags: [performance-tracing]
hide_table_of_contents: false
---

## High level summary

* Benchmarking: Release benchmarks and performance baselines on `10.2` for UTxO-HD, new GHC, Genesis; 'Periodic tracer' benchmarks.
* Development: Pervasive thread labeling in the Node; fixed a race condition in the monitoring dependency `ekg-wai`.
* Infrastructure: Haskell profile definition work passed testing, ready for merge; continued 'Byron' support in our tooling.
* Tracing: C library for trace forwarding reached prototype stage; last batch of documentation updates ready for publication.
* Community: Support and valuable feedback on Discord for the new tracing system rollout.

## Low level overview

### Benchmarking

We've performed a full set of release benchmarks and analyses for Node version `10.2`. We could not detect any performance risks, and expect network performance to be equivalent to or slightly better
than `10.1.x` releases, albeit with slightly higher CPU usage under rare conditions.

Furthermore, we're building several performance baselines with `10.2` against which to compare future changes, features or Node flavours. For comparative benchmarks, it's vital that every change be measured individually,
so that its individual performance impact can be discerned. For Node `10.3`, there are several such changes we want to capture: crypto class simplifications in Ledger, UTxO-HD with a new in-memory backend,
Ouroboros Genesis, and last but not least a new GHC 9.6 release addressing a remaining performance blocker when building Cardano.

Additionally, we've validated the 'Periodic tracer' feature on cluster benchmarks and now have evidence of its positive impact on performance. This feature decorrelates the gathering of ledger metrics
from the start of a block producer's forging loop, without sacrificing predictability of performance. By removing this competition on certain synchronization primitives, the hot code path in the forging
loop now executes faster. The feature will be integrated in a future version of the Node.
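
As a rough illustration of the idea, the sketch below moves metric sampling onto its own thread, so the forging loop only does forging work while a sampler reads shared state on its own schedule. It is a minimal, hypothetical model, not the Node's implementation: `LedgerMetrics`, `forgingLoop`, `periodicTracer`, the 10 ms slot and the 25 ms sampling interval are all made up, and ledger state is modeled as a plain `IORef` rather than the Node's actual state and synchronization primitives.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad (forM_, forever)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for metrics derived from the ledger state.
newtype LedgerMetrics = LedgerMetrics { utxoSize :: Int }
  deriving Show

-- Hypothetical periodic tracer: samples the shared state on its own
-- fixed schedule and hands each sample to an emitter (here, 'print').
periodicTracer :: Int -> IORef LedgerMetrics -> (LedgerMetrics -> IO ()) -> IO ()
periodicTracer intervalMicros stateRef emit = forever $ do
  readIORef stateRef >>= emit
  threadDelay intervalMicros

-- Hypothetical forging loop: it only advances the state once per slot
-- and never gathers or emits metrics itself.
forgingLoop :: IORef LedgerMetrics -> Int -> IO ()
forgingLoop stateRef slots =
  forM_ [1 .. slots] $ \_ -> do
    atomicModifyIORef' stateRef (\(LedgerMetrics n) -> (LedgerMetrics (n + 1), ()))
    threadDelay 10000 -- pretend each slot takes 10 ms

main :: IO ()
main = do
  stateRef <- newIORef (LedgerMetrics 0)
  tracer <- forkIO (periodicTracer 25000 stateRef print)
  forgingLoop stateRef 10
  killThread tracer
  readIORef stateRef >>= print
```

With a dedicated sampler, the cost and timing of metric gathering no longer coincide with the start of each forging iteration, which is the property the sketch is meant to convey.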

### Development

We've tracked down a race condition in a community package that both tracing systems depend on for exposing metrics. In `ekg-wai`, a `ThreadKilled` exception could be re-thrown to the thread
it originated from. This is a low-risk condition, as it occurs only when the Node process terminates; however, when terminating due to an error condition, it caused the process to end prematurely, before the
error could be logged. We've opened a [PR (ekg-wai#12)](https://github.com/tvh/ekg-wai/pull/12) against the package containing the fix, and published it as a pre-release on CHaP.
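
The sketch below shows a common shape of this kind of bug; it is a hypothetical illustration, not the actual `ekg-wai` code or the fix in the PR. A helper forks a worker and forwards any exception the worker dies with to its parent thread. When the parent stops the worker with `killThread` during shutdown, the worker's `ThreadKilled` is forwarded straight back and kills the parent before it can finish its own work (for example, logging an error). The names `forkForwardingAll`, `forkForwardingErrors` and `parentSurvives` are made up for this example.

```haskell
import Control.Concurrent
import Control.Exception
import Control.Monad (forever)

-- Hypothetical buggy helper: forwards *every* exception, including the
-- ThreadKilled that the parent itself used to stop the worker.
forkForwardingAll :: IO () -> IO ThreadId
forkForwardingAll work = do
  parent <- myThreadId
  forkIO $ work `catch` \e -> throwTo parent (e :: SomeException)

-- Hypothetical fixed helper: ThreadKilled means "please stop", so it is
-- swallowed; genuine errors are still forwarded to the parent.
forkForwardingErrors :: IO () -> IO ThreadId
forkForwardingErrors work = do
  parent <- myThreadId
  forkIO $ work `catch` \e -> case fromException e of
    Just ThreadKilled -> pure ()
    _ -> throwTo parent e

-- Run a throwaway "parent" thread that starts a worker via the given
-- helper, stops it, and reports whether the parent itself survived.
parentSurvives :: (IO () -> IO ThreadId) -> IO Bool
parentSurvives forkHelper = do
  result <- newEmptyMVar
  _ <- forkIO $ do
    outcome <- try (do
      started <- newEmptyMVar
      -- The worker signals once it runs, so its handler is installed.
      worker <- forkHelper (putMVar started () >> forever (threadDelay 1000000))
      takeMVar started
      killThread worker
      threadDelay 100000) :: IO (Either SomeException ())
    putMVar result (either (const False) (const True) outcome)
  takeMVar result

main :: IO ()
main = do
  parentSurvives forkForwardingAll >>= print
  parentSurvives forkForwardingErrors >>= print
```

Running it prints `False` for the variant that forwards everything and `True` for the variant that ignores `ThreadKilled`.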

Tracking down this condition would have been easier with pervasive, human-readable labels for all the threads that the Node process spawns. So, in coordination with the Consensus team,
we made sure this is the case for future builds of the Node, including locations in the code where dependency packages internally use `forkIO` to create green threads. This will
enhance the usability of debug output when investigating concurrency issues.
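
In GHC, a thread can be given a human-readable label with `labelThread` from `GHC.Conc`; such labels show up, for example, in eventlogs. The sketch below wraps `forkIO` in a hypothetical helper, `forkLabelled`, that labels the new thread from inside the thread itself before running the actual work, so the workload never runs unlabelled. This is an illustrative pattern only; the helper and the label names are made up, and it is not necessarily how the Node or its dependencies implement labeling.

```haskell
import Control.Concurrent
  (ThreadId, forkIO, myThreadId, newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (labelThread)

-- Hypothetical helper: fork a green thread and label it before it starts
-- its actual work, so the label is in place for the whole workload.
forkLabelled :: String -> IO () -> IO ThreadId
forkLabelled label action = forkIO $ do
  self <- myThreadId
  labelThread self label
  action

main :: IO ()
main = do
  done <- newEmptyMVar
  -- Made-up labels; real code would use names meaningful for the component.
  _ <- forkLabelled "example.metrics-server" (putMVar done ())
  _ <- forkLabelled "example.trace-forwarder" (putMVar done ())
  takeMVar done
  takeMVar done
  putStrLn "both labelled threads ran"
```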

### Infrastructure

The Haskell definition of benchmarking workloads, together with the removal of its `bash`/`jq` counterpart, is complete and has passed the testing phase. This includes a final alignment of all profile content
defined using either option. Once merged, this will open up a path to simplifying how `nix` interacts with the performance `workbench`, and hopefully reduce complexity for our CI runners.
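
To give a flavour of what a Haskell profile definition offers over `bash`/`jq`, here is a hypothetical, heavily simplified sketch: profiles are typed records, variants are derived from a base profile by record update, and aligning two definitions of the same profile becomes a field-by-field comparison. The `Profile` type, its fields, the profile names and all values are made up for illustration and do not reflect the workbench's actual profile schema.

```haskell
-- Hypothetical, simplified workload profile; real profiles carry many more fields.
data Profile = Profile
  { profileName :: String
  , nodeCount   :: Int
  , txPerSecond :: Int
  , durationSec :: Int
  } deriving (Eq, Show)

-- Hypothetical base profile with made-up values.
baseline :: Profile
baseline = Profile
  { profileName = "baseline", nodeCount = 6, txPerSecond = 12, durationSec = 3600 }

-- A variant derived by record update instead of patching JSON with jq.
lightLoad :: Profile
lightLoad = baseline { profileName = "light-load", txPerSecond = 2 }

-- Alignment check: names of the fields in which two definitions disagree.
profileDiff :: Profile -> Profile -> [String]
profileDiff a b = [field | (field, same) <- checks, not same]
  where
    checks =
      [ ("profileName", profileName a == profileName b)
      , ("nodeCount",   nodeCount a   == nodeCount b)
      , ("txPerSecond", txPerSecond a == txPerSecond b)
      , ("durationSec", durationSec a == durationSec b)
      ]

main :: IO ()
main = do
  print lightLoad
  print (profileDiff baseline lightLoad)
```

Here `profileDiff baseline lightLoad` evaluates to `["profileName","txPerSecond"]`, and comparing a profile with itself yields an empty list.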

As `cardano-api` is deprecating some protocol-parameter-related data types that are no longer relevant for Cardano, we've discussed the implications for our tooling with stakeholders:
this would effectively remove our ability to benchmark clusters of BFT nodes that do not use a staking/reward-based consensus algorithm, as was the case in Cardano's Byron era. The decision
was made not to drop that ability from our tooling, as the benchmarks have potential applications outside of Cardano. As a consequence, we've started porting those types so they can live on in our toolchain,
which represents an additional maintenance item for our team.

### Tracing

The self-contained C library implementing trace forwarding is now in prototype state. It contains a pure C implementation of our forwarding protocol over a socket,
as well as pure C CBOR codecs for the data payload that match the `TraceObject` schema used within Cardano. This ensures existing tooling can process traces emitted
by non-Cardano applications written in languages other than Haskell.
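
For a sense of what the CBOR side involves, the sketch below hand-encodes the few CBOR primitives (unsigned heads, text strings and arrays, as specified in RFC 8949) needed for a list of text segments, such as a trace namespace. It is written in Haskell for consistency with the other examples in this post rather than in C, and it illustrates generic CBOR encoding only: it does not reproduce the actual `TraceObject` schema or the C library's API, and `encodeHead`, `encodeText` and `encodeArray` are hypothetical names.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- CBOR head: 3-bit major type, then the argument, either inline (< 24)
-- or in 1, 2, 4 or 8 big-endian bytes. Assumes n >= 0 and a 64-bit Int.
encodeHead :: Word8 -> Int -> [Word8]
encodeHead major n
  | n < 24          = [mt + fromIntegral n]
  | n < 0x100       = (mt + 24) : bigEndian 1
  | n < 0x10000     = (mt + 25) : bigEndian 2
  | n < 0x100000000 = (mt + 26) : bigEndian 4
  | otherwise       = (mt + 27) : bigEndian 8
  where
    mt = major * 32
    bigEndian k = [fromIntegral ((n `shiftR` (8 * i)) .&. 0xff) | i <- [k - 1, k - 2 .. 0]]

-- UTF-8 encoding of a single code point.
utf8 :: Char -> [Word8]
utf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 + top 6, cont 0]
  | n < 0x10000 = [0xE0 + top 12, cont 6, cont 0]
  | otherwise   = [0xF0 + top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    top s = fromIntegral (n `shiftR` s)
    cont s = fromIntegral (0x80 + ((n `shiftR` s) .&. 0x3F))

-- Major type 3: text string; the length counts UTF-8 bytes, not characters.
encodeText :: String -> [Word8]
encodeText s = encodeHead 3 (length bytes) ++ bytes
  where bytes = concatMap utf8 s

-- Major type 4: definite-length array of already-encoded items.
encodeArray :: [[Word8]] -> [Word8]
encodeArray items = encodeHead 4 (length items) ++ concat items

toHex :: [Word8] -> String
toHex = concatMap byte
  where byte b = let h = showHex b "" in if length h < 2 then '0' : h else h

main :: IO ()
main = putStrLn (toHex (encodeArray [encodeText "Foo", encodeText "Bar"]))
```

This prints `8263466f6f63426172`: `0x82` introduces an array of two items, and each `0x63` introduces a three-byte text string.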

The latest updates to [Developer Portal: `cardano-tracer`] are ready for publication and awaiting PR review on the Cardano Developer Portal.

### Community

We've been quite busy on our new Discord channel [_#tracing-monitoring_](https://discord.com/channels/826816523368005654/1332375957528514590) on the *IOG's Technical Community* server. There's been
an initial spike of interest, and we've been able to provide support and explain various design decisions of the new tracing system. Additionally, we've received valuable feedback about potential
features that would greatly help adoption of the new system. These are typically highly localized in their implementation, and non-breaking with respect to API and design, so addressing this
feedback promptly adds much value at low risk. Thank you for your input!

[Developer Portal: `cardano-tracer`]: https://developers.cardano.org/docs/get-started/cardano-node/new-tracing-system/cardano-tracer