Listen to the beacon node SSE endpoint and expose via prom metrics #3580

Open
OisinKyne opened this issue Mar 10, 2025 · 0 comments
Labels: needs refining (Solution is unclear and needs refining), protocol (Protocol Team tickets)
Milestone: v1.4.0

OisinKyne commented Mar 10, 2025

🎯 Problem to be solved

During the Holesky fork incident, we had little visibility into which chain each beacon node was following. We can improve this observability.

We also hit the beacon node (BN) with many requests every slot. We could potentially make charon's behaviour more efficient by leveraging the beacon node's server-sent events (SSE) feature.

🛠️ Proposed solution

I believe we can make progress on both problems with a low-impact feature that listens to BN events, debug-logs them, and monitors them.

First, we should create a design doc outlining exactly which events we will listen to, what we log, which of them we expose to monitoring, and why. We should consider scraping every type of BN for its standard-ish metrics in every configuration charon is deployed in, and we should consider a charon-based client that either sends outbound requests to BNs or subscribes to the SSE endpoint.

We register a handler for the BN SSE endpoint.

When events arrive, we debug-log them and update Prometheus gauges as appropriate.

We create Grafana panels that display this information.

  • Approved design doc: link
  • Core team consensus on the proposed solution

🧪 Tests

  • Tested by new automated unit/integration/smoke tests
  • Manually tested on core team/canary/test clusters
  • Manually tested on local compose simnet

👐 Additional acceptance criteria

We can plan a future feature where we share something like the observed parentRoot via peerInfo, so that we can warn if peers appear to be on different forks (we can already spot this on our central Prometheus before shipping that feature).

❌ Out of Scope

Making changes to the scheduler, or triggering retries etc., if we detect a re-org (though we should plan to make optimisations here).

Pushing this code into an eth2-client library/package (though this would be a good candidate to abstract into such a package early in its development).

OisinKyne added this to the v1.4.0 milestone Mar 10, 2025
github-actions bot added the protocol (Protocol Team tickets) label Mar 10, 2025
KaloyanTanev added the needs refining (Solution is unclear and needs refining) label Mar 11, 2025