🎯 Problem to be solved
During the Holesky fork incident, we had little visibility into which chain each beacon node was on. We can improve this observability.
We also hit the BN with many requests every slot; we could potentially make charon's behaviour more efficient by leveraging the beacon node's server-sent events (SSE) feature.
🛠️ Proposed solution
I believe we can make progress on both of these problems with a low-impact feature that listens to BN events, debug logs them, and monitors them.
First, we should create a design doc outlining exactly which events we will listen to, what we log, which of them we expose to monitoring, and why. We should consider scraping every type of BN for the standard-ish metrics in every configuration charon is deployed in, and we should consider a charon-based client that either makes outbound requests to BNs or subscribes to the SSE endpoint.
Register a handler for the BN SSE endpoints.
When events come in, debug log them and update Prometheus gauges as appropriate.
Create Grafana panels that display this information.
Approved design doc: link
Core team consensus on the proposed solution
🧪 Tests
Tested by new automated unit/integration/smoke tests
Manually tested on core team/canary/test clusters
Manually tested on local compose simnet
👐 Additional acceptance criteria
We can plan a future feature where we share something like the observed parentRoot in peerInfo, so that we can warn if peers appear to be on different forks (we can spot this on our central Prometheus before shipping this).
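A rough sketch of how such a warning could work, assuming each peer reports the slot and block root its beacon node currently treats as head. The issue only says "something like the parentRoot", so the `peerHead` type, its fields, and `divergentRoots` are all hypothetical: the check groups peers by the root they report for the same slot and flags the slot if more than one distinct root appears.

```go
package main

import (
	"fmt"
	"sort"
)

// peerHead is a hypothetical peerInfo payload: the slot and block root a
// peer's beacon node considers head.
type peerHead struct {
	Peer string
	Slot uint64
	Root string
}

// divergentRoots groups peers by the root they report for the given slot and
// returns the groups only if more than one distinct root was seen, i.e. if the
// peers' beacon nodes appear to be on different forks. Otherwise it returns nil.
func divergentRoots(heads []peerHead, slot uint64) map[string][]string {
	groups := map[string][]string{}
	for _, h := range heads {
		if h.Slot == slot {
			groups[h.Root] = append(groups[h.Root], h.Peer)
		}
	}
	if len(groups) <= 1 {
		return nil
	}
	for _, peers := range groups {
		sort.Strings(peers) // deterministic ordering for logs and tests
	}
	return groups
}

func main() {
	heads := []peerHead{
		{Peer: "peer-a", Slot: 100, Root: "0xaaa"},
		{Peer: "peer-b", Slot: 100, Root: "0xaaa"},
		{Peer: "peer-c", Slot: 100, Root: "0xbbb"},
		{Peer: "peer-d", Slot: 99, Root: "0xccc"}, // different slot: ignored
	}
	if groups := divergentRoots(heads, 100); groups != nil {
		fmt.Printf("warning: peers disagree on slot 100: %v\n", groups)
	}
}
```

A single mismatch can also come from a late block that some BNs have not yet imported, so a real check would probably only warn when the disagreement persists across several slots.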
❌ Out of Scope
Making changes to the scheduler or triggering retries, etc., if we detect a re-org (though we should plan to make optimisations here).
Pushing this code into an eth2-client library/package (though this would be a good candidate for abstracting into a package early in the development of one).