I have a simple service that can handle 1.3M events per second, and I wanted to add a counter to monitor this, so I tried 2 approaches in my event loop:
1. Adding `packet_counter: Arc<AtomicU64>` to my struct and `self.packet_counter.fetch_add(1, Ordering::Relaxed);` in the loop, together with `tokio::spawn(publish_metrics_task(self.packet_counter.clone(), self.interface.name, self.metric_resolution));`, where the task is:
    use std::sync::Arc;
    use std::sync::atomic::{AtomicU64, Ordering};
    use tokio::time;

    async fn publish_metrics_task(packet_counter: Arc<AtomicU64>, interface: InterfaceName, metric_resolution: MetricResolution) {
        let mut interval = time::interval(metric_resolution.into());
        // Create the counter handle once, outside the loop
        let counter = metrics::counter!("xsk_rx", vec![]);
        loop {
            interval.tick().await;
            // Get the count and reset the counter atomically
            let count = packet_counter.swap(0, Ordering::Relaxed);
            // Publish the aggregated count to CloudWatch via the metrics crate
            counter.increment(count);
        }
    }
2. Adding `let counter = metrics::counter!("xsk_rx", vec![]);` and calling `counter.increment(1);` in the loop.
2 is able to keep up with the volume, but 1 becomes a bottleneck at about 100K events per second, a 13X performance hit that makes it unusable in high-volume services. I do realize, however, that this is more related to the metrics_cloudwatch backend, which seems to use channels to send every single data point.
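To make the difference between the two publishing styles concrete, here is a rough std-only sketch. It is not the metrics_cloudwatch implementation: the function names, the use of `std::sync::mpsc`, and the count-based flush (standing in for the timer in `publish_metrics_task`) are all assumptions made for illustration. It only shows how many channel messages each style generates for the same number of events.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc;
use std::thread;

/// Hypothetical consumer: drains the channel and reports
/// (number of messages received, sum of all deltas).
fn spawn_consumer(rx: mpsc::Receiver<u64>) -> thread::JoinHandle<(u64, u64)> {
    thread::spawn(move || {
        let mut messages = 0u64;
        let mut total = 0u64;
        for delta in rx {
            messages += 1;
            total += delta;
        }
        (messages, total)
    })
}

/// Per-event style: every event becomes one channel message.
fn per_event_channel(events: u64) -> (u64, u64) {
    let (tx, rx) = mpsc::channel();
    let consumer = spawn_consumer(rx);
    for _ in 0..events {
        tx.send(1).expect("consumer is alive");
    }
    drop(tx); // closing the channel ends the consumer loop
    consumer.join().expect("consumer panicked")
}

/// Aggregated style: the hot path only touches an atomic, and a
/// message is sent once per flush (count-based here, timer-based in a real service).
fn aggregated(events: u64, flush_every: u64) -> (u64, u64) {
    let pending = AtomicU64::new(0);
    let (tx, rx) = mpsc::channel();
    let consumer = spawn_consumer(rx);
    for i in 1..=events {
        pending.fetch_add(1, Ordering::Relaxed);
        if i % flush_every == 0 {
            // Read and reset in one atomic step so no increments are lost
            tx.send(pending.swap(0, Ordering::Relaxed)).expect("consumer is alive");
        }
    }
    // Final flush for whatever is left after the last full batch
    let rest = pending.swap(0, Ordering::Relaxed);
    if rest > 0 {
        tx.send(rest).expect("consumer is alive");
    }
    drop(tx);
    consumer.join().expect("consumer panicked")
}
```

Both functions report the same total, but `per_event_channel(n)` sends `n` messages while `aggregated(n, k)` sends roughly `n / k`, which is the kind of reduction the atomic-plus-periodic-publish pattern is after.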
Yeah, the collection is rather naive right now. I'd like to change at least counters to just use atomics when I get some time, but improving gauges and histograms requires a bit more thought.
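For reference, a counter that "just uses atomics" could look roughly like the sketch below. This is only an illustration of the idea, not the planned design or any existing API of metrics_cloudwatch: `AtomicCounter`, `take`, and `drain_after_concurrent_writes` are hypothetical names. Writers add to a shared `AtomicU64`, and a collector periodically takes the accumulated value with `swap`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical counter handle: clones share the same underlying atomic.
#[derive(Clone, Default)]
struct AtomicCounter {
    pending: Arc<AtomicU64>,
}

impl AtomicCounter {
    /// Hot path: a single relaxed atomic add, with no channel send.
    fn increment(&self, value: u64) {
        self.pending.fetch_add(value, Ordering::Relaxed);
    }

    /// Collector side: return everything accumulated since the
    /// previous call and reset the pending value to zero.
    fn take(&self) -> u64 {
        self.pending.swap(0, Ordering::Relaxed)
    }
}

/// Spawns `writers` threads that each increment `per_writer` times,
/// waits for them, and returns what the collector would drain.
fn drain_after_concurrent_writes(writers: usize, per_writer: u64) -> u64 {
    let counter = AtomicCounter::default();
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let handle = counter.clone();
            thread::spawn(move || {
                for _ in 0..per_writer {
                    handle.increment(1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("writer panicked");
    }
    counter.take()
}
```

Because `swap` reads and resets in one atomic operation, increments that race with a collection are carried into the next interval rather than dropped. Gauges and histograms do not reduce to a single running sum in the same way, which is presumably why they need more thought.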
I was considering metrics-cloudwatch-embedded, which actually uses the same approach as I do, but its histograms also use channels (ideally they would not): https://github.com/bmorin/metrics-cloudwatch-embedded/blob/c5a94f30b659e47a85b6a605ab285bd02398aa32/src/lib.rs#L27