reporting: Continuously send full batches of events #1226

Open

sharnoff wants to merge 1 commit into main from sharnoff/reporting-batching
Conversation

@sharnoff (Member) commented Jan 29, 2025

Part of #1220.

In cases where (a) we expect batches to be very large, and (b) we expect that some autoscaler-agent instances may end up with many batches of events between reporting periods, we might see excessive memory usage from those events.

So instead of pushing events exactly every push period, we should push *at least* every push period, and also push as soon as the next batch fills up, if that happens before the push period elapses.

Implementation

The new implementation replaces the event queue between reporting.EventSink and the event sender threads with a queue-like construct that collects events into discrete batches. This new construct is eventBatcher, in the new batcher.go file.

Then, in all of the places where the event sender previously dealt with collecting chunks of events, it instead operates on a single "batch" of events at a time.

The gauge metric showing the number of events in the queue still works the same way it did before, counting both the events in the batch that hasn't been finalized yet and the events in finalized batches that haven't been sent.
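For orientation, here is a minimal sketch of what such a batcher could look like. This is reconstructed from the discussion here, not copied from batcher.go: the notify and gauge hooks mirror the `newEventBatcher[string](targetBatchSize, notify, gauge)` constructor that appears in the test snippet further down, and all field and helper names are assumptions.

```go
package reporting

import "sync"

// batch groups events that will be sent together.
type batch[E any] struct {
	events []E
}

// eventBatcher collects events into discrete batches, finalizing a batch
// as soon as it reaches targetSize (field/method names are assumptions).
type eventBatcher[E any] struct {
	mu sync.Mutex

	targetSize int        // a batch is complete once it reaches this size
	ongoing    []E        // events in the batch currently being filled
	completed  []batch[E] // finalized batches awaiting send

	notifyComplete func()    // wakes the sender when a batch completes
	sizeGauge      func(int) // stand-in for the queue-size gauge metric
}

// enqueue adds an event, finalizing the ongoing batch if it is now full.
func (b *eventBatcher[E]) enqueue(event E) {
	b.mu.Lock()
	defer b.mu.Unlock()

	b.ongoing = append(b.ongoing, event)
	if len(b.ongoing) >= b.targetSize {
		b.finishOngoingLocked()
		b.notifyComplete()
	}
	b.updateGaugeLocked()
}

// finishOngoing finalizes the in-progress batch even if it isn't full,
// e.g. on the push-period heartbeat or during shutdown.
func (b *eventBatcher[E]) finishOngoing() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.finishOngoingLocked()
	b.updateGaugeLocked()
}

func (b *eventBatcher[E]) finishOngoingLocked() {
	if len(b.ongoing) == 0 {
		return
	}
	b.completed = append(b.completed, batch[E]{events: b.ongoing})
	b.ongoing = nil
}

// updateGaugeLocked counts events in the unfinished batch plus events in
// completed-but-unsent batches, matching the pre-PR gauge semantics.
func (b *eventBatcher[E]) updateGaugeLocked() {
	total := len(b.ongoing)
	for _, c := range b.completed {
		total += len(c.events)
	}
	b.sizeGauge(total)
}
```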


Note: This PR builds on #1221, and must not be merged before it.

github-actions bot commented Jan 29, 2025

No changes to the coverage.


@Omrigan Omrigan self-requested a review February 10, 2025 17:30
Base automatically changed from sharnoff/reporting-clean-shutdown to main February 10, 2025 23:57
@sharnoff sharnoff force-pushed the sharnoff/reporting-batching branch from 95a12a0 to b6d3c65 Compare February 13, 2025 11:57
@mikhail-sakhnov mikhail-sakhnov self-requested a review February 13, 2025 14:30
@mikhail-sakhnov mikhail-sakhnov self-assigned this Feb 13, 2025

batcher := newEventBatcher[string](targetBatchSize, notify, gauge)

// First batch:
@mikhail-sakhnov (Contributor) commented Feb 20, 2025

suggestion: do those batches rely on each other? If not, I suggest wrapping them in t.Run calls; it would be a bit nicer to read in the unit-test run report.
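For reference, the suggested structure would look roughly like this (reusing the identifiers from the test snippet above; the subtest bodies are placeholders):

```go
func TestEventBatcher(t *testing.T) {
	batcher := newEventBatcher[string](targetBatchSize, notify, gauge)

	// Each stage gets its own entry in the `go test -v` report. Caveat:
	// with a shared batcher the subtests still run in order and see each
	// other's state, which is why the "rely on each other" question matters.
	t.Run("first batch", func(t *testing.T) {
		_ = batcher // fill the first batch and assert on it here
	})
	t.Run("second batch", func(t *testing.T) {
		_ = batcher // same for the second batch
	})
}
```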

@@ -37,8 +41,10 @@ type eventSender[E any] struct {
 }
 
 func (s eventSender[E]) senderLoop(ctx context.Context, logger *zap.Logger) {
-	ticker := time.NewTicker(time.Second * time.Duration(s.client.BaseConfig.PushEverySeconds))
-	defer ticker.Stop()
+	heartbeat := time.Second * time.Duration(s.client.BaseConfig.PushEverySeconds)
@mikhail-sakhnov (Contributor) commented Feb 20, 2025

nitpick (out of context): BaseConfig could have these fields defined as time.Duration, which supports JSON marshaling

@sharnoff (Member, Author) replied:

IIUC time.Duration doesn't have fancy JSON marshaling (it's just an int64 of nanoseconds), unless I'm missing something?
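(For context: with encoding/json, time.Duration does round-trip as a plain int64 of nanoseconds. A common workaround, sketched below and not used in this PR, is a wrapper type that marshals to the human-readable string format accepted by time.ParseDuration:)

```go
package config

import (
	"encoding/json"
	"time"
)

// Duration is a hypothetical wrapper that encodes as a string like "5s"
// or "2h45m" instead of raw nanoseconds.
type Duration struct{ time.Duration }

func (d Duration) MarshalJSON() ([]byte, error) {
	return json.Marshal(d.Duration.String())
}

func (d *Duration) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	d.Duration = parsed
	return nil
}
```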

// finish up any in-progress batch, so that we can send it before we exit.
s.queue.finishOngoing()
case <-s.batchComplete.Wait():
s.batchComplete.Awake() // consume this notification
@mikhail-sakhnov (Contributor) commented:

question: why is that needed?

sidenote, nitpick: this comment is a great example of a "what we do" statement, where a "why we do" statement would be more useful.
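To illustrate why consuming the notification matters, here is a hypothetical sketch of a level-triggered signal with this Wait/Awake shape (not the repo's actual primitive; all names and details here are assumptions). Once notified, the channel returned by Wait stays readable until Awake resets it, so a select arm that receives from Wait() without calling Awake() would keep firing in a busy loop:

```go
package util

import "sync"

// Notifier is a hypothetical level-triggered signal: after Notify, the
// channel returned by Wait is closed and therefore always readable, until
// Awake swaps in a fresh channel.
type Notifier struct {
	mu      sync.Mutex
	pending bool
	ch      chan struct{}
}

func NewNotifier() *Notifier {
	return &Notifier{ch: make(chan struct{})}
}

// Notify marks the signal pending; repeated calls coalesce.
func (n *Notifier) Notify() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.pending {
		n.pending = true
		close(n.ch) // a closed channel is always ready to receive from
	}
}

// Wait returns a channel that is readable while the signal is pending.
func (n *Notifier) Wait() <-chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.ch
}

// Awake consumes the pending signal so the next Wait blocks again.
func (n *Notifier) Awake() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.pending {
		n.pending = false
		n.ch = make(chan struct{})
	}
}
```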

-	defer ticker.Stop()
+	heartbeat := time.Second * time.Duration(s.client.BaseConfig.PushEverySeconds)
+
+	timer := time.NewTimer(heartbeat)
@mikhail-sakhnov (Contributor) commented:

question, suggestion: since we effectively have a loop here, should it be a ticker instead of a timer?
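A plausible answer (an assumption on my part; the PR doesn't state it outright): the next heartbeat should count from the last send, including sends triggered early by a full batch, and a fixed-schedule ticker can't express that. A sketch with hypothetical names:

```go
package main

import (
	"context"
	"time"
)

// senderLoopSketch shows the timer-vs-ticker distinction: the deadline is
// reset after every send, so "at least every push period" is measured from
// the last send rather than from a fixed schedule.
func senderLoopSketch(
	ctx context.Context,
	heartbeat time.Duration,
	batchComplete <-chan struct{},
	send func(),
) {
	timer := time.NewTimer(heartbeat)
	defer timer.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-timer.C:
			send() // heartbeat: push period elapsed without a full batch
		case <-batchComplete:
			send() // a batch filled up early; push immediately
			// Stop and drain before the Reset below (safe on all Go
			// versions; since Go 1.23, Reset alone is also race-free).
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
		}
		timer.Reset(heartbeat) // next deadline counts from this send
	}
}
```

With a ticker, an early batch-full send could be followed almost immediately by a redundant heartbeat send on the original schedule.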

defer b.mu.Unlock()

tmp := make([]batch[E], len(b.completed))
copy(tmp, b.completed)
@mikhail-sakhnov (Contributor) commented:

question: why do we copy the slice every time we peek into it?
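Reconstructing the surrounding method from the diff context above (receiver and signature are assumptions, not verbatim from the PR), the likely reason is that the copy gives callers a snapshot they can read after the mutex is released, while the batcher keeps appending to b.completed concurrently:

```go
func (b *eventBatcher[E]) peekCompleted() []batch[E] {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Copy so the caller gets a stable snapshot: b.completed's backing
	// array may be appended to (and reallocated) under b.mu after this
	// returns, which would race with a caller holding an alias to it.
	tmp := make([]batch[E], len(b.completed))
	copy(tmp, b.completed)
	return tmp
}
```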

if size := s.queue.size(); size != 0 {
logger.Info("Current queue size is non-zero", zap.Int("queueSize", size))
}
batches := s.queue.peekCompleted()
@mikhail-sakhnov (Contributor) commented:

question: It looks like we never use batches as a collection of batches, rather sticking with the first completed batch only. Should the queue API be changed to something like "peekLatestCompleted" and "peekCompletedCount"?

@mikhail-sakhnov (Contributor) left a comment

looks legit, left a few questions, none of which I consider a blocker for merging.
