Skip to content

feat(kvevents): reference-count duplicate KV-event removals#680

Merged
github-actions[bot] merged 5 commits into
llm-d:mainfrom
Change72:feat/kvevents-remove-refcount-dedup
Jun 25, 2026
Merged

feat(kvevents): reference-count duplicate KV-event removals#680
github-actions[bot] merged 5 commits into
llm-d:mainfrom
Change72:feat/kvevents-remove-refcount-dedup

Conversation

@Change72

@Change72 Change72 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What

Adds an eventDedupFilter in pkg/kvevents that reference-counts
BlockStored/BlockRemoved announcements so a duplicate remove does not evict a
block that another announcement still references.

Why

vLLM's OffloadingConnector publishes self-describing, block-granular KV events.
In chunk mode (block_size > engine block size), when a shared prefix is not
aligned to the offloaded chunk size, two overlapping chunks legitimately list the
same constituent block hash, so the same hash is Stored and Removed more than
once on the wire.

Today BlockRemoved is forwarded straight to index.Evict per hash, so the
first of these duplicate removes evicts a block the sibling chunk still
references — premature eviction that under-credits the lower tier in routing.
There is currently no de-duplication anywhere in pkg/kvevents. This mirrors the
role of Dynamo's EventDedupFilter (ai-dynamo/dynamo#8012), which llm-d has no
equivalent for.

How

A small filter that mirrors the wire event stream (not index state):

  • every BlockStored increments the count for its hashes (duplicates included),
    after the store has been applied to the index;
  • a BlockRemoved is forwarded to index.Evict only once its count returns to
    zero; removes for never-seen hashes pass through defensively (an unknown evict
    is a no-op) and a count never goes negative;
  • AllBlocksCleared resets a pod's counts in lockstep with the index's pod-wide
    clear.

The reference-count scope is (podIdentifier, deviceTier, groupIdx), matching the
dimensions of the PodEntry identity that an eviction actually targets, so
gpu/cpu copies and distinct KV-cache groups of the same hash are counted
independently.

Data-parallel rank

The scope carries a dataParallelRank field that is fixed to a sentinel on
main, because the current index identity (PodEntry) is pod-level and does
not distinguish data-parallel ranks — so reference counts correctly aggregate
across ranks (a block is still resident on the pod until every rank has removed
it). This is intentionally not a change to DP semantics. Once a DP-aware
index lands (e.g. #370, which adds DataParallelRank to PodEntry and
EventBatch), the rank can be sourced from the event at the call site so the
dedup scope becomes DP-aware in lockstep, with no change to the filter itself (a
TODO(#370) marks the single line).

Scope of this change

Observability

Because a suppressed BlockRemoved is otherwise invisible, two counters are added
to pkg/kvcache/metrics following the existing global-collector pattern (counts
are at block-hash granularity, not BlockRemoved events):

  • kvcache_kvevents_dedup_removed_hashes_suppressed_total
  • kvcache_kvevents_dedup_removed_hashes_forwarded_total

A TRACE log is also emitted when removals are suppressed. A gauge for the
outstanding per-pod reference-count map size is a planned follow-up.

Testing

go build ./...
go vet ./pkg/kvevents/...
golangci-lint run ./pkg/kvevents/...                          # 0 issues
go test -race ./pkg/kvevents/... ./pkg/kvcache/kvblock/...     # all pass

New tests in pkg/kvevents/event_dedup_filter_test.go:

  • Filter unit tests: duplicate store → first remove suppressed / second forwards;
    aggregation across sources; gpu/cpu tier independence; KV-cache group
    independence; defensive unknown-remove pass-through; partial forward; clear
    resets and isolates pods; nil-safe.
  • Pool-level (pool_test.go, processEventBatch + real InMemoryIndex):
    duplicate store survives the first remove and is evicted by the second;
    CPU/offload duplicate removal through the empty-token device-tier path (the
    GPU copy stays untouched); a device-tier update that resolves no keys is not
    reference-counted; AllBlocksCleared resets the filter.

AI assistance

Developed with AI assistance. The submitter has reviewed every changed line, ran
the tests, and owns the change.

vLLM's OffloadingConnector publishes self-describing block-granular KV events.
In chunk mode, overlapping offloaded chunks legitimately re-announce the same
constituent block hash, so a hash is Stored and Removed more than once on the
wire. BlockRemoved was forwarded straight to index.Evict per hash, so the first
such duplicate remove evicted a block a sibling chunk still referenced
(premature eviction that under-credits the lower tier in routing).

Add an eventDedupFilter in pkg/kvevents that mirrors the wire stream: every
BlockStored increments a per-(pod, device tier, KV-cache group) reference count,
a BlockRemoved is forwarded to the index only once that count returns to zero,
unknown removes pass through defensively, and AllBlocksCleared resets a pod.
This matches the dimensions of the PodEntry identity an eviction targets, so
gpu/cpu copies and distinct groups of a hash are counted independently.

The scope's data-parallel rank is a sentinel on current main because the index
identity is pod-level and does not distinguish ranks, so counts aggregate across
ranks; a future DP-aware index (llm-d#370) can wire the real rank with no change to
the filter. No changes to EventBatch or the engine adapters.

Signed-off-by: Change72 <changg@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 22, 2026 19:19
@github-actions github-actions Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 22, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a small reference-counting de-duplication layer to KV event processing so that duplicate BlockRemoved events (legitimately produced by vLLM chunk-mode offloading for overlapping chunks) do not prematurely evict blocks that are still referenced by another outstanding announcement.

Changes:

  • Introduces an eventDedupFilter that reference-counts (pod, deviceTier, groupIdx, dpRank-sentinel, blockHash) across BlockStored / BlockRemoved.
  • Wires the filter into Pool.processEventBatch: stores increment after successful index add; removes are filtered so eviction is forwarded only when the refcount reaches zero; AllBlocksCleared resets counts for the pod.
  • Adds focused unit + pool-level integration tests covering duplicate store/remove behavior, tier/group isolation, defensive unknown removes, and clear/reset behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
pkg/kvevents/pool.go Normalizes device tier consistently, tracks stores/removes through the new dedup filter, and clears filter state on AllBlocksCleared.
pkg/kvevents/event_dedup_filter.go Implements the reference-counting filter keyed by pod/tier/group/dp-rank-sentinel/hash with defensive pass-through behavior.
pkg/kvevents/event_dedup_filter_test.go Adds unit + end-to-end tests validating duplicate-remove suppression, isolation across tiers/groups, and clear/reset behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Change72 and others added 4 commits June 22, 2026 19:30
Signed-off-by: Change72 <changg@nvidia.com>
Address review feedback on the KV-event dedup filter:
- Add kvcache_kvevents_dedup_removed_hashes_{suppressed,forwarded}_total
  counters (block-hash granularity, not event count) so suppressed removals are
  observable, following the pkg/kvcache/metrics global-collector pattern; emit a
  TRACE log when removals are suppressed.
- Document why the filter lives in the Pool rather than as an Index decorator.
- Tighten the concurrency doc: the mutex guards the cross-pod map and same-pod
  events are serialized by Pool.AddTask sharding.
- Move the pool-level tests into pool_test.go and add a negative test that a
  device-tier update resolving no keys is not reference-counted.

Signed-off-by: Change72 <changg@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The test relocation in the previous commit dropped
TestPool_AllBlocksClearedResetsDedup instead of moving it into pool_test.go.
Restore it next to TestAllBlocksCleared_Dispatch: it is the only regression
guard that the AllBlocksCleared handler resets the dedup refcount (Dispatch
only proves Index.Clear ran), so a post-clear store/remove cycle is not wrongly
suppressed.

Signed-off-by: Change72 <changg@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…guard

Address second-round review feedback:
- Add DedupRemovedHashes{Suppressed,Forwarded} to TestCollectorsIncludesAllMetrics
  so the registration-completeness guard fails if either counter is ever dropped
  from Collectors() (previously only the 9 pre-existing collectors were guarded).
- Add TestPool_DedupMetricsCountBlockHashes verifying the counter arithmetic at
  block-hash granularity: two duplicate stores then two removes record 4
  suppressed (first remove) and 4 forwarded (second). Values are read via the
  dto.Metric Write pattern already used by metrics.logMetrics, so no new
  dependency is introduced.
- Document that the noGroupIdx (-1) sentinel cannot collide with a real group
  index, since the engine adapters reject a negative group_idx.

Signed-off-by: Change72 <changg@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yankay

yankay commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Thanks, this looks consistent with Dynamo’s dedup behavior and the current llm-d index semantics. The tier/group/offload coverage looks good to me.

/lgtm

@github-actions github-actions Bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Jun 24, 2026
@vMaroon

vMaroon commented Jun 25, 2026

Copy link
Copy Markdown
Member

/approve

@github-actions github-actions Bot merged commit 7965f06 into llm-d:main Jun 25, 2026
11 checks passed
@Change72 Change72 deleted the feat/kvevents-remove-refcount-dedup branch June 25, 2026 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm Looks good to me, indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants