feat(kvevents): reference-count duplicate KV-event removals by Change72 · Pull Request #680 · llm-d/llm-d-kv-cache

Change72 · 2026-06-22T19:19:10Z

What

Adds an eventDedupFilter in pkg/kvevents that reference-counts
BlockStored/BlockRemoved announcements so a duplicate remove does not evict a
block that another announcement still references.

Why

vLLM's OffloadingConnector publishes self-describing, block-granular KV events.
In chunk mode (block_size > engine block size), when a shared prefix is not
aligned to the offloaded chunk size, two overlapping chunks legitimately list the
same constituent block hash, so the same hash is Stored and Removed more than
once on the wire.

Today BlockRemoved is forwarded straight to index.Evict per hash, so the
first of these duplicate removes evicts a block the sibling chunk still
references — premature eviction that under-credits the lower tier in routing.
There is currently no de-duplication anywhere in pkg/kvevents. This mirrors the
role of Dynamo's EventDedupFilter (ai-dynamo/dynamo#8012), which llm-d has no
equivalent for.

How

A small filter that mirrors the wire event stream (not index state):

every BlockStored increments the count for its hashes (duplicates included),
after the store has been applied to the index;
a BlockRemoved is forwarded to index.Evict only once its count returns to
zero; removes for never-seen hashes pass through defensively (an unknown evict
is a no-op) and a count never goes negative;
AllBlocksCleared resets a pod's counts in lockstep with the index's pod-wide
clear.

The reference-count scope is (podIdentifier, deviceTier, groupIdx), matching the
dimensions of the PodEntry identity that an eviction actually targets, so
gpu/cpu copies and distinct KV-cache groups of the same hash are counted
independently.

Data-parallel rank

The scope carries a dataParallelRank field that is fixed to a sentinel on
main, because the current index identity (PodEntry) is pod-level and does
not distinguish data-parallel ranks — so reference counts correctly aggregate
across ranks (a block is still resident on the pod until every rank has removed
it). This is intentionally not a change to DP semantics. Once a DP-aware
index lands (e.g. #370, which adds DataParallelRank to PodEntry and
EventBatch), the rank can be sourced from the event at the call site so the
dedup scope becomes DP-aware in lockstep, with no change to the filter itself (a
TODO(#370) marks the single line).

Scope of this change

No changes to EventBatch or the engine adapters — DP-rank propagation is left
to Add DP-aware routing support to KVEvents and indexing pipeline #370, which owns it.
Behavior is unchanged for the non-duplicate case (a single store + single
remove forwards exactly as before).

Observability

Because a suppressed BlockRemoved is otherwise invisible, two counters are added
to pkg/kvcache/metrics following the existing global-collector pattern (counts
are at block-hash granularity, not BlockRemoved events):

kvcache_kvevents_dedup_removed_hashes_suppressed_total
kvcache_kvevents_dedup_removed_hashes_forwarded_total

A TRACE log is also emitted when removals are suppressed. A gauge for the
outstanding per-pod reference-count map size is a planned follow-up.

Testing

go build ./...
go vet ./pkg/kvevents/...
golangci-lint run ./pkg/kvevents/...                          # 0 issues
go test -race ./pkg/kvevents/... ./pkg/kvcache/kvblock/...     # all pass

New tests in pkg/kvevents/event_dedup_filter_test.go:

Filter unit tests: duplicate store → first remove suppressed / second forwards;
aggregation across sources; gpu/cpu tier independence; KV-cache group
independence; defensive unknown-remove pass-through; partial forward; clear
resets and isolates pods; nil-safe.
Pool-level (pool_test.go, processEventBatch + real InMemoryIndex):
duplicate store survives the first remove and is evicted by the second;
CPU/offload duplicate removal through the empty-token device-tier path (the
GPU copy stays untouched); a device-tier update that resolves no keys is not
reference-counted; AllBlocksCleared resets the filter.

AI assistance

Developed with AI assistance. The submitter has reviewed every changed line, ran
the tests, and owns the change.

vLLM's OffloadingConnector publishes self-describing block-granular KV events. In chunk mode, overlapping offloaded chunks legitimately re-announce the same constituent block hash, so a hash is Stored and Removed more than once on the wire. BlockRemoved was forwarded straight to index.Evict per hash, so the first such duplicate remove evicted a block a sibling chunk still referenced (premature eviction that under-credits the lower tier in routing). Add an eventDedupFilter in pkg/kvevents that mirrors the wire stream: every BlockStored increments a per-(pod, device tier, KV-cache group) reference count, a BlockRemoved is forwarded to the index only once that count returns to zero, unknown removes pass through defensively, and AllBlocksCleared resets a pod. This matches the dimensions of the PodEntry identity an eviction targets, so gpu/cpu copies and distinct groups of a hash are counted independently. The scope's data-parallel rank is a sentinel on current main because the index identity is pod-level and does not distinguish ranks, so counts aggregate across ranks; a future DP-aware index (llm-d#370) can wire the real rank with no change to the filter. No changes to EventBatch or the engine adapters. Signed-off-by: Change72 <changg@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a small reference-counting de-duplication layer to KV event processing so that duplicate BlockRemoved events (legitimately produced by vLLM chunk-mode offloading for overlapping chunks) do not prematurely evict blocks that are still referenced by another outstanding announcement.

Changes:

Introduces an eventDedupFilter that reference-counts (pod, deviceTier, groupIdx, dpRank-sentinel, blockHash) across BlockStored / BlockRemoved.
Wires the filter into Pool.processEventBatch: stores increment after successful index add; removes are filtered so eviction is forwarded only when the refcount reaches zero; AllBlocksCleared resets counts for the pod.
Adds focused unit + pool-level integration tests covering duplicate store/remove behavior, tier/group isolation, defensive unknown removes, and clear/reset behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
pkg/kvevents/pool.go	Normalizes device tier consistently, tracks stores/removes through the new dedup filter, and clears filter state on `AllBlocksCleared`.
pkg/kvevents/event_dedup_filter.go	Implements the reference-counting filter keyed by pod/tier/group/dp-rank-sentinel/hash with defensive pass-through behavior.
pkg/kvevents/event_dedup_filter_test.go	Adds unit + end-to-end tests validating duplicate-remove suppression, isolation across tiers/groups, and clear/reset behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Change72 <changg@nvidia.com>

Address review feedback on the KV-event dedup filter: - Add kvcache_kvevents_dedup_removed_hashes_{suppressed,forwarded}_total counters (block-hash granularity, not event count) so suppressed removals are observable, following the pkg/kvcache/metrics global-collector pattern; emit a TRACE log when removals are suppressed. - Document why the filter lives in the Pool rather than as an Index decorator. - Tighten the concurrency doc: the mutex guards the cross-pod map and same-pod events are serialized by Pool.AddTask sharding. - Move the pool-level tests into pool_test.go and add a negative test that a device-tier update resolving no keys is not reference-counted. Signed-off-by: Change72 <changg@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The test relocation in the previous commit dropped TestPool_AllBlocksClearedResetsDedup instead of moving it into pool_test.go. Restore it next to TestAllBlocksCleared_Dispatch: it is the only regression guard that the AllBlocksCleared handler resets the dedup refcount (Dispatch only proves Index.Clear ran), so a post-clear store/remove cycle is not wrongly suppressed. Signed-off-by: Change72 <changg@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…guard Address second-round review feedback: - Add DedupRemovedHashes{Suppressed,Forwarded} to TestCollectorsIncludesAllMetrics so the registration-completeness guard fails if either counter is ever dropped from Collectors() (previously only the 9 pre-existing collectors were guarded). - Add TestPool_DedupMetricsCountBlockHashes verifying the counter arithmetic at block-hash granularity: two duplicate stores then two removes record 4 suppressed (first remove) and 4 forwarded (second). Values are read via the dto.Metric Write pattern already used by metrics.logMetrics, so no new dependency is introduced. - Document that the noGroupIdx (-1) sentinel cannot collide with a real group index, since the engine adapters reject a negative group_idx. Signed-off-by: Change72 <changg@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

yankay · 2026-06-24T12:40:16Z

Thanks, this looks consistent with Dynamo’s dedup behavior and the current llm-d index semantics. The tier/group/offload coverage looks good to me.

/lgtm

vMaroon · 2026-06-25T17:50:21Z

/approve

Copilot AI review requested due to automatic review settings June 22, 2026 19:19

Change72 requested review from dannyharnik, kfirtoledo, liu-cong and vMaroon as code owners June 22, 2026 19:19

github-actions Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 22, 2026

github-actions Bot requested review from hyeongyun0916, sagearc and yankay June 22, 2026 19:19

Copilot started reviewing on behalf of Change72 June 22, 2026 19:19 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

Change72 and others added 4 commits June 22, 2026 19:30

docs(kvevents): update pool state comment

e6009b3

Signed-off-by: Change72 <changg@nvidia.com>

github-actions Bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Jun 24, 2026

github-actions Bot approved these changes Jun 25, 2026

View reviewed changes

github-actions Bot merged commit 7965f06 into llm-d:main Jun 25, 2026
11 checks passed

Change72 deleted the feat/kvevents-remove-refcount-dedup branch June 25, 2026 17:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(kvevents): reference-count duplicate KV-event removals#680

feat(kvevents): reference-count duplicate KV-event removals#680
github-actions[bot] merged 5 commits into
llm-d:mainfrom
Change72:feat/kvevents-remove-refcount-dedup

Change72 commented Jun 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

yankay commented Jun 24, 2026

Uh oh!

vMaroon commented Jun 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

Change72 commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Data-parallel rank

Scope of this change

Observability

Testing

AI assistance

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

yankay commented Jun 24, 2026

Uh oh!

vMaroon commented Jun 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Change72 commented Jun 22, 2026 •

edited

Loading