Add SHA256-CBOR hashing algorithm for token processor with extra keys…#587

Open
leipanhz wants to merge 1 commit into
llm-d:mainfrom
leipanhz:feat/sha256-cbor-hashing

Conversation

@leipanhz

@leipanhz leipanhz commented May 15, 2026

Add KV-cache file prefetch plugin for inference requests (experimental feature)
Part 1: changes in KV-Cache (current PR)
Part 2: changes in llm-d-router

PR Description:
Introduces an experimental feature that proactively prefetches KV-cache blocks across storage tiers before the GPU pod processes an inference request. The plugin extends the precise prefix cache scorer with engine-key calculation to determine the storage location (file names) of the KV-cache blocks a request will need, and arranges for those blocks to be promoted to a closer storage tier to reduce inference latency. The current implementation targets a shared file system with transparent access to a remote storage tier, such as IBM Storage Scale configured to offload cold data to remote object storage. The plugin uses a concurrent worker pool to prefetch a configurable number of files in parallel from remote storage to the shared file system. A future version could extend this, for example, to prefetch KV-cache blocks from the file system into CPU memory on the worker node that the request is routed to.

For this to work correctly, the plugin must generate engine keys with the same hash algorithm that vLLM uses when offloading KV-cache blocks to storage. To that end, this work adds SHA256-CBOR to the token processor as a configurable hashing algorithm for vLLM compatibility. The SHA256-CBOR implementation supports extra keys (multimodal features) in block hash computation. In addition, the feature relies on logic derived from the llm-d-fs-connector to generate KV file names, so it currently works only with the llm-d-fs-connector.

Changes include:

New Prefetch Plugin (prefetch_prerequest_experimental.go):

  • Implements the PreRequest interface for pre-inference file prefetching
  • Converts engine keys to filesystem paths using the llm-d-fs-connector format
  • Manages a worker pool for concurrent file prefetching (configurable number of workers)
  • Each worker reads a configurable number of blocks (BlockSize × BlockCount bytes) from each KV-cache file to trigger prefetch of the rest of the file from remote storage
  • Supports configurable prefetch parameters (block size, concurrency, queue size)
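
For reviewers who want a concrete picture of the worker pool described above, here is a minimal sketch, not the code in prefetch_prerequest_experimental.go. The names `prefetchConfig`, `touchFile`, `prefetchAll`, and `demo` are hypothetical. It assumes that reading the first BlockSize × BlockCount bytes of a file is enough to make the file system recall the rest, and it blocks when the queue is full, which a real PreRequest hook would probably avoid.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sync"
)

// prefetchConfig holds the knobs named in the description (hypothetical field names).
type prefetchConfig struct {
	BlockSize  int // bytes per block
	BlockCount int // blocks read from the head of each file
	Workers    int // concurrent workers
	QueueSize  int // buffered queue capacity
}

// touchFile reads at most BlockSize*BlockCount bytes from the start of path
// and discards them; it returns the number of bytes actually read.
func touchFile(path string, cfg prefetchConfig) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	limit := int64(cfg.BlockSize) * int64(cfg.BlockCount)
	return io.Copy(io.Discard, io.LimitReader(f, limit))
}

// prefetchAll fans paths out to the workers and returns the total bytes read
// and the number of files that could not be read.
func prefetchAll(paths []string, cfg prefetchConfig) (int64, int) {
	if cfg.Workers < 1 {
		cfg.Workers = 1 // avoid a deadlock when no worker drains the queue
	}
	queue := make(chan string, cfg.QueueSize)
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		total  int64
		failed int
	)
	for i := 0; i < cfg.Workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range queue {
				n, err := touchFile(p, cfg)
				mu.Lock()
				total += n
				if err != nil {
					failed++
				}
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		queue <- p // simplification: blocks while the queue is full
	}
	close(queue)
	wg.Wait()
	return total, failed
}

// demo prefetches two 10-byte temp files and one missing path.
func demo() (int64, int, error) {
	dir, err := os.MkdirTemp("", "kv-prefetch")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	var paths []string
	for i := 0; i < 2; i++ {
		p := filepath.Join(dir, fmt.Sprintf("block-%d.bin", i))
		if err := os.WriteFile(p, make([]byte, 10), 0o600); err != nil {
			return 0, 0, err
		}
		paths = append(paths, p)
	}
	paths = append(paths, filepath.Join(dir, "missing.bin"))
	cfg := prefetchConfig{BlockSize: 4, BlockCount: 2, Workers: 2, QueueSize: 1}
	total, failed := prefetchAll(paths, cfg)
	return total, failed, nil
}

func main() {
	total, failed, err := demo()
	fmt.Println(total, failed, err)
}
```

Failures are only counted here; whether the plugin logs, retries, or drops a failed prefetch is not stated in this description.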

Precise Prefix Cache Scorer Enhancement (precise_prefix_cache.go):

  • Add GetEngineKeysForRequest() method to extract engine keys from requests
  • Support multimodal features in engine key computation

Add SHA256-CBOR hashing algorithm for token processor with extra keys support

  • Add a "hashAlgorithm" configuration field to select the hash function: FNV64a (default) or SHA256-CBOR (for vLLM compatibility)
  • Implement SHA256-CBOR hashing matching vLLM engine-key computation
  • Extend BlockExtraFeatures for multimodal content support
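
To make the chained block hash concrete, here is a stdlib-only sketch of SHA-256 over a CBOR-encoded payload; it is not the PR's implementation. It assumes the payload is the three-element array [parent hash, token IDs, extra keys or null], that the parent hash is encoded as a byte string, and that extra keys are text strings applied to every block. The names `cborHead`, `encodePayload`, and `chainHashes` are hypothetical, and the seed for the first block is passed in because its real value is not given here.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// cborHead appends a CBOR initial byte for the given major type with the
// shortest-form argument n (RFC 8949 preferred serialization).
func cborHead(buf []byte, major byte, n uint64) []byte {
	switch {
	case n < 24:
		return append(buf, major<<5|byte(n))
	case n <= 0xff:
		return append(buf, major<<5|24, byte(n))
	case n <= 0xffff:
		return binary.BigEndian.AppendUint16(append(buf, major<<5|25), uint16(n))
	case n <= 0xffffffff:
		return binary.BigEndian.AppendUint32(append(buf, major<<5|26), uint32(n))
	default:
		return binary.BigEndian.AppendUint64(append(buf, major<<5|27), n)
	}
}

// encodePayload encodes [parent, [tokens...], extra] where extra is CBOR null
// when there are no extra keys. The layout is an assumption, see the lead-in.
func encodePayload(parent []byte, tokens []uint32, extra []string) []byte {
	buf := cborHead(nil, 4, 3)                                      // array of 3
	buf = append(cborHead(buf, 2, uint64(len(parent))), parent...) // byte string
	buf = cborHead(buf, 4, uint64(len(tokens)))                    // array of uints
	for _, t := range tokens {
		buf = cborHead(buf, 0, uint64(t))
	}
	if len(extra) == 0 {
		return append(buf, 0xf6) // null
	}
	buf = cborHead(buf, 4, uint64(len(extra)))
	for _, k := range extra {
		buf = append(cborHead(buf, 3, uint64(len(k))), k...) // text string
	}
	return buf
}

// chainHashes hashes each full block of tokens, feeding every digest into the
// next block as its parent. A trailing partial block is ignored.
func chainHashes(seed []byte, tokens []uint32, blockSize int, extra []string) [][]byte {
	if blockSize <= 0 {
		return nil
	}
	var out [][]byte
	parent := seed
	for i := 0; i+blockSize <= len(tokens); i += blockSize {
		sum := sha256.Sum256(encodePayload(parent, tokens[i:i+blockSize], extra))
		parent = sum[:]
		out = append(out, parent)
	}
	return out
}

func main() {
	fmt.Println(hex.EncodeToString(encodePayload([]byte{0xaa}, []uint32{1, 24, 500}, nil)))
	for _, h := range chainHashes([]byte{0}, []uint32{1, 2, 3, 4, 5}, 2, []string{"img"}) {
		fmt.Println(hex.EncodeToString(h))
	}
}
```

Because each digest feeds the next block, two requests that share a token prefix produce identical hashes up to the first differing block, which is what lets the scorer and the storage layer agree on keys. Byte-for-byte agreement with vLLM still depends on matching its exact payload types and seed, which this sketch does not verify.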

@github-actions github-actions Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 15, 2026
@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@leipanhz leipanhz force-pushed the feat/sha256-cbor-hashing branch 6 times, most recently from 93bed83 to 461c512 Compare May 19, 2026 23:17
@github-actions

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.

@leipanhz leipanhz force-pushed the feat/sha256-cbor-hashing branch from 461c512 to 8eecb66 Compare June 10, 2026 02:12
@leipanhz

Author

Related issue: llm-d/llm-d-router#866

… support

Add configurable hashing algorithm in token processor with FNV64a
as the default and SHA256-CBOR as an alternative for vLLM compatibility.
Add support for extra keys (multimodal features) in block hash computation.
Include comprehensive unit tests for SHA256 hashing and extra keys functionality.

Add TokensToKVBlockKeysWithDigests to TokenProcessor so callers can access the
underlying hash digests alongside the truncated uint64 BlockHash values.
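
As an illustration of why callers might want both values, the sketch below pairs a 64-bit key with the digest it was derived from. It is not the signature of TokensToKVBlockKeysWithDigests; the type `keyWithDigest` and the functions `truncate`, `fromSHA256`, and `fromFNV64a` are hypothetical, and taking the last 8 bytes of the digest in big-endian order is an assumption about how the truncation is done.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// keyWithDigest pairs a truncated 64-bit block key with its full digest.
type keyWithDigest struct {
	Key    uint64
	Digest []byte
}

// truncate interprets the last 8 bytes of digest as a big-endian uint64.
// It panics if digest is shorter than 8 bytes.
func truncate(digest []byte) uint64 {
	return binary.BigEndian.Uint64(digest[len(digest)-8:])
}

// fromSHA256 keeps the 32-byte digest and a 64-bit key derived from it.
func fromSHA256(data []byte) keyWithDigest {
	sum := sha256.Sum256(data)
	return keyWithDigest{Key: truncate(sum[:]), Digest: sum[:]}
}

// fromFNV64a uses FNV-64a, where the digest is already only 8 bytes wide.
func fromFNV64a(data []byte) keyWithDigest {
	h := fnv.New64a()
	h.Write(data)
	return keyWithDigest{Key: h.Sum64(), Digest: h.Sum(nil)}
}

func main() {
	k := fromSHA256([]byte("example block payload"))
	// The full digest, not the 64-bit key, is what a filename would be built from.
	fmt.Println(k.Key, hex.EncodeToString(k.Digest))
	fmt.Println(fromFNV64a(nil).Key)
}
```

The 64-bit key discards most of a SHA-256 digest, so a filename derived from the key alone could not match one that the storage side derives from the full hash.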

Changes:
- Add HashAlgorithm configuration (FNV64a default, SHA256-CBOR for vLLM)
- Implement SHA256-CBOR hashing matching vLLM engine-key computation
- Extend BlockExtraFeatures for multimodal content support
- Set FNV as the default hashing algorithm to convert tokens to block keys
- Expose full-width hash digests for KV-cache filename construction

Signed-off-by: Lei Pan <leipan@ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@leipanhz leipanhz force-pushed the feat/sha256-cbor-hashing branch from 8eecb66 to ff2c47e Compare June 20, 2026 01:15
