
Fix LagunaBackend ABI mismatch from KvFlashPager #477

Draft
davide221 wants to merge 1 commit into codex/ddtree-ggml-graph-opt from codex/fix-laguna-kvflash-abi

davide221 commented Jul 1, 2026

Contributor

Summary

  • keep KvFlashPager layout stable across CUDA and non-CUDA translation units by storing the CUDA stream handle as an always-present void *
  • detach the pager and clear hybrid/routing/KVFlash state before freeing Laguna target/backend resources
  • make CPU embedder mmap cleanup idempotent and explicitly clear Laguna embedder ownership during target free
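
A minimal sketch of the layout problem behind the first bullet, assuming the stream member was previously present only in CUDA builds (the PR text implies this but does not show the old definition). All names here (`PagerCudaView`, `PagerCpuView`, `PagerStable`, `FakeStream`, `set_stream`, `get_stream`) are hypothetical stand-ins, not the real `KvFlashPager` fields; `FakeStream` mimics the fact that `cudaStream_t` is a pointer type, so the example builds without the CUDA toolkit.

```cpp
#include <cstddef>

// Hypothetical stand-in for cudaStream_t, which is a pointer to an opaque struct.
struct FakeStream_st { int id; };
using FakeStream = FakeStream_st *;

// What a conditionally compiled member does to the struct, written out as two
// explicit views. A CUDA translation unit would see the first layout and a
// non-CUDA translation unit the second, for what is nominally the same type.
struct PagerCudaView { void *pages; FakeStream stream; std::size_t n_pages; };
struct PagerCpuView  { void *pages;                    std::size_t n_pages; };

static_assert(sizeof(PagerCudaView) != sizeof(PagerCpuView),
              "the two views disagree on size");
static_assert(offsetof(PagerCudaView, n_pages) != offsetof(PagerCpuView, n_pages),
              "members after the conditional field land at different offsets");

// Stable variant: the handle is always present as an opaque void *, so every
// translation unit sees the same size and the same member offsets.
struct PagerStable {
    void *pages = nullptr;
    void *stream = nullptr;   // opaque stream handle; null when no stream is attached
    std::size_t n_pages = 0;
};

// Only code that knows the real stream type performs the casts.
inline void set_stream(PagerStable &p, FakeStream s) {
    p.stream = static_cast<void *>(s);
}

inline FakeStream get_stream(const PagerStable &p) {
    return static_cast<FakeStream>(p.stream);
}
```

The design choice is that non-CUDA code never needs the CUDA headers: it only carries the pointer, while CUDA code casts it back at the point of use.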

Verification

  • env CCACHE_DISABLE=1 cmake --build /home/lucebox/lucebox-hub-main-clean/server/build-main86-cuda13 --target bench_laguna_spark dflash_server -j4
  • DFLASH_EXPERT_BUDGET_PCT=60 bench_laguna_spark ... 1 1 --max-ctx 64 --kv q8_0
  • DFLASH_EXPERT_BUDGET_PCT=60 bench_laguna_spark ... 16 8 --max-ctx 128 --kv q8_0
  • DFLASH_LAGUNA_NO_SINGLE_GRAPH=1 DFLASH_EXPERT_BUDGET_PCT=60 bench_laguna_spark ... 1 1 --max-ctx 64 --kv q8_0
  • dflash_server chat-completions smoke on 127.0.0.1:18080
