Fix LagunaBackend ABI mismatch from KvFlashPager by davide221 · Pull Request #477 · Luce-Org/lucebox-hub

davide221 · 2026-07-01T12:43:36Z

Summary

- Keep the KvFlashPager layout identical across CUDA and non-CUDA translation units by storing the CUDA stream handle as a void * member that is always present, instead of one that exists only in CUDA builds.
- Detach the pager and clear hybrid, routing, and KVFlash state before freeing Laguna target and backend resources.
- Make CPU embedder mmap cleanup idempotent, and explicitly clear Laguna embedder ownership when the target is freed.
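The first summary item describes a classic ABI hazard: if a struct member exists only when a CUDA macro is defined, CUDA and non-CUDA translation units disagree about the struct's layout. The sketch below models both views as separate structs so the offset shift can be checked at compile time, then shows the fixed shape with an always-present void * handle. All field names (k_cache, v_cache, cuda_stream) and the has_stream helper are hypothetical; the PR does not show the real KvFlashPager members.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical field names: the real KvFlashPager members are not shown in the PR.

// Before the fix (modeled): a stream member that exists only when a CUDA macro is
// defined. A .cu file and a .cpp file then disagree about the layout of one type
// name, which violates the One Definition Rule. The two views are written as two
// separate structs here so both can be compiled and compared in one file.
struct PagerViewCudaTU {   // layout seen by a CUDA translation unit
    float *k_cache;
    void *stream;          // cudaStream_t in the real code, only under the macro
    float *v_cache;
};

struct PagerViewHostTU {   // layout seen by a non-CUDA translation unit
    float *k_cache;
    float *v_cache;
};

// A host TU reading v_cache would load from a different offset than the CUDA TU
// that wrote it.
static_assert(offsetof(PagerViewCudaTU, v_cache) != offsetof(PagerViewHostTU, v_cache),
              "a conditional member shifts every later field");

// After the fix: one layout everywhere. The handle is an opaque void * in every
// build and is nullptr when no stream is set. CUDA code converts it back at the
// point of use, for example:
//   cudaStream_t s = static_cast<cudaStream_t>(pager.cuda_stream);
// cudaStream_t is a pointer typedef, so the round trip through void * is lossless.
struct KvFlashPagerFixed {
    float *k_cache;
    void *cuda_stream;     // always present, regardless of build flags
    float *v_cache;
};

// Usable from any TU: host-only code can test for a stream without CUDA headers.
inline bool has_stream(const KvFlashPagerFixed &p) { return p.cuda_stream != nullptr; }
```

The design choice is to pay one pointer of storage in non-CUDA builds in exchange for a single layout that every translation unit agrees on; the CUDA-specific type only appears at the cast site inside .cu files.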
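The second summary item is about teardown order: anything that holds a non-owning reference into the target (the pager, cached routing decisions, KVFlash flags) should be cleared before the target's resources are released, so nothing can touch freed memory in between. The following is a minimal sketch under that assumption; Target, Pager, RoutingState, BackendBuffers, and free_target are hypothetical stand-ins, not the actual Laguna types.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins; not the real LagunaBackend types.
struct BackendBuffers {
    std::vector<float> kv;   // stands in for device/host KV storage
};

struct Pager {
    BackendBuffers *attached = nullptr;   // non-owning; must not outlive the buffers
    void detach() { attached = nullptr; }
    bool can_page() const { return attached != nullptr; }
};

struct RoutingState {
    std::vector<int> expert_ids;   // cached hybrid/routing decisions tied to the target
    void clear() { expert_ids.clear(); }
};

struct Target {
    std::unique_ptr<BackendBuffers> buffers;   // owned resources
    Pager *pager = nullptr;                    // non-owning back-reference
    RoutingState routing;
    bool kvflash_enabled = false;
};

// Break every reference into the target first, then release what it owns.
// Safe to call twice: the second call finds nothing left to do.
void free_target(Target &t) {
    if (t.pager != nullptr) {
        t.pager->detach();   // 1. pager stops pointing at buffers about to be freed
        t.pager = nullptr;
    }
    t.routing.clear();       // 2. drop routing state that indexes the target
    t.kvflash_enabled = false;
    t.buffers.reset();       // 3. only now free the backing resources
}
```

Reversing steps 1 and 3 would leave Pager::attached dangling for the window between the free and the detach, which is exactly the kind of use-after-free the reordering avoids.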
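The third summary item makes mmap cleanup idempotent. One common way to get that property is to reset the stored address, length, and ownership flag after the first munmap, so a second cleanup call (for example from an explicit free followed by a destructor) becomes a no-op. The sketch below assumes a POSIX system; CpuEmbedderMapping and map_anonymous are hypothetical names, and the anonymous mapping only stands in for a real weights file.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical holder for a CPU embedding table mapped from a weights file.
struct CpuEmbedderMapping {
    void *addr = nullptr;   // start of the mapping, nullptr when nothing is mapped
    size_t len = 0;         // mapping length in bytes
    bool owned = false;     // true only if this object is responsible for munmap

    CpuEmbedderMapping() = default;
    CpuEmbedderMapping(const CpuEmbedderMapping &) = delete;             // no shared
    CpuEmbedderMapping &operator=(const CpuEmbedderMapping &) = delete;  // ownership

    // Idempotent: fields are reset after the first call, so later calls never
    // pass a stale address to munmap.
    void release() {
        if (owned && addr != nullptr) {
            munmap(addr, len);
        }
        addr = nullptr;
        len = 0;
        owned = false;   // ownership is cleared explicitly, as in the PR summary
    }

    ~CpuEmbedderMapping() { release(); }
};

// Illustration-only helper: an anonymous mapping in place of a real file mapping.
bool map_anonymous(CpuEmbedderMapping &m, size_t len) {
    m.release();   // drop any previous mapping first
    void *p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return false;
    }
    m.addr = p;
    m.len = len;
    m.owned = true;
    return true;
}
```

Deleting the copy operations keeps a single owner per mapping, so the idempotent release() plus the destructor cannot unmap the same region twice through two objects.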

env CCACHE_DISABLE=1 cmake --build /home/lucebox/lucebox-hub-main-clean/server/build-main86-cuda13 --target bench_laguna_spark dflash_server -j4
DFLASH_EXPERT_BUDGET_PCT=60 bench_laguna_spark ... 1 1 --max-ctx 64 --kv q8_0
DFLASH_EXPERT_BUDGET_PCT=60 bench_laguna_spark ... 16 8 --max-ctx 128 --kv q8_0
DFLASH_LAGUNA_NO_SINGLE_GRAPH=1 DFLASH_EXPERT_BUDGET_PCT=60 bench_laguna_spark ... 1 1 --max-ctx 64 --kv q8_0
dflash_server chat-completions smoke test on 127.0.0.1:18080

9e87cb2 Fix LagunaBackend ABI mismatch from KvFlashPager