fim : support n_cmpl #125

Open: ggerganov wants to merge 3 commits into master from gg/n-cmpl

ggerganov (Member) commented May 12, 2026

Overview

Support generating multiple completions at a time using the n_cmpl parameter:

ggml-org/llama.cpp#17775

Approach

Cache as ring buffer per key

Each cache key (hash of prefix/middle/suffix) now maps to a ring buffer of up to n_cmpl individual completion responses. When a new completion arrives for an existing key, it is appended; if the ring is full, the oldest entry is evicted. Completions with duplicate content are skipped on insert.

This naturally accumulates diverse completions across multiple requests for the same position, without packing them into a single array value.
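The per-key ring buffer can be sketched in a few lines of Python. This is an illustrative model, not the plugin's Vim script: the SHA-256 key, the NUL separator between prefix/middle/suffix, and the class name FimCache are assumptions made for the sketch. A deque with maxlen=n_cmpl gives the "evict the oldest when full" behavior for free, and duplicates are rejected before appending.

```python
from __future__ import annotations

import hashlib
from collections import deque


def cache_key(prefix: str, middle: str, suffix: str) -> str:
    # Assumed key scheme: SHA-256 over the three context parts. The NUL
    # separator keeps ("ab", "c") distinct from ("a", "bc").
    ctx = "\x00".join((prefix, middle, suffix))
    return hashlib.sha256(ctx.encode("utf-8")).hexdigest()


class FimCache:
    """Hypothetical model: each key maps to a ring of at most n_cmpl completions."""

    def __init__(self, n_cmpl: int = 1) -> None:
        self.n_cmpl = n_cmpl
        self.data: dict[str, deque] = {}

    def insert(self, key: str, content: str) -> None:
        ring = self.data.setdefault(key, deque(maxlen=self.n_cmpl))
        if content in ring:
            return  # duplicate content for this key: skip it
        ring.append(content)  # a full deque(maxlen=...) drops its oldest entry

    def get(self, key: str) -> list[str]:
        return list(self.data.get(key, ()))

    def total(self) -> int:
        # Total number of cached completions across all keys.
        return sum(len(ring) for ring in self.data.values())


cache = FimCache(n_cmpl=2)
k = cache_key("def f():\n", "    return ", "\n")
for c in ["1", "2", "1", "3"]:
    cache.insert(k, c)
print(cache.get(k))  # ['2', '3']
```

The second "1" is dropped as a duplicate, and "3" evicts the oldest entry, so repeated requests at the same position gradually fill the ring with distinct suggestions.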

Simplified fim_try_hint

Two clear phases:

  1. Exact match — look up the current position hash. If completions exist, render them and enable cycling (<C-J>/<C-K>).
  2. Nearby match (only if Phase 1 yields nothing) — scan 128 characters back for a cached completion whose start matches what was typed. Pick the single best match, render without cycling.
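Below is a Python sketch of the two phases. It makes several assumptions that the description does not spell out: the cache is a plain dict from a SHA-256 key to a list of completion strings; `middle` is the current line up to the cursor; and the "single best" nearby match is the one that leaves the longest remaining suggestion. try_hint is a hypothetical name, and it returns the completions together with a flag saying whether cycling is allowed.

```python
from __future__ import annotations

import hashlib

MAX_SCAN = 128  # phase 2 looks back at most this many characters


def cache_key(prefix: str, middle: str, suffix: str) -> str:
    ctx = "\x00".join((prefix, middle, suffix))
    return hashlib.sha256(ctx.encode("utf-8")).hexdigest()


def try_hint(cache: dict[str, list[str]], prefix: str, middle: str, suffix: str):
    """Return (completions, allow_cycling) for the cursor position."""
    # Phase 1: exact match at the current position; cycling is allowed.
    exact = cache.get(cache_key(prefix, middle, suffix), [])
    if exact:
        return list(exact), True

    # Phase 2: step back one character at a time. The removed text is what
    # the user typed after a completion was cached at that earlier position.
    best = ""
    for i in range(1, min(MAX_SCAN, len(middle)) + 1):
        typed = middle[-i:]
        for content in cache.get(cache_key(prefix, middle[:-i], suffix), []):
            # The completion must start with the typed text and still have
            # something left to suggest. Assumed rule: longest remainder wins.
            if content.startswith(typed) and len(content) > len(typed):
                rest = content[len(typed):]
                if len(rest) > len(best):
                    best = rest
    return ([best], False) if best else ([], False)
```

For example, if "foo(1)" was cached at `x = ` and the user has since typed `fo`, phase 2 finds that entry two characters back and offers the remainder "o(1)" without cycling.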

New config options

  • n_cmpl (default: 1) — max completions per position in the ring buffer
  • keymap_fim_next / keymap_fim_prev (default: <C-J> / <C-K>) — cycle through completions
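The cycling behavior behind keymap_fim_next / keymap_fim_prev can be modeled as an index that wraps around the completions at the current position. This Python sketch is hypothetical (the Hint class and its methods are not the plugin's names). It mirrors two details from this PR: the cycle call returns an empty string early when there is nothing to cycle, and the [N/M] index is shown only when there is more than one completion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hint:
    """Completions shown at the current position plus the selected index."""

    responses: list[str] = field(default_factory=list)
    index: int = 0

    def cycle(self, step: int) -> str:
        # step = +1 for "next" (default <C-J>), -1 for "prev" (default <C-K>).
        if len(self.responses) < 2:
            return ""  # nothing to cycle: return early, hint unchanged
        self.index = (self.index + step) % len(self.responses)  # wrap around
        return ""  # an insert-mode <expr> mapping would insert nothing

    def label(self) -> str:
        # [N/M] is shown only when there is more than one completion.
        if len(self.responses) > 1:
            return f"[{self.index + 1}/{len(self.responses)}]"
        return ""


h = Hint(["a", "b", "c"])
h.cycle(-1)
print(h.label())  # [3/3]
```

Python's % operator always returns a non-negative result for a positive modulus, so stepping back from the first completion lands on the last one.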

Changes

  • s:cache_insert / s:cache_get — ring buffer per key, dedup by content
  • s:fim_on_response — normalize server response (dict or array), insert each individually
  • s:fim_try_hint — simplified two-phase logic
  • s:fim_render — accept list of responses + selected index + fuzzy flag
  • llama#fim_cycle — cycle through completions
  • Info bar shows [N/M] completion index and total cached entries
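The normalization step in s:fim_on_response can be sketched as follows. The sketch assumes that the server returns a JSON object for a single completion and a JSON array of objects when n_cmpl > 1, and that each object carries its text in a "content" field. normalize_response and on_response are hypothetical names. To keep the block self-contained, the ring buffer is a plain list, and skipping empty completions is this sketch's own choice.

```python
from __future__ import annotations

import json
from typing import Any


def normalize_response(raw: str) -> list[dict[str, Any]]:
    """Return the server reply as a list of completion objects."""
    data = json.loads(raw)
    if isinstance(data, dict):
        return [data]  # a single completion
    if isinstance(data, list):
        return [item for item in data if isinstance(item, dict)]  # several
    return []


def on_response(cache: dict[str, list[str]], key: str, raw: str, n_cmpl: int) -> None:
    """Insert each completion from one reply into the ring for `key`."""
    ring = cache.setdefault(key, [])
    for item in normalize_response(raw):
        content = item.get("content", "")  # assumed field name
        if not content or content in ring:
            continue  # skip empty (sketch's choice) and duplicate completions
        ring.append(content)
        if len(ring) > n_cmpl:
            ring.pop(0)  # ring full: evict the oldest entry
```

Because both reply shapes go through the same per-item insert, a later request at the same position adds to the existing ring instead of replacing it.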

AI usage disclosure: YES. llama.cpp + pi

ggerganov changed the title from "core : add n_cmpl support" to "fim : add n_cmpl support" May 12, 2026
…port

- cache now stores up to n_cmpl individual completions per key (ring buffer)
- simplify fim_try_hint: exact match with cycling, nearby match without
- add n_cmpl config option (default: 1)
- add keymap_fim_next/keymap_fim_prev for cycling (<C-J>/<C-K>)
- add llama#fim_cycle() for cycling through completions
- deduplicate completions on cache insert by content
- update info bar to show total cached entries and completion index

Assisted-by: llama.cpp:local pi
ggerganov changed the title from "fim : add n_cmpl support" to "fim : redesign cache as ring buffer per key with multi-completion support" May 12, 2026
ggerganov marked this pull request as ready for review May 12, 2026 19:11
ggerganov added 2 commits May 12, 2026 22:17
Map the cycle keymaps whenever a FIM hint is shown, not just when
multiple completions are available. llama#fim_cycle returns '' early
when there is nothing to cycle, preventing the default insert-mode
behavior of moving the cursor.

Assisted-by: llama.cpp:local pi
The fuzzy flag was only used to suppress the [N/M] completion index
in the info bar. Simply checking len(responses) > 1 is sufficient.

Assisted-by: llama.cpp:local pi
ggerganov changed the title from "fim : redesign cache as ring buffer per key with multi-completion support" to "fim : support n_cmpl" May 13, 2026