Edit-block path doubling: small models hallucinate prefix-doubled headers; existing prepended-dir guard misses multi-segment cases #5111

@ekrembasari

Summary

Small editor models (e.g. groq/llama-3.3-70b-versatile, openrouter/openai/gpt-oss-120b, gemini-2.5-flash-lite) occasionally emit edit-block filename headers with the chat-file's own prefix duplicated — e.g. .claude/.claude/foo.json when the chat file is .claude/foo.json. The existing "GPT prepended a bogus dir" guard at wholefile_coder.py:68-72 only matches when the LLM emits just the basename, so multi-segment doubled prefixes fall through. abs_root_path() concatenates blindly, the file lands at <root>/.claude/.claude/foo.json, and the canonical path is left empty (or a 0-byte stub).

Reproduction (observed in production dispatch)

  • Edit format: whole (also reproduces under diff — see fuzzy-match cutoff analysis below)
  • Editor models seen triggering: groq/llama-3.3-70b-versatile, openrouter/openai/gpt-oss-120b, gemini-2.5-flash-lite
  • --file arg: absolute path resolving to e.g. .claude/claude-mem.config.json
  • Trigger condition: dense --read context with multiple files sharing a top directory prefix

Symptom: <root>/.claude/.claude/claude-mem.config.json written; <root>/.claude/claude-mem.config.json is a 0-byte stub or empty. Observed across ≥6 dispatches over a 4-day window in a project that runs aider via subprocess from a stable git root with absolute --file paths.

Root cause

aider/coders/wholefile_coder.py:68-72:

if fname and fname not in chat_files and Path(fname).name in chat_files:
    fname = Path(fname).name

This guard only handles the case where the basename of the LLM-emitted fname is itself a chat file. For the chat file .claude/claude-mem.config.json, the bare basename claude-mem.config.json is not in chat_files (which contains the full relative path), so the guard falls through silently and the doubled path is used as-is.
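The fall-through can be reproduced in isolation. The snippet below is a minimal sketch that wraps the two quoted lines in a hypothetical helper, basename_guard (the name and the standalone function are assumptions for illustration, not aider's actual structure), and runs it against both a basename-only chat file and the multi-segment case from this report.

```python
from pathlib import Path


def basename_guard(fname, chat_files):
    """Hypothetical wrapper around the guard quoted above (not aider's API)."""
    if fname and fname not in chat_files and Path(fname).name in chat_files:
        fname = Path(fname).name
    return fname


# Chat file at the repo root: the basename is itself a chat file, so the guard helps.
rescued = basename_guard("bogus/foo.json", ["foo.json"])

# Chat file in a subdirectory: the basename is NOT in chat_files, so the
# doubled path is returned unchanged.
missed = basename_guard(
    ".claude/.claude/claude-mem.config.json",
    [".claude/claude-mem.config.json"],
)

print(rescued)
print(missed)
```

The second call returns the doubled path untouched, which matches the symptom described under Reproduction.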

editblock_coder.find_filename has the same gap. difflib.get_close_matches(..., cutoff=0.8) does catch many doubled-prefix cases (e.g. dir/dir/file.py vs dir/file.py, ratio ≈ 0.846), but when the doubling produces a SequenceMatcher ratio below 0.8 (e.g. sub/dir/sub/dir/foo.py vs sub/dir/foo.py, ratio ≈ 0.778), the lookup falls through to the has-extension fallback and the doubled path is returned verbatim.
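The cutoff behavior can be checked directly with the standard library. This is a small sketch using only difflib; the variable names are illustrative, and it exercises get_close_matches on its own rather than calling find_filename.

```python
import difflib

CUTOFF = 0.8  # same cutoff as quoted above

# One doubled segment: still similar enough to be matched.
shallow = difflib.get_close_matches(
    "dir/dir/file.py", ["dir/file.py"], n=1, cutoff=CUTOFF
)

# Two doubled segments: similarity drops below the cutoff, so nothing matches.
deep = difflib.get_close_matches(
    "sub/dir/sub/dir/foo.py", ["sub/dir/foo.py"], n=1, cutoff=CUTOFF
)

# 14 matching characters out of 22 + 14 total: 2 * 14 / 36
deep_ratio = difflib.SequenceMatcher(
    None, "sub/dir/sub/dir/foo.py", "sub/dir/foo.py"
).ratio()

print(shallow)               # ['dir/file.py']
print(deep)                  # []
print(round(deep_ratio, 3))  # 0.778
```

Because the ratio is 2 * matches / total length, every extra doubled segment adds unmatched characters, so deeper or longer prefixes are the ones most likely to slip under the cutoff.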

Proposed fix

Extend the existing prepended-dir guard with progressive suffix-stripping against the chat-files list. The change is minimal and follows the same pattern as the current guard: it only triggers when both the exact match and the basename match fail, and it only resolves when a deterministic suffix of the emitted path is itself a known chat file.
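As a rough illustration of the idea (not the forthcoming PR's code), the sketch below drops leading path components one at a time and accepts the first remaining suffix that is a known chat file. The function name resolve_doubled_prefix is hypothetical, and the sketch assumes forward-slash relative paths in chat_files.

```python
from pathlib import PurePosixPath


def resolve_doubled_prefix(fname, chat_files):
    """Hypothetical sketch of progressive suffix-stripping (assumes '/' separators)."""
    if not fname or fname in chat_files:
        return fname  # exact match or empty name: nothing to do

    parts = PurePosixPath(fname).parts

    # Existing behavior: a bare basename that is itself a chat file.
    if parts and parts[-1] in chat_files:
        return parts[-1]

    # New behavior: strip leading components until a suffix is a chat file.
    # The basename (last component alone) was already checked above.
    for start in range(1, len(parts) - 1):
        candidate = "/".join(parts[start:])
        if candidate in chat_files:
            return candidate

    return fname  # no known suffix: leave the name untouched


chat_files = [".claude/claude-mem.config.json", "sub/dir/foo.py"]
print(resolve_doubled_prefix(".claude/.claude/claude-mem.config.json", chat_files))
print(resolve_doubled_prefix("sub/dir/sub/dir/foo.py", chat_files))
print(resolve_doubled_prefix("brand/new/file.py", chat_files))
```

Names with no chat-file suffix pass through unchanged, so genuinely new files are unaffected, and the result does not depend on a similarity threshold.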

PR with implementation + tests forthcoming (4 files changed, +67/-2 lines). Tests cover the wholefile path (LLM doubles subdir/sample.txt → subdir/subdir/sample.txt) and the editblock fuzzy-below-cutoff case.

Related (different mechanism, similar surface)
