fix(incremental): scoped --update <subfolder> silently prunes the rest of the graph #1144

Open
tduong628 wants to merge 1 commit into safishamsi:v8 from tduong628:fix/scope-safe-incremental-prune

@tduong628

The bug

Running graphify <subfolder> --update on a subfolder of a corpus that is rooted higher up silently deletes every node outside that subfolder.

detect_incremental() computes deletions as:

current_files = {f for flist in full["files"].values() for f in flist}
deleted_files = [f for f in manifest if f not in current_files]

current_files only contains paths under the scanned root, so every manifest entry outside root is reported as deleted. The skill's --update flow forwards that list straight to build_merge(prune_sources=...):

deleted = list(incremental.get('deleted_files', []))
G = build_merge([new_extraction], graph_path='graphify-out/graph.json', prune_sources=deleted or None)

…and build_merge's anti-shrink safety check (#479) is explicitly skipped whenever prune_sources is set, so there is no guardrail — the rest of the graph is wiped with no error.

Repro

Start with a graph built from a corpus root containing memory/ and topics/. Later, run graphify topics/ --update to fold in one new topic file. Result: all memory/ nodes are reported deleted and pruned. (Hit this in practice: a scoped update on a topics folder reported ~250 unrelated nodes as deleted.)
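The repro can be illustrated without graphify installed. The sketch below reuses the two expressions quoted above from detect_incremental(); the shape of `manifest` (a path-to-hash mapping), the `"document"` key, and the file names are assumptions made for illustration only.

```python
# Hypothetical manifest recorded when the graph was built from the corpus root.
# The real manifest format may differ; only its keys (paths) matter here.
manifest = {
    "memory/a.md": "hash-a",
    "memory/b.md": "hash-b",
    "topics/t1.md": "hash-t1",
}

# A scoped scan of topics/ only sees files under that folder.
# The "document" key is a placeholder for whatever file-type buckets detect() returns.
full = {"files": {"document": ["topics/t1.md", "topics/t2.md"]}}

# Same expressions as in detect_incremental() before the fix.
current_files = {f for flist in full["files"].values() for f in flist}
deleted_files = [f for f in manifest if f not in current_files]

print(deleted_files)  # ['memory/a.md', 'memory/b.md']
```

Both memory/ files are reported as deleted even though nothing under memory/ was touched, which is the list that then reaches prune_sources.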

The fix

detect_incremental() now treats a manifest entry as deleted only when it is (a) inside the current scan root's subtree AND (b) genuinely absent from disk. Out-of-scope entries are simply not part of this incremental run, and entries still on disk but skipped by detect() (sensitive / unsupported / excluded) are not deletions either.
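A minimal sketch of the two-part rule, not the code in this PR: `is_within` and `scoped_deleted_files` are hypothetical helper names, and the manifest is assumed to be a plain list of absolute paths. Paths are compared by their real path, in the spirit of the realpath-based matching this PR also adds.

```python
import os
import tempfile


def is_within(path, root):
    """True if path equals root or lies in root's subtree, comparing real paths."""
    real_path, real_root = os.path.realpath(path), os.path.realpath(root)
    try:
        return os.path.commonpath([real_path, real_root]) == real_root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def scoped_deleted_files(manifest_paths, root):
    """Keep only entries that are (a) under root and (b) absent from disk."""
    return [
        p for p in manifest_paths
        if is_within(p, root) and not os.path.exists(p)
    ]


# Demo on a throwaway corpus with memory/ and topics/ subfolders.
with tempfile.TemporaryDirectory() as corpus:
    topics = os.path.join(corpus, "topics")
    memory = os.path.join(corpus, "memory")
    os.makedirs(topics)
    os.makedirs(memory)
    for existing in (os.path.join(memory, "a.md"), os.path.join(topics, "t1.md")):
        open(existing, "w").close()

    manifest_paths = [
        os.path.join(memory, "a.md"),    # out of scope for topics/, still on disk
        os.path.join(memory, "old.md"),  # out of scope for topics/, missing
        os.path.join(topics, "t1.md"),   # in scope, still on disk
        os.path.join(topics, "gone.md"), # in scope, missing
    ]

    scoped = [os.path.basename(p) for p in scoped_deleted_files(manifest_paths, topics)]
    full_root = [os.path.basename(p) for p in scoped_deleted_files(manifest_paths, corpus)]

print(scoped)     # ['gone.md']
print(full_root)  # ['old.md', 'gone.md']
```

A scoped run on topics/ reports only the in-scope missing file, while a full-root run still reports every genuinely missing file.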

Two supporting changes:

  • realpath-based manifest matching so symlink / mount-alias / .. path-form drift no longer produces false changed or false deleted.
  • a loud, non-raising warning in build_merge when a prune would remove a large share of the graph — defense in depth, since the disabled anti-shrink guard is what made the original failure silent.
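The warning in the second bullet could look roughly like the sketch below. The function name `warn_if_large_prune`, the 50% threshold, and the use of Python's `warnings` module are all assumptions; the PR description does not state the actual threshold or mechanism. The node counts in the demo are illustrative, with 250 borrowed from the repro above and 300 chosen arbitrarily as the graph size.

```python
import warnings


def warn_if_large_prune(total_nodes, nodes_to_prune, threshold=0.5):
    """Warn (never raise) when a prune would remove a large share of the graph.

    Returns True if a warning was emitted. The threshold is a hypothetical default.
    """
    if total_nodes <= 0:
        return False
    share = nodes_to_prune / total_nodes
    if share >= threshold:
        warnings.warn(
            f"prune would remove {nodes_to_prune} of {total_nodes} nodes "
            f"({share:.0%}); check that the --update scope matches the graph root",
            stacklevel=2,
        )
        return True
    return False


# Demo: capture warnings so both calls can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_large_prune(300, 250)  # large share: warns
    quiet = warn_if_large_prune(300, 3)      # small share: silent

print(flagged, quiet, len(caught))  # True False 1
```

Because the check only warns, a legitimate large prune still goes through; the point is that it no longer happens without any signal.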

Tests

New tests/test_incremental_scope_safety.py (4 cases):

  • scoped subfolder update reports no out-of-scope deletions
  • genuine in-scope deletion is still detected
  • full-root update still prunes genuinely deleted files
  • a manifest file still present on disk is never reported deleted

tests/test_incremental_scope_safety.py ....                    [100%]
4 passed

No regressions in test_detect.py, test_incremental.py, or test_build.py: 140 passed.

detect_incremental computed deletions as every manifest entry not found
under the scanned root. Running --update on a SUBFOLDER of a larger corpus
therefore reported the entire rest of the corpus as deleted; the --update
driver forwards that list to build_merge(prune_sources=...), whose
anti-shrink guard (safishamsi#479) is explicitly disabled while pruning, so the rest
of the graph is wiped silently.

A manifest entry now counts as deleted only when it is (a) within the
current scan root's subtree AND (b) genuinely absent from disk. Out-of-scope
entries, and entries still present on disk but skipped by detect()
(sensitive/unsupported/excluded), are left untouched.

Also adds realpath-based manifest matching so symlink / mount-alias path
forms no longer cause false 'changed' or false 'deleted', and a loud
(non-raising) warning in build_merge when a prune would remove a large
share of the graph - the disabled anti-shrink guard is what made the
original failure silent.

Adds tests/test_incremental_scope_safety.py (4 cases).