feat: keep test noise out of community names and attach tests to their subjects#600

Open
SHudici wants to merge 1 commit into tirth8205:main from SHudici:feat/community-naming


SHudici commented Jul 3, 2026


Problem

On test-heavy repositories, community output is dominated by test artifacts in two ways:

  1. Clustering: shared fixtures and helpers give test files dense internal CALLS edges, while TESTED_BY links to subjects are sparse and weakly weighted (0.4). Leiden therefore clusters tests with each other instead of with the code they cover, producing test-only blobs that are useless for review context.
  2. Naming: BDD-style test names (test_should_return_x_when_y) flood the keyword counter with grammar words, and tests/ directories win the file-prefix vote, so mixed communities come out with names like tests-should.

Fix

  • _reassign_test_nodes (new post-Leiden pass): each Test node moves to the community holding the majority of its TESTED_BY partners; its own community counts as a vote, so tests already placed with their subjects stay put. Ties also prefer the current cluster and otherwise resolve to the lowest cluster index, so the outcome never depends on edge order. The test endpoint of an edge is identified by node kind, not edge direction, so the pass is independent of which direction the parser emits (and composes with the TESTED_BY direction fix, but does not require it). TESTED_BY endpoints that inherited a bare name from an unresolved cross-file call are resolved by node name when the name is unambiguous, so their tests still vote.
  • Naming: mixed communities are named from their production members only, and BDD grammar words (should, when, given, returns, ...) were added to _COMMON_WORDS.
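
For reviewers who want the voting rule in concrete form, here is a minimal sketch of the reassignment pass, not the code in this PR. The function name, the `membership` / `kinds` / `tested_by` argument shapes, and the `"Test"` kind string are assumptions made for illustration. It also assumes that "its own community counts as a vote" means partners already sitting in the test's cluster vote for staying. Bare-name endpoint resolution is left out.

```python
from collections import Counter


def reassign_test_nodes(membership, kinds, tested_by):
    """Sketch of a post-clustering pass (hypothetical signature).

    membership: dict node -> cluster index
    kinds:      dict node -> node kind, e.g. "Test" or "Function"
    tested_by:  iterable of (a, b) endpoint pairs, in either direction
    """
    votes = {}
    for a, b in tested_by:
        a_is_test = kinds.get(a) == "Test"
        b_is_test = kinds.get(b) == "Test"
        if a_is_test == b_is_test:
            continue  # test-test (or subject-subject) edges carry no vote
        # Identify the test endpoint by node kind, not by edge direction.
        test, subject = (a, b) if a_is_test else (b, a)
        if test not in membership or subject not in membership:
            continue
        # Votes always read the ORIGINAL membership, so edge order is irrelevant.
        votes.setdefault(test, Counter())[membership[subject]] += 1

    result = dict(membership)
    for test, counter in votes.items():
        home = membership[test]
        best = max(counter.values())
        winners = sorted(c for c, n in counter.items() if n == best)
        # Stay home on a tie that includes home; otherwise lowest cluster index.
        result[test] = home if home in winners else winners[0]
    return result


membership = {"parse": 0, "render": 0, "t_parse": 1, "t_render": 1, "t_orphan": 1}
kinds = {"parse": "Function", "render": "Function",
         "t_parse": "Test", "t_render": "Test", "t_orphan": "Test"}
edges = [("parse", "t_parse"), ("t_render", "render")]  # mixed directions
print(reassign_test_nodes(membership, kinds, edges))
```

In this toy input both tests with a TESTED_BY partner move into cluster 0 regardless of edge direction, while `t_orphan`, which has no partner, keeps its original cluster.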

Deliberate scoping: the file-based fallback keeps grouping strictly by directory. Its contract is "group by file", and moving tests out of their file group would contradict it. Behavior notes: tests with no TESTED_BY partner keep their original cluster; a test-only cluster that loses most of its members to reassignment can fall below min_size and drop out of the community list.
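
The min_size drop-out can be illustrated with a small sketch. `visible_communities` is a hypothetical helper, and treating `min_size` as a simple "at least this many members" filter is an assumption about how the community list is built.

```python
from collections import defaultdict


def visible_communities(membership, min_size):
    """Group nodes by cluster and keep clusters with at least min_size members."""
    groups = defaultdict(list)
    for node, cluster in membership.items():
        groups[cluster].append(node)
    return {c: sorted(nodes) for c, nodes in groups.items() if len(nodes) >= min_size}


# Before reassignment: cluster 1 is a test-only blob of three nodes.
before = {"parse": 0, "render": 0, "load": 0,
          "t_parse": 1, "t_render": 1, "helper": 1}
# After reassignment: both tests joined their subjects, leaving only the helper.
after = dict(before, t_parse=0, t_render=0)

print(visible_communities(before, min_size=2))
print(visible_communities(after, min_size=2))
```

With `min_size=2`, cluster 1 is listed before reassignment and absent afterwards, because only `helper` remains in it.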

Testing

  • Unit tests for _reassign_test_nodes (move, direction-agnostic, majority vote, own-cluster votes, no-op without TESTED_BY, test-test edges ignored, bare-name endpoints resolved when unique / ignored when ambiguous, tie-breaks: stay home on a tie, deterministic lowest-cluster otherwise). These are pure-function tests that run without igraph.
  • Naming tests: mixed community named from the production side, pure-test community still named, BDD words never become keywords.
  • End-to-end Leiden test (skipped when igraph is absent): two dense clusters linked only by TESTED_BY end up with every test co-located with its subject.
  • Full suite: 1412 passed / 0 failed, both with igraph 1.0.0 and without igraph.
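
The behavior the naming tests check can be sketched as follows. This is an illustration under assumptions, not the PR's naming code: `name_community`, the `(name, is_test)` member shape, the two-keyword name format, and the contents of `COMMON_WORDS` beyond the BDD words quoted above are all hypothetical, and the file-prefix vote is omitted.

```python
import re
from collections import Counter

# BDD grammar words quoted in the PR description; the remaining entries are
# a hypothetical stand-in for the real _COMMON_WORDS set.
BDD_WORDS = {"should", "when", "given", "returns"}
COMMON_WORDS = {"test", "tests", "get", "set"} | BDD_WORDS


def name_community(members):
    """members: list of (name, is_test) pairs. Returns a keyword-based name."""
    production = [name for name, is_test in members if not is_test]
    # Mixed communities use production members only; pure-test ones use all.
    pool = production or [name for name, _ in members]
    words = Counter()
    for name in pool:
        for word in re.split(r"[_\W]+", name.lower()):
            if word and word not in COMMON_WORDS:
                words[word] += 1
    # Highest count first, alphabetical on ties, so the name is deterministic.
    ranked = sorted(words.items(), key=lambda kv: (-kv[1], kv[0]))
    top = [word for word, _ in ranked[:2]]
    return "-".join(top) if top else "unnamed"


mixed = [("parse_config", False), ("load_config", False),
         ("test_should_return_default_when_missing", True),
         ("test_should_raise_when_invalid", True)]
print(name_community(mixed))      # config-load
print(name_community(mixed[2:]))  # pure-test community still gets a name
```

The mixed community is named from `parse_config` and `load_config` alone, and the pure-test community still receives a name, with no BDD word among its keywords.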

🤖 Generated with Claude Code

…r subjects

Two related fixes for test-heavy repositories, where community output
was dominated by test artifacts:

- Leiden tends to cluster tests with each other rather than with the
  code they cover: shared fixtures and helpers give test files dense
  internal CALLS edges, while TESTED_BY links to subjects are sparse
  and weakly weighted. A new reassignment pass moves each Test node
  into the community holding the majority of its TESTED_BY partners
  (its own community counts as a vote, so tests already placed with
  their subjects stay put; ties also prefer the current cluster and
  otherwise resolve to the lowest cluster index, so the outcome never
  depends on edge order). The test endpoint of a TESTED_BY edge is
  identified by node kind rather than edge direction, so the pass is
  independent of the direction the parser emits. TESTED_BY endpoints
  inherited from unresolved cross-file calls can be bare names rather
  than qualified ones; those are resolved by node name when the name
  is unambiguous, so their tests still vote.

- Community naming is no longer hijacked by test members. Mixed
  communities are named from their production members only, and BDD
  test-name grammar ("should", "when", "given", ...) joined the
  stop-word list, so a mixed community that used to come out as
  "tests-should" is now named from its production side.

Behavior notes: tests with no TESTED_BY partner keep their original
cluster, and a test-only cluster that loses most members to
reassignment can fall below min_size and drop out of the community
list. The file-based fallback keeps grouping strictly by directory,
since moving tests out of their file group would contradict its
contract.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>