Skip to content

feat: cluster derivations by package#1035

Open
adekoder wants to merge 3 commits into
NixOS:mainfrom
adekoder:migrate-nixderivation-homepage-description
Open

feat: cluster derivations by package#1035
adekoder wants to merge 3 commits into
NixOS:mainfrom
adekoder:migrate-nixderivation-homepage-description

Conversation

@adekoder
Copy link
Copy Markdown
Collaborator

@adekoder adekoder commented May 15, 2026

This PR is the Phase 1 of Nix Derivation data De-duplication.

The homepage and description are duplicate across evaluation of the same packages, we are extracting them into
NixDerivationHomepage and NixDerivationDescription tables as a first step toward package-level deduplication.

Notes:

1. Run manage.py migrate_homepage_description --dry-run — verify counts
2. Run manage.py migrate_homepage_description
3. Run manage.py migrate_homepage_description --null-after to clear the old columns

Comment thread src/shared/cache_suggestions.py Fixed
Comment thread src/shared/tests/test_migrate_homepage_description.py Fixed
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch from 259a9bd to 97fa7a7 Compare May 15, 2026 14:50
Comment thread src/shared/cache_suggestions.py Fixed
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch 3 times, most recently from fc1e2ad to d1bdffe Compare May 15, 2026 15:20
@adekoder adekoder changed the title feat: migrate-nixderivation-homepage-description feat: migrate nixderivation homepage description May 15, 2026
@adekoder adekoder changed the title feat: migrate nixderivation homepage description feat: migrate nixderivation homepage and description May 15, 2026
Copy link
Copy Markdown
Collaborator

@fricklerhandwerk fricklerhandwerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed the tests, let's fix those first

Comment thread src/shared/tests/test_migrate_homepage_description.py Outdated
Comment thread src/shared/tests/test_migrate_homepage_description.py Outdated
Comment thread src/shared/tests/test_migrate_homepage_description.py Outdated
Comment thread src/shared/tests/test_migrate_homepage_description.py Outdated
Comment thread src/shared/tests/test_migrate_homepage_description.py Outdated
Comment thread src/shared/tests/test_suggestion_caching.py Outdated
Comment thread src/shared/tests/conftest.py Outdated
Comment thread src/shared/tests/test_suggestion_caching.py Outdated
Comment thread src/shared/tests/test_suggestion_caching.py Outdated
Comment thread src/shared/tests/test_suggestion_caching.py Outdated
Comment thread src/shared/cache_suggestions.py Outdated
Comment thread src/shared/cache_suggestions.py Outdated
Comment thread src/shared/cache_suggestions.py Outdated
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch from 9c491e9 to 88a8c4b Compare May 21, 2026 07:19
Comment thread src/shared/tests/test_suggestion_caching.py Outdated
Comment thread src/shared/tests/test_suggestion_caching.py Outdated
Comment thread src/shared/tests/test_suggestion_caching.py Outdated
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch from 88a8c4b to f124a05 Compare May 21, 2026 14:32
Comment thread src/shared/tests/conftest.py Fixed
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch from f124a05 to bf0a091 Compare May 21, 2026 14:41
@fricklerhandwerk fricklerhandwerk force-pushed the migrate-nixderivation-homepage-description branch 2 times, most recently from 8d2b4c4 to 62262a3 Compare May 22, 2026 12:17
@fricklerhandwerk fricklerhandwerk dismissed their stale review May 22, 2026 12:19

implemented

@fricklerhandwerk
Copy link
Copy Markdown
Collaborator

Rebased and reordered the commits to have the test refactoring be a preparation rather than an afterthought.

@fricklerhandwerk fricklerhandwerk changed the title feat: migrate nixderivation homepage and description refactor: migrate homepage and description to separate tables May 22, 2026
@fricklerhandwerk fricklerhandwerk force-pushed the migrate-nixderivation-homepage-description branch from 62262a3 to 81ba647 Compare May 22, 2026 14:46
@fricklerhandwerk
Copy link
Copy Markdown
Collaborator

Rebased on top of #537, since we decided to just go ahead and do the package clustering in this PR already. Reason: We'd be moving >100M rows, which we'll have to go through again if we clustered in a follow-up. We're already wasting disk space, and may as well deduplicate immediately.

@fricklerhandwerk fricklerhandwerk force-pushed the migrate-nixderivation-homepage-description branch from 81ba647 to e7656b6 Compare May 22, 2026 20:06
@fricklerhandwerk fricklerhandwerk changed the title refactor: migrate homepage and description to separate tables feat: cluster derivations by package May 22, 2026
@fricklerhandwerk fricklerhandwerk marked this pull request as draft May 22, 2026 20:06
@fricklerhandwerk
Copy link
Copy Markdown
Collaborator

fricklerhandwerk commented May 22, 2026

I've implemented package clustering as discussed with @adekoder:

  • on trigger after complete eval
  • as a command
  • tests
  • removed the previous merely-moving of data

There were few interesting edge cases to consider that weren't obvious so far, such as how to organize the catch-up so it doesn't clobber a fresh eval if it's still running.

Closes #790 now.

Marked as draft, since there are still somewhat unrelated preliminary changes baked in. I split them out into separate PRs, after which I'll rebase this one:

@fricklerhandwerk fricklerhandwerk force-pushed the migrate-nixderivation-homepage-description branch 2 times, most recently from b807f6b to 3f1c620 Compare May 22, 2026 20:25
Comment thread src/shared/package_clustering.py Outdated
Comment thread src/shared/package_clustering.py Outdated
Comment thread src/shared/package_clustering.py Outdated
Comment thread src/shared/listeners/package_clustering.py Outdated
Comment thread src/shared/management/commands/backfill_package_clustering.py Outdated
@fricklerhandwerk
Copy link
Copy Markdown
Collaborator

fricklerhandwerk commented May 26, 2026

Reviewed with @adekoder, who correctly pointed out the remaining race condition between the post-eval listener and the backfill.

  • Renamed the command to highlight that it's for backfilling

    It now starts with the services to recover what crashed listeners may have left over

  • Cut off the backfill query up to the latest complete evaluation.

    This ensures that listener and backfill command never touch the same rows

  • Wrapped each batch into a transaction to avoid partial updates (especially deleting the metadata from the source derivation!)

    @adekoder Probably we don't need row locking after all (it's still there after the messy fix), since the concurrent queries are now non-overlapping

Left to do:

  • Review tests
  • Test the concurrency scenario
  • Resolve merge conflicts
  • Massage history

@fricklerhandwerk fricklerhandwerk marked this pull request as ready for review June 1, 2026 07:52
Comment thread src/shared/management/commands/backfill_package_clustering.py Outdated
Comment thread src/shared/package_clustering.py
@fricklerhandwerk fricklerhandwerk marked this pull request as draft June 1, 2026 09:28
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch 2 times, most recently from 5ed300b to 660ce28 Compare June 1, 2026 16:12
Comment thread src/shared/cache_suggestions.py Outdated
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch from 660ce28 to a28031f Compare June 2, 2026 07:32
@fricklerhandwerk fricklerhandwerk marked this pull request as ready for review June 2, 2026 07:41
Comment thread src/shared/tests/conftest.py Outdated
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch from a28031f to 6812659 Compare June 3, 2026 08:12
@adekoder adekoder force-pushed the migrate-nixderivation-homepage-description branch from 6812659 to 063ba5c Compare June 5, 2026 07:32
Comment on lines +154 to +155
PackageAttrpath.objects.bulk_create(new_attrpaths, ignore_conflicts=True)
PackageDerivation.objects.bulk_create(new_links, ignore_conflicts=True)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip_locked=True only prevents concurrent access to the same derivation, but not Package, PackageAttrpath, or the overall clustering decision.

If backfill and post-eval trigger race at the batch boundary, we can end up with

  • attrpath foo.bar registered to package P1
  • some derivations with foo.bar linked to P1
  • other derivations with foo.bar linked to P2

This will produce cleanup work and, likely, extreme confusion. I prefer fixing now this since we don't have UI for cleanups, and it will take a while until we get to that UI. Working on a draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants