feat: cluster derivations by package #1035
fricklerhandwerk
left a comment
Only reviewed the tests, let's fix those first
Rebased and reordered the commits to have the test refactoring be a preparation rather than an afterthought.
Rebased on top of #537, since we decided to just go ahead and do the package clustering in this PR already. Reason: We'd be moving >100M rows, which we'd have to go through again if we clustered in a follow-up. We're already wasting disk space, and may as well deduplicate immediately.
I've implemented package clustering as discussed with @adekoder:
There were a few interesting edge cases to consider that weren't obvious so far, such as how to organize the catch-up so it doesn't clobber a fresh eval if it's still running. Closes #790 now. Marked as draft, since there are still somewhat unrelated preliminary changes baked in. I split them out into separate PRs, after which I'll rebase this one:
Reviewed with @adekoder, who correctly pointed out the remaining race condition between the post-eval listener and the backfill.
Left to do:
PackageAttrpath.objects.bulk_create(new_attrpaths, ignore_conflicts=True)
PackageDerivation.objects.bulk_create(new_links, ignore_conflicts=True)
skip_locked=True only prevents concurrent access to the same derivation, but not Package, PackageAttrpath, or the overall clustering decision.
If backfill and post-eval trigger race at the batch boundary, we can end up with:
- attrpath foo.bar registered to package P1
- some derivations with foo.bar linked to P1
- other derivations with foo.bar linked to P2
This will produce cleanup work and, likely, extreme confusion. I prefer fixing this now since we don't have UI for cleanups, and it will take a while until we get to that UI. Working on a draft.
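To make the failure mode above concrete, here is a minimal sketch of one possible way to keep the clustering decision consistent: treat the attrpath registration as the single source of truth, and have every writer re-read the registered owner before linking derivations. This is not the draft mentioned above. The table names, column names, and the `link_derivations` helper are hypothetical, and the sketch uses Python's standard `sqlite3` module with `INSERT OR IGNORE` rather than the project's Django models. It runs the two writers one after another, so it only illustrates the invariant, not real concurrency.

```python
import sqlite3

# Hypothetical schema, for illustration only (not the project's actual models).
SCHEMA = """
CREATE TABLE package_attrpath (
    attrpath   TEXT PRIMARY KEY,       -- at most one owning package per attrpath
    package_id INTEGER NOT NULL
);
CREATE TABLE package_derivation (
    derivation_id INTEGER PRIMARY KEY, -- at most one package per derivation
    package_id    INTEGER NOT NULL
);
"""


def link_derivations(conn, attrpath, derivation_ids, candidate_package_id):
    """Link derivations to whichever package owns `attrpath`.

    The candidate package is registered only if the attrpath is still unowned.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO package_attrpath (attrpath, package_id) "
            "VALUES (?, ?)",
            (attrpath, candidate_package_id),
        )
        # Re-read the registered owner instead of trusting our own candidate.
        (package_id,) = conn.execute(
            "SELECT package_id FROM package_attrpath WHERE attrpath = ?",
            (attrpath,),
        ).fetchone()
        conn.executemany(
            "INSERT OR IGNORE INTO package_derivation (derivation_id, package_id) "
            "VALUES (?, ?)",
            [(d, package_id) for d in derivation_ids],
        )
    return package_id


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The backfill proposes P1 first; the post-eval listener then proposes P2.
    backfill = link_derivations(conn, "foo.bar", [1, 2], candidate_package_id=1)
    listener = link_derivations(conn, "foo.bar", [3], candidate_package_id=2)
    linked = {
        row[0]
        for row in conn.execute("SELECT DISTINCT package_id FROM package_derivation")
    }
    return backfill, listener, linked


if __name__ == "__main__":
    print(demo())
```

In this sketch both calls resolve to package 1 and all three derivations are linked to it, because the second writer adopts the registered owner rather than its own candidate. Whether the same pattern is enough under real concurrency depends on the database's isolation level and on how Package rows themselves are created, which the sketch does not cover.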
This PR is Phase 1 of Nix derivation data deduplication.
The homepage and description are duplicated across evaluations of the same packages, so we are extracting them into NixDerivationHomepage and NixDerivationDescription tables as a first step toward package-level deduplication.
Notes: