-
Notifications
You must be signed in to change notification settings - Fork 143
Integrate the sparse index with 'git apply' and interactive add, checkout, and reset #1914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1adf81e
0a27527
d1482a2
a50c57f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,7 +12,7 @@ static const char * const apply_usage[] = { | |
int cmd_apply(int argc, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Elijah Newren wrote (reply to this): On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> The sparse index allows storing directory entries in the index, marked
> with the skip-wortkree bit and pointing to a tree object. This may be an
> unexpected data shape for some implementation areas, so we are rolling
> it out incrementally on a builtin-per-builtin basis.
>
> This change enables the sparse index for 'git apply'. The main
> motivation for this change is that 'git apply' is used as a child
> process of 'git add -p' and expanding the sparse index for each of those
> child processes can lead to significant performance issues.
>
> The good news is that the actual index manipulation code used by 'git
> apply' is already integrated with the sparse index, so the only product
> change is to mark the builtin as allowing the sparse index so it isn't
> inflated on read.
>
> The more involved part of this change is around adding tests that verify
> how 'git apply' behaves in a sparse-checkout environment and whether or
> not the index expands in certain operations.
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
> builtin/apply.c | 7 +++-
> t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
> 2 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/apply.c b/builtin/apply.c
> index 84f1863d3ac3..a1e20c593d09 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -12,7 +12,7 @@ static const char * const apply_usage[] = {
> int cmd_apply(int argc,
> const char **argv,
> const char *prefix,
> - struct repository *repo UNUSED)
> + struct repository *repo)
> {
> int force_apply = 0;
> int options = 0;
> @@ -35,6 +35,11 @@ int cmd_apply(int argc,
> &state, &force_apply, &options,
> apply_usage);
>
> + if (repo) {
> + prepare_repo_settings(repo);
> + repo->settings.command_requires_full_index = 0;
> + }
> +
> if (check_apply_state(&state, force_apply))
> exit(128);
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f9b448792cb4..ab8bd371eff3 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1340,6 +1340,30 @@ test_expect_success 'submodule handling' '
> grep "160000 $(git -C initial-repo rev-parse HEAD) 0 modules/sub" cache
> '
>
> +test_expect_success 'git apply functionality' '
> + init_repos &&
> +
> + test_all_match git checkout base &&
> +
> + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> + git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> + # Apply a patch to a file inside the sparse definition
> + test_all_match git apply --index --stat ../patch-in-sparse &&
> + test_all_match git status --porcelain=v2 &&
> +
> + # Apply a patch to a file outside the sparse definition
> + test_sparse_match test_must_fail git apply ../patch-outside &&
> + grep "No such file or directory" sparse-checkout-err &&
I was slightly confused by this at first, because I was thinking of
the case where folder2/a exists in the working directory despite not
matching the sparsity patterns, but you were testing a case where the
working directory matched the sparsity patterns, so folder2/a doesn't
exist.
So, the check here looks good.
And, when folder2/a does exist, then we're in the case handled by
82386b44963f (Merge branch 'en/present-despite-skipped', 2022-03-09),
which forces the directory to not be considered sparse, and so it's
just like the ../patch-in-sparse case...meaning it's not really a
different case to test.
So, it all makes sense.
> +
> + # But it works with --index and --cached
> + test_all_match git apply --index --stat ../patch-outside &&
> + test_all_match git status --porcelain=v2 &&
> + test_all_match git reset --hard &&
> + test_all_match git apply --cached --stat ../patch-outside &&
> + test_all_match git status --porcelain=v2
> +'
> +
> # When working with a sparse index, some commands will need to expand the
> # index to operate properly. If those commands also write the index back
> # to disk, they need to convert the index to sparse before writing.
> @@ -2345,6 +2369,28 @@ test_expect_success 'sparse-index is not expanded: check-attr' '
> ensure_not_expanded check-attr -a --cached -- folder1/a
> '
>
> +test_expect_success 'sparse-index is not expanded: git apply' '
> + init_repos &&
> +
> + git -C sparse-index checkout base &&
> + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> + git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> + # Apply a patch to a file inside the sparse definition
> + ensure_not_expanded apply --index --stat ../patch-in-sparse &&
> +
> + # Apply a patch to a file outside the sparse definition
> + # Fails when caring about the worktree.
> + ensure_not_expanded ! apply ../patch-outside &&
> +
> + # Expands when using --index.
> + ensure_expanded apply --index ../patch-outside &&
> + git -C sparse-index reset --hard &&
All makes sense up to here.
> +
> + # Does not expand when using --cached.
> + ensure_not_expanded apply --cached ../patch-outside
Wait, what? That makes no sense.
After some digging, I see why the test passed, but it's very
misleading. Just before this command, if you ran the following
commands from the sparse-index directory, you'd see the following:
$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
$
Which matches what you were testing and shows why it passed for you.
But I'd argue the test is not correct and confusing for anyone that
reads it, because:
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a
In other words, the index was *already* (partially) expanded by the
`git apply --index`, and the `git reset --hard` did not fix that
contrary to expectations. Continuing from here we see:
$ git reset --hard
HEAD is now at 703fd3e initial commit
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a
$ git sparse-checkout reapply
$ git ls-files -s --sparse | grep folder2
040000 123706f6fc38949628eaf0483edbf97ba21123ae 0 folder2/
So, we need to do a `git sparse-checkout reapply` to make sure we were
actually in the expected fully sparse state. From here...
$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856008Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856454Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000446,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857016Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857135Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000119,"nesting":1,"category":"index","label":"ensure_full_index"}
So, indeed, `git apply --cached ../patch-outside` DOES expand the
index, as I expected. It has to when folder2/ is a directory in the
index, so that we can get a folder2/a entry that we can modify. And
that's just what we see:
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
100644 50f4ca4e6265a0497ec2ee6782648138914ad398 0 folder2/a
Can you add a `git sparse-checkout reapply` right after your `git
reset --hard`, and then switch the ensure_not_expanded to
ensure_expanded for the apply --cached call? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Derrick Stolee wrote (reply to this): On 5/9/25 11:18 PM, Elijah Newren wrote:
> On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
>> + # Expands when using --index.
>> + ensure_expanded apply --index ../patch-outside &&
>> + git -C sparse-index reset --hard &&
> > All makes sense up to here.
> >> +
>> + # Does not expand when using --cached.
>> + ensure_not_expanded apply --cached ../patch-outside
> > Wait, what? That makes no sense.
> > After some digging, I see why the test passed, but it's very
> misleading. Just before this command, if you ran the following
> commands from the sparse-index directory, you'd see the following:
> > $ rm testme
> $ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
> $ grep ensure_full_index testme
> $
> > Which matches what you were testing and shows why it passed for you.
> But I'd argue the test is not correct and confusing for anyone that
> reads it, because:
> > $ git ls-files -s --sparse | grep folder2
> 040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
> 100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a
> > In other words, the index was *already* (partially) expanded by the
> `git apply --index`, and the `git reset --hard` did not fix that
> contrary to expectations. Continuing from here we see:
> > $ git reset --hard
> HEAD is now at 703fd3e initial commit
> $ git ls-files -s --sparse | grep folder2
> 040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
> 100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a
> $ git sparse-checkout reapply
> $ git ls-files -s --sparse | grep folder2
> 040000 123706f6fc38949628eaf0483edbf97ba21123ae 0 folder2/
> > So, we need to do a `git sparse-checkout reapply` to make sure we were
> actually in the expected fully sparse state. From here...
> > $ rm testme
> $ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
> $ grep ensure_full_index testme
> {"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856008Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
> {"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856454Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000446,"nesting":1,"category":"index","label":"ensure_full_index"}
> {"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857016Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
> {"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857135Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000119,"nesting":1,"category":"index","label":"ensure_full_index"}
> > So, indeed, `git apply --cached ../patch-outside` DOES expand the
> index, as I expected. It has to when folder2/ is a directory in the
> index, so that we can get a folder2/a entry that we can modify. And
> that's just what we see:
> > $ git ls-files -s --sparse | grep folder2
> 040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
> 100644 50f4ca4e6265a0497ec2ee6782648138914ad398 0 folder2/a
> > > Can you add a `git sparse-checkout reapply` right after your `git
> reset --hard`, and then switch the ensure_not_expanded to
> ensure_expanded for the apply --cached call?
Thanks for digging in here. I'm going to augment the test to
include both the ensure_expanded and ensure_not_expanded so we
get the extra coverage for these two scenarios.
Thanks for the careful eye!
-Stolee
|
||
const char **argv, | ||
const char *prefix, | ||
struct repository *repo UNUSED) | ||
struct repository *repo) | ||
{ | ||
int force_apply = 0; | ||
int options = 0; | ||
|
@@ -35,6 +35,11 @@ int cmd_apply(int argc, | |
&state, &force_apply, &options, | ||
apply_usage); | ||
|
||
if (repo) { | ||
prepare_repo_settings(repo); | ||
repo->settings.command_requires_full_index = 0; | ||
} | ||
|
||
if (check_apply_state(&state, force_apply)) | ||
exit(128); | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -420,6 +420,9 @@ int cmd_reset(int argc, | |
oidcpy(&oid, &tree->object.oid); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Elijah Newren wrote (reply to this): On Fri, May 16, 2025 at 7:55 AM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> Similar to the previous change for 'git add -p', the reset builtin
> checked for integration with the sparse index after possibly redirecting
> its logic toward the interactive logic. This means that the builtin
> would expand the sparse index to a full one upon read.
>
> Move this check earlier within cmd_reset() to improve performance here.
>
> Add tests to guarantee that we are not universally expanding the index.
> Add behavior tests to check that we are doing the same operations as a
> full index.
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
> builtin/reset.c | 6 ++--
> t/t1092-sparse-checkout-compatibility.sh | 42 ++++++++++++++++++++++--
> 2 files changed, 43 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 73b4537a9a56..dc50ffc1ac59 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -420,6 +420,9 @@ int cmd_reset(int argc,
> oidcpy(&oid, &tree->object.oid);
> }
>
> + prepare_repo_settings(the_repository);
> + the_repository->settings.command_requires_full_index = 0;
> +
> if (patch_mode) {
> if (reset_type != NONE)
> die(_("options '%s' and '%s' cannot be used together"), "--patch", "--{hard,mixed,soft}");
> @@ -457,9 +460,6 @@ int cmd_reset(int argc,
> if (intent_to_add && reset_type != MIXED)
> die(_("the option '%s' requires '%s'"), "-N", "--mixed");
>
> - prepare_repo_settings(the_repository);
> - the_repository->settings.command_requires_full_index = 0;
> -
> if (repo_read_index(the_repository) < 0)
> die(_("index file corrupt"));
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c419d8b57e84..d8101139b40a 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -384,7 +384,7 @@ test_expect_success 'add, commit, checkout' '
> test_all_match git checkout -
> '
>
> -test_expect_success 'git add -p' '
> +test_expect_success 'git add, checkout, and reset with -p' '
> init_repos &&
>
> write_script edit-contents <<-\EOF &&
> @@ -398,7 +398,7 @@ test_expect_success 'git add -p' '
> test_write_lines y n >in &&
> run_on_all git add -p <in &&
> test_all_match git status --porcelain=v2 &&
> - test_all_match git reset &&
> + test_all_match git reset -p <in &&
>
> test_write_lines u 1 "" q >in &&
> run_on_all git add -i <in &&
> @@ -413,6 +413,12 @@ test_expect_success 'git add -p' '
> test_sparse_match git reset &&
> test_write_lines u 2 3 "" q >in &&
> run_on_all git add -i <in &&
> + test_sparse_match git status --porcelain=v2 &&
> +
> + run_on_all git add --sparse folder1 &&
> + run_on_all git commit -m "take changes" &&
> + test_write_lines y n y >in &&
> + test_sparse_match git checkout HEAD~1 --patch <in &&
> test_sparse_match git status --porcelain=v2
> '
>
> @@ -2458,6 +2464,38 @@ test_expect_success 'sparse-index is not expanded: git add -p' '
> ensure_expanded add -i <in
> '
>
> +test_expect_success 'sparse-index is not expanded: checkout -p, reset -p' '
> + init_repos &&
> +
> + # Does not expand when edits are within sparse checkout.
> + echo "new content" >sparse-index/deep/a &&
> + echo "new content" >sparse-index/deep/deeper1/a &&
> + git -C sparse-index commit -a -m "inside-changes" &&
> +
> + test_write_lines y y >in &&
> + ensure_not_expanded checkout HEAD~1 --patch <in &&
> +
> + echo "new content" >sparse-index/deep/a &&
> + echo "new content" >sparse-index/deep/deeper1/a &&
> + git -C sparse-index add . &&
> + ensure_not_expanded reset --patch <in &&
> +
> + # -p does expand when edits are outside sparse checkout.
> + mkdir -p sparse-index/folder1 &&
> + echo "new content" >sparse-index/folder1/a &&
> + git -C sparse-index add --sparse folder1 &&
> + git -C sparse-index sparse-checkout reapply &&
> + ensure_expanded reset --patch <in &&
> +
> + # Fully reset the index.
> + mkdir -p sparse-index/folder1 &&
> + echo "new content" >sparse-index/folder1/a &&
> + git -C sparse-index add --sparse folder1 &&
> + git -C sparse-index commit -m "folder1 change" &&
> + git -C sparse-index sparse-checkout reapply &&
> + ensure_expanded checkout HEAD~1 --patch <in
> +'
> +
> test_expect_success 'advice.sparseIndexExpanded' '
> init_repos &&
>
> --
> gitgitgadget
Patch looks good to me. |
||
} | ||
|
||
prepare_repo_settings(the_repository); | ||
the_repository->settings.command_requires_full_index = 0; | ||
|
||
if (patch_mode) { | ||
if (reset_type != NONE) | ||
die(_("options '%s' and '%s' cannot be used together"), "--patch", "--{hard,mixed,soft}"); | ||
|
@@ -457,9 +460,6 @@ int cmd_reset(int argc, | |
if (intent_to_add && reset_type != MIXED) | ||
die(_("the option '%s' requires '%s'"), "-N", "--mixed"); | ||
|
||
prepare_repo_settings(the_repository); | ||
the_repository->settings.command_requires_full_index = 0; | ||
|
||
if (repo_read_index(the_repository) < 0) | ||
die(_("index file corrupt")); | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -135,5 +135,8 @@ test_perf_on_all git diff-tree HEAD | |
test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Elijah Newren wrote (reply to this): On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> The previous two changes contributed performance improvements to 'git
> apply' and 'git add -p' when using a sparse index. Add a performance
> test to demonstrate this (and to help validate that performance remains
> good in the future).
>
> In the truncated test output below, we see that the full checkout
> performance changes within noise expectations, but the sparse index
> cases improve 33% and then 96%.
>
> HEAD~3 HEAD~2 HEAD~1
> ---------------------------------------------------------
> 2000.118: (full-v3) 0.80 0.84 +5.0% 0.84 +5.0%
> 2000.119: (full-v4) 0.76 0.79 +3.9% 0.80 +5.3%
> 2000.120: (sparse-v3) 2.09 1.39 -33.5% 0.07 -96.7%
> 2000.121: (sparse-v4) 2.09 1.39 -33.5% 0.07 -96.7%
>
> It is worth noting that if our test was more involved and had multiple
> hunks to evaluate, then the time spent in 'git apply' would dominate due
> to multiple index loads and writes. As it stands, we need the sparse
> index improvement in 'git add -p' itself to confirm this performance
> improvement.
>
> Since the change for 'git add -i' is identical, we avoid a second test
> case for that similar operation.
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
> t/perf/p2000-sparse-operations.sh | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 39e92b084143..3da6ae59c000 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -135,5 +135,6 @@ test_perf_on_all git diff-tree HEAD
> test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
> test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
> test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
> +test_perf_on_all 'echo >>a && test_write_lines y | git add -p'
>
> test_done
> --
> gitgitgadget
Very nice! |
||
test_perf_on_all "git worktree add ../temp && git worktree remove ../temp" | ||
test_perf_on_all git check-attr -a -- $SPARSE_CONE/a | ||
test_perf_on_all 'echo >>a && test_write_lines y | git add -p' | ||
test_perf_on_all 'test_write_lines y y y | git checkout --patch -' | ||
test_perf_on_all 'echo >>a && git add a && test_write_lines y | git reset --patch' | ||
|
||
test_done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):