Commit 165078d

derrickstolee authored and Git for Windows Build Agent committed
Add path walk API and its use in 'git pack-objects' (#5171)
This is a follow-up to #5157 and is also motivated by the RFC in gitgitgadget#1786.

We have ways of walking all objects, but they focus on visiting a single commit and then expanding the new trees and blobs reachable from that commit that have not been visited yet. This means that objects arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches according to their type and path. This will walk all annotated tags, all commits, all root trees, and then start a depth-first search among all paths in the repo to collect trees and blobs in batches.

The most important application for this is being fast-tracked to Git for Windows: `git pack-objects --path-walk`. This application of the path walk API discovers the objects to pack via this batched walk, and automatically groups objects that appear at a common path so they can be checked for delta comparisons.

This use completely avoids any name-hash collisions (even the collisions that sometimes occur with the new `--full-name-hash` option) and can be much faster to compute, since the first pass of delta calculations does not waste time on objects that are unlikely to be diffable.

Some statistics are available in the commit messages.
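The batched walk described above can be illustrated with a toy model. The sketch below is not the C path walk API: the `Blob` and `Tree` classes and the `path_walk` function are hypothetical names used only for illustration. It assumes each commit is represented by its root tree alone (annotated tags and commits are omitted) and yields one batch per path, containing every distinct object seen at that path across all root trees, in depth-first order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple, Union


@dataclass
class Blob:
    oid: str  # stand-in for an object ID


@dataclass
class Tree:
    oid: str
    # entry name -> child object (hypothetical in-memory representation)
    entries: Dict[str, Union["Tree", Blob]] = field(default_factory=dict)


Batch = Tuple[str, str, List[str]]  # (path, object type, object IDs)


def path_walk(root_trees: List[Tree]) -> Iterator[Batch]:
    """Yield batches of objects grouped by path: root trees first, then a
    depth-first search over every path reachable from them."""

    def walk(path: str, trees: List[Tree]) -> Iterator[Batch]:
        # Every distinct version of the tree at `path` forms one batch.
        yield (path, "tree", [t.oid for t in trees])

        # Group the children of all tree versions by entry name,
        # de-duplicating objects by ID.
        children: Dict[str, Dict[str, Union[Tree, Blob]]] = {}
        for tree in trees:
            for name, obj in tree.entries.items():
                children.setdefault(name, {})[obj.oid] = obj

        for name in sorted(children):
            objs = list(children[name].values())
            blobs = [o for o in objs if isinstance(o, Blob)]
            subtrees = [o for o in objs if isinstance(o, Tree)]
            if blobs:
                yield (path + name, "blob", [b.oid for b in blobs])
            if subtrees:
                # Finish this whole subtree before moving to the next name.
                yield from walk(path + name + "/", subtrees)

    unique_roots = list({t.oid: t for t in root_trees}.values())
    yield from walk("", unique_roots)


# Two commits that each changed a.txt and src/main.c.
r1 = Tree("r1", {"a.txt": Blob("b1"), "src": Tree("t1", {"main.c": Blob("b2")})})
r2 = Tree("r2", {"a.txt": Blob("b3"), "src": Tree("t2", {"main.c": Blob("b4")})})

for batch in path_walk([r1, r2]):
    print(batch)
```

Each batch holds the historical versions of one path, which is the grouping a delta search would want to compare against each other.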
2 parents 04ae051 + 4fb7d0e commit 165078d

23 files changed, +532 -41 lines changed

Documentation/config/feature.adoc

+4
@@ -20,6 +20,10 @@ walking fewer objects.
 +
 * `pack.allowPackReuse=multi` may improve the time it takes to create a pack by
 reusing objects from multiple packs instead of just one.
++
+* `pack.usePathWalk` may speed up packfile creation and make the packfiles be
+significantly smaller in the presence of certain filename collisions with Git's
+default name-hash.

 feature.manyFiles::
 Enable config options that optimize for repos with many files in the

Documentation/config/pack.adoc

+8
@@ -155,6 +155,14 @@ pack.useSparse::
 commits contain certain types of direct renames. Default is
 `true`.

+pack.usePathWalk::
+When true, git will default to using the '--path-walk' option in
+'git pack-objects' when the '--revs' option is present. This
+algorithm groups objects by path to maximize the ability to
+compute delta chains across historical versions of the same
+object. This may disable other options, such as using bitmaps to
+enumerate objects.
+
 pack.preferBitmapTips::
 When selecting which commits will receive bitmaps, prefer a
 commit at the tip of any reference that is a suffix of any value

Documentation/git-pack-objects.adoc

+11-1
@@ -16,7 +16,7 @@ SYNOPSIS
 [--cruft] [--cruft-expiration=<time>]
 [--stdout [--filter=<filter-spec>] | <base-name>]
 [--shallow] [--keep-true-parents] [--[no-]sparse]
-[--name-hash-version=<n>] < <object-list>
+[--name-hash-version=<n>] [--path-walk] < <object-list>


 DESCRIPTION
@@ -375,6 +375,16 @@ many different directories. At the moment, this version is not allowed
 when writing reachability bitmap files with `--write-bitmap-index` and it
 will be automatically changed to version `1`.

+--path-walk::
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`,
+`--shallow`, or `--filter`.

 DELTA ISLANDS
 -------------
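To make the "filenames that cause collisions" remark concrete, the sketch below is a Python transcription of Git's default (version 1) name-hash. It is written from memory of `pack_name_hash()` in Git's source, so treat it as an approximation rather than a reference; the function name `name_hash_v1` and the example paths are made up for illustration, and only ASCII paths are considered. Each character's contribution is shifted right by two bits for every character that follows it, so essentially only the last sixteen or so characters of a path influence the result.

```python
def name_hash_v1(path: str) -> int:
    """Approximate Git's default name-hash: a 32-bit value dominated by the
    trailing characters of the path (whitespace is skipped)."""
    h = 0
    for byte in path.encode("ascii"):
        if byte in b" \t\n\r":
            continue
        # Older characters fade out two bits at a time; the newest
        # character lands in the top byte.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


# Same long filename in two unrelated directories: the directory part is
# too far from the end of the string to affect the hash.
a = "src/alpha/very_long_component_name.c"
b = "lib/omega/very_long_component_name.c"
print(name_hash_v1(a) == name_hash_v1(b))

# A different final character changes the most significant bits.
print(name_hash_v1("a.c") == name_hash_v1("a.h"))
```

Objects whose paths hash to the same value are sorted next to each other when delta candidates are chosen, so unrelated files with a shared suffix compete for the same delta window. Grouping by full path, as `--path-walk` does, sidesteps the hash entirely.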

Documentation/git-repack.adoc

+13-1
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
 [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
-[--write-midx] [--name-hash-version=<n>]
+[--write-midx] [--name-hash-version=<n>] [--path-walk]

 DESCRIPTION
 -----------
@@ -255,6 +255,18 @@ linkgit:git-multi-pack-index[1]).
 Provide this argument to the underlying `git pack-objects` process.
 See linkgit:git-pack-objects[1] for full details.

+--path-walk::
+This option passes the `--path-walk` option to the underlying
+`git pack-objects` process (see linkgit:git-pack-objects[1]).
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`
+or `--filter`.

 CONFIGURATION
 -------------
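A script that drives `git repack` may want to catch the documented incompatibilities before invoking Git. The helper below is a hypothetical wrapper, not part of Git: `build_repack_args` and its parameters are invented for this sketch, and the choice to raise an error (rather than silently dropping an option) is an assumption about what a caller would want, not a description of Git's own behavior. The flags themselves (`-a`, `-d`, `--path-walk`, `--delta-islands`, `--filter=<filter-spec>`) are the ones named in the `git repack` documentation.

```python
from typing import List, Optional


def build_repack_args(path_walk: bool = False,
                      delta_islands: bool = False,
                      filter_spec: Optional[str] = None) -> List[str]:
    """Build an argument list for `git repack -a -d`, refusing combinations
    that the documentation lists as incompatible with --path-walk."""
    if path_walk and (delta_islands or filter_spec is not None):
        raise ValueError(
            "--path-walk cannot be combined with --delta-islands or --filter"
        )

    args = ["git", "repack", "-a", "-d"]
    if path_walk:
        args.append("--path-walk")
    if delta_islands:
        args.append("--delta-islands")
    if filter_spec is not None:
        args.append(f"--filter={filter_spec}")
    return args


print(build_repack_args(path_walk=True))
```

The returned list can be handed to `subprocess.run(args, check=True)` inside a repository, provided the installed Git is new enough to understand `--path-walk`.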

Documentation/technical/api-path-walk.adoc

+1
@@ -70,3 +70,4 @@ Examples
 See example usages in:
 `t/helper/test-path-walk.c`,
 `builtin/backfill.c`
+`builtin/pack-objects.c`
