Add technical documentation #44

Merged 3 commits on Apr 26, 2023
2 changes: 1 addition & 1 deletion .vscode/settings.json
@@ -2,7 +2,7 @@
// File formatting
"[markdown]": {
"editor.detectIndentation": false,
-"editor.insertSpaces": false,
+"editor.insertSpaces": true,
"editor.tabSize": 4,
"editor.wordWrap": "off",
"files.trimTrailingWhitespace": true,
249 changes: 249 additions & 0 deletions docs/technical/architecture.md
@@ -0,0 +1,249 @@
# Bundle Server Architecture

This document contains information about the architecture of the bundle server
and how it is used with Git's [bundle-uri feature][bundle-uris].

[bundle-uris]: https://git-scm.com/docs/bundle-uri

## High-level design

The following diagram shows the relationship between the Git remote, Git client,
and the bundle server with respect to typical end-user usage.

```mermaid
graph TB
remotes[("Remote host(s)\n(GitHub, Bitbucket, GitLab, Codeberg, etc.)")]
subgraph server["Bundle Server"]
direction LR

repos[[Repository storage]]
bundles[[Bundle storage]]
routes[Route list]
web([git-bundle-web-server])

repos -. "git-bundle-server update" .-> bundles
routes --> web
bundles --> web
end
git(["git (clone|fetch)"])

style server fill:#4682b477

remotes --> repos
web --> git
remotes --> git
```

### Components

#### Remote host(s)

The Git hosts corresponding to the repositories served by the bundle server. The
bundle server can contain repositories from different remotes (e.g. one from
GitHub, another from GitLab), but each repository will have only one upstream
remote.

#### Repository storage

A collection of bare Git repositories cloned from the corresponding remote(s),
each representing a configured route on the bundle server. Repositories are
cloned into local storage at the path `~/git-bundle-server/git/<route>` (e.g.
`~/git-bundle-server/git/torvalds/linux` for the route `torvalds/linux`).

These repositories are kept up-to-date with their corresponding remotes using
`git-bundle-server update`, run either manually or by the system scheduler that
`git-bundle-server (init|start)` starts automatically. The repos are the
source of the bundles generated for the "Bundle storage" of each route.

#### Bundle storage

The base and incremental bundles for each active repository on the bundle
server. Bundles are created from the bundle server's cloned bare repositories
(see "Repository storage") and are stored on disk at the path
`~/git-bundle-server/www/<route>`, alongside a "bundle list" listing each bundle
and associated metadata. These files are served to the user via the
`git-bundle-web-server` API.
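
The two storage locations described above follow the same route-based layout. As a minimal sketch (the helper names `repo_dir` and `web_dir` are hypothetical and not part of the bundle server), the mapping from a route to its on-disk paths could be written as:

```python
from pathlib import PurePosixPath

# Root directory taken from the paths quoted above; "~" is left unexpanded.
SERVER_ROOT = PurePosixPath("~/git-bundle-server")


def repo_dir(route: str) -> PurePosixPath:
    """Bare repository clone for a route (see "Repository storage")."""
    return SERVER_ROOT / "git" / route


def web_dir(route: str) -> PurePosixPath:
    """Bundles and bundle list for a route (see "Bundle storage")."""
    return SERVER_ROOT / "www" / route


print(repo_dir("torvalds/linux"))  # ~/git-bundle-server/git/torvalds/linux
print(web_dir("torvalds/linux"))   # ~/git-bundle-server/www/torvalds/linux
```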

#### Route list

The list of _active_ routes in the bundle server (i.e., those for which bundles
are being generated and can be served via the web server).

#### `git-bundle-web-server`

The `git-bundle-web-server` executable built from this repository. It can be run
in the foreground directly, or started in the background with `git-bundle-server
web-server start`.

#### `git (clone|fetch)`

The Git client invoked by users, CI, IDEs, etc. Only the `clone` and `fetch`
commands use a bundle URI.

To bootstrap a repository from a given bundle URI, clone with `git clone
--bundle-uri=<uri>`. This will download all bundles from the bundle server
before fetching the remaining reachable objects from the origin remote.

When using this bundle server, `git clone --bundle-uri` will set the
`fetch.bundleURI` configuration key in the repository. Using this configuration,
future `git fetch` calls will also [check the bundle server for new
bundles][bundle-uri-fetch] according to the `creationToken` heuristic before
fetching from the origin remote.

[bundle-uri-fetch]: https://git-scm.com/docs/bundle-uri#_fetching_with_bundle_uris
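
For tooling that drives Git (CI scripts, for example), the clone invocation described above can be assembled programmatically. This is only a sketch: `clone_command` is a hypothetical helper, the origin URL is a placeholder, and the bundle URI assumes a bundle server listening on `localhost:8080`.

```python
import shlex
from typing import List


def clone_command(remote_url: str, bundle_uri: str) -> List[str]:
    """Build the argv for a clone that bootstraps from a bundle URI."""
    return ["git", "clone", f"--bundle-uri={bundle_uri}", remote_url]


cmd = clone_command(
    "https://github.com/OWNER/REPO.git",   # placeholder origin remote
    "http://localhost:8080/OWNER/REPO",    # assumed bundle server route
)
# Print a copy-pasteable command line; pass `cmd` to subprocess.run to execute.
print(shlex.join(cmd))
```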

## Use with `git`

Although the contents of the bundle server can be downloaded manually, the
intended use case of the bundle server is to supplement clones & fetches in Git.

The following diagrams assume characteristics matching _this_ bundle server
implementation, namely the `creationToken` heuristic. Behavior in Git may
differ when using a different server implementation.

### Downloading and unpacking a bundle list

The recommended use of this bundle server is as a source for a "bundle list": an
ordered list of base and incremental bundles that, in order, can be downloaded
and unbundled to populate the requested commits in a fetch or clone. At the core
of the bundle URI code in both `git clone` and `git fetch` is a common process
for downloading and unpacking the contents of a bundle list. The process is as
follows:

```mermaid
%%{ init: { 'flowchart': { 'curve': 'monotoneX' } } }%%
flowchart TB;
start{"Start"}
subgraph downloadAndUnbundle["Download and unbundle from list"]
direction TB

parse["Parse bundle list"]
sort["Sort bundles by creationToken,\nhigh to low, select bundle with\nhighest creationToken"]
creationToken{{"Current creationToken >= <code>minCreationToken</code>?"}}
reqBundle["Request bundle from server"]
downloadSuccess{{"Download successful?"}}
markUnbundled["Mark unbundled"]
markUnbundledSkip["Mark unbundled to\navoid retrying later"]
deeperExists{{"Are there more not-yet-unbundled bundles\nwith creationToken <i>less than</i> current?"}}
moveDeeper["Select next bundle with\ncreationToken less than current"]
unbundleReq["Unbundle downloaded bundle"]
shallowerExists{{"Are there more not-yet-unbundled bundles\nwith creationToken <i>greater than</i> current?"}}
moveShallower["Select next bundle with\ncreationToken greater than current"]
unbundleSuccess{{"Successfully unbundled? (not\nmissing any required commits)"}}
end
bundleServer[(Bundle Server)]
done{"Done"}

style downloadAndUnbundle fill:#28865477

start --> parse --> sort --> creationToken
creationToken ----------> |No| done
creationToken --> |Yes| reqBundle
reqBundle --> downloadSuccess
downloadSuccess --> |No| markUnbundledSkip
markUnbundledSkip --> deeperExists
deeperExists --> |No| done
deeperExists --> |Yes| moveDeeper --> creationToken
reqBundle <--> bundleServer

downloadSuccess --> |Yes| unbundleReq
unbundleReq --> unbundleSuccess
unbundleSuccess ----> |No| deeperExists
shallowerExists --> |No| done
unbundleSuccess --> |Yes| markUnbundled --> shallowerExists
shallowerExists --> |Yes| moveShallower --> unbundleReq

```

Note that this flow requires a `minCreationToken`: a creationToken value used to
avoid redundant downloads of old bundles. This value depends on whether the
algorithm is called from `git clone` or `git fetch`. Details on how this value
is determined can be found in later sections.
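
To make the traversal easier to follow, here is a small simulation of the flowchart above. It is a sketch, not Git's implementation: the `Bundle` class and the `download`/`unbundle` callbacks are hypothetical stand-ins, and walking the sorted list by index is just one way to realize the "deeper"/"shallower" moves.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Bundle:
    creation_token: int
    downloaded: bool = False
    unbundled: bool = False  # also set after a failed download, to avoid retries


def download_and_unbundle(
    bundles: Iterable[Bundle],
    min_creation_token: int,
    download: Callable[[Bundle], bool],
    unbundle: Callable[[Bundle], bool],
) -> int:
    """Return the highest creationToken that was unbundled (0 if none)."""
    ordered = sorted(bundles, key=lambda b: b.creation_token, reverse=True)
    highest = 0
    cur, step = 0, 1  # step +1 moves deeper (older), -1 moves shallower (newer)
    while 0 <= cur < len(ordered):
        bundle = ordered[cur]
        # Comparison as drawn in the flowchart.
        if bundle.creation_token < min_creation_token:
            break
        if not bundle.downloaded and not bundle.unbundled:
            if download(bundle):
                bundle.downloaded = True
            else:
                bundle.unbundled = True  # mark so it is not retried later
                step = 1
        if bundle.downloaded and not bundle.unbundled:
            if unbundle(bundle):
                bundle.unbundled = True
                highest = max(highest, bundle.creation_token)
                step = -1  # retry newer bundles that failed to unbundle
            else:
                step = 1   # required commits are missing; look deeper
        # Already-handled bundles are skipped by moving in the same direction.
        cur += step
    return highest


# Usage: each incremental bundle needs the previous one to be unbundled first.
applied = set()


def fake_unbundle(bundle: Bundle) -> bool:
    ok = bundle.creation_token == 1 or bundle.creation_token - 1 in applied
    if ok:
        applied.add(bundle.creation_token)
    return ok


result = download_and_unbundle(
    [Bundle(1), Bundle(2), Bundle(3)], 0, lambda b: True, fake_unbundle
)
print(result)  # prints 3
```

In this run the bundles are downloaded newest-first (3, 2, 1), bundles 3 and 2 initially fail to unbundle, and once the base bundle 1 succeeds the walk reverses and unbundles 2 and then 3.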

### `git clone`

When performing an initial clone from a remote repository, the `--bundle-uri`
option can point to a bundle list (recommended with this server) or to a single
base bundle. In the case of a bundle list, the bundle URI is stored in the
repository config (`fetch.bundleUri`) along with the highest unbundled
creationToken (`fetch.bundleCreationToken`), which subsequent fetches use as
their `minCreationToken`.

```mermaid
%%{ init: { 'flowchart': { 'curve': 'monotoneX' } } }%%
flowchart TB;
user((User))
subgraph git
direction TB

setBundleUri["Set <code>bundleUri</code> to the value of --bundle-uri"]
downloadUri["Download from <code>bundleUri</code>"]
downloadType{{"What is downloaded?"}}
unbundle["Unbundle response"]
setCreationToken["Set <code>minCreationToken</code> to 0"]
downloadAndUnbundle(["Download and unbundle from list"])
bundleSuccess{{"Bundles downloaded and unpacked successfully?"}}
saveUri["Set fetch.bundleUri to <code>bundleUri</code>"]
saveCreationToken["Set fetch.bundleCreationToken to highest\nunbundled creationToken"]
incrementalFetch["Incremental fetch from origin"]

style downloadAndUnbundle fill:#288654,color:#000000
end
bundleServer[(Bundle Server)]
origin[(Remote host)]

user --> |"git clone --bundle-uri URI"| setBundleUri
downloadUri <--> bundleServer
setBundleUri --> downloadUri --> downloadType
downloadType --> |Single bundle| unbundle
unbundle --> incrementalFetch
downloadType --> |Other| incrementalFetch
downloadType --> |Bundle list| setCreationToken
setCreationToken --> downloadAndUnbundle --> bundleSuccess
bundleSuccess --> |Yes| saveUri
downloadAndUnbundle <---> bundleServer
bundleSuccess --> |No| incrementalFetch
saveUri --> saveCreationToken --> incrementalFetch
incrementalFetch <--> origin
```
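
The branches of the clone diagram can be summarized in a few lines. This is an illustrative sketch only: `plan_clone` is a hypothetical function, the `response_kind` strings mirror the diagram's edge labels, and a `highest_unbundled` of `None` is an assumed way to model "bundles were not downloaded and unpacked successfully".

```python
from typing import Dict, List, Optional, Tuple


def plan_clone(
    bundle_uri: str,
    response_kind: str,  # "single bundle", "bundle list", or anything else
    highest_unbundled: Optional[int] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the steps taken and the config entries written during a clone."""
    steps = [f"download {bundle_uri}"]
    config: Dict[str, str] = {}
    if response_kind == "single bundle":
        steps.append("unbundle response")
    elif response_kind == "bundle list":
        steps.append("download and unbundle from list (minCreationToken = 0)")
        if highest_unbundled is not None:
            # Only a successful bundle-list download is remembered for fetches.
            config["fetch.bundleUri"] = bundle_uri
            config["fetch.bundleCreationToken"] = str(highest_unbundled)
    # Every branch ends with a fetch of the remaining objects from the origin.
    steps.append("incremental fetch from origin")
    return steps, config


steps, config = plan_clone("http://localhost:8080/OWNER/REPO", "bundle list", 1680561322)
print(config)
```

Note that a single base bundle is unbundled but leaves no config behind, so later fetches would not consult the bundle server unless `fetch.bundleUri` is set manually.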

### `git fetch`

After successfully cloning with a bundle list URI (recommended) or manually
setting `fetch.bundleUri`, `git fetch` will try to download and unpack recent
bundles containing new commits.

```mermaid
%%{ init: { 'flowchart': { 'curve': 'monotoneX' } } }%%
flowchart TB;
user((User))
subgraph git
direction TB

bundleUriExists{{"fetch.bundleUri config is set?"}}
setBundleUri["Set <code>bundleUri</code> to the value of fetch.bundleUri"]
creationTokenExists{{"fetch.bundleCreationToken config is set?"}}
setCreationToken["Set <code>minCreationToken</code> to the value\nof fetch.bundleCreationToken"]
setCreationTokenZero["Set <code>minCreationToken</code> to 0"]
downloadAndUnbundle(["Download and unbundle from list"])
bundleSuccess{{"Bundles downloaded and unpacked successfully?"}}
saveCreationToken["Set fetch.bundleCreationToken to highest\nunbundled creationToken"]
incrementalFetch["Incremental fetch from origin"]

style downloadAndUnbundle fill:#288654,color:#000000
end
bundleServer[(Bundle Server)]
origin[(Remote host)]

user --> |"git fetch"| bundleUriExists
bundleUriExists --> |Yes| setBundleUri
bundleUriExists --> |No| incrementalFetch
setBundleUri --> creationTokenExists
creationTokenExists --> |Yes| setCreationToken
creationTokenExists --> |No| setCreationTokenZero
setCreationToken & setCreationTokenZero --> downloadAndUnbundle
downloadAndUnbundle <--> bundleServer
downloadAndUnbundle --> bundleSuccess
bundleSuccess --> |Yes| saveCreationToken
bundleSuccess --> |No| incrementalFetch
saveCreationToken --> incrementalFetch
incrementalFetch <--> origin
```
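
As a rough sketch of how the fetch flow picks its starting point, the function below derives the minimum creation token from a repository's config, modeled here as a plain dictionary. Both function names are hypothetical; only the config keys come from the diagram.

```python
from typing import Dict, Optional


def min_creation_token(config: Dict[str, str]) -> Optional[int]:
    """Return the minimum token for a fetch, or None to skip the bundle server."""
    if "fetch.bundleUri" not in config:
        return None  # no bundle URI configured: fetch from origin only
    # Default to 0 when no token has been recorded yet.
    return int(config.get("fetch.bundleCreationToken", "0"))


def record_success(config: Dict[str, str], highest_unbundled: int) -> Dict[str, str]:
    """Return a copy of the config with the new highest unbundled token stored."""
    updated = dict(config)
    updated["fetch.bundleCreationToken"] = str(highest_unbundled)
    return updated


config = {"fetch.bundleUri": "http://localhost:8080/OWNER/REPO"}
print(min_creation_token(config))  # 0
config = record_success(config, 1680561322)
print(min_creation_token(config))  # 1680561322
```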
112 changes: 112 additions & 0 deletions docs/technical/web-server.md
@@ -0,0 +1,112 @@
# Web Server API Reference

This document contains an API specification for the web server created with
`git-bundle-web-server`. It is primarily meant to be used by Git via the
[bundle-uri feature][bundle-uris].

> **Warning**
>
> First and foremost, the goal of this API is compatibility with Git's bundle
> URI feature. We will attempt to keep it up-to-date with the latest version of
> Git but, due to both the newness of the feature and the experimental state of
> the server, we cannot guarantee backward compatibility.

[bundle-uris]: https://git-scm.com/docs/bundle-uri

## Get a repository's bundle list

Get the list of bundles configured for a given bundle server route.

<table>
<tbody>
<tr>
<th>Method</th>
<td><code>GET</code></td>
</tr>
<tr>
<th>Route</th>
<td><code>/{route}</code></td>
</tr>
<tr>
<th>Example Request</th>
<td><code>curl http://localhost:8080/OWNER/REPO</code></td>
</tr>
<tr>
<th>Example Response</th>
<td>

```
[bundle]
version = 1
mode = all
heuristic = creationToken

[bundle "1678494078"]
uri = REPO/base-1678494078.bundle
creationToken = 1678494078

[bundle "1679527263"]
uri = REPO/bundle-1679527263.bundle
creationToken = 1679527263

[bundle "1680561322"]
uri = REPO/bundle-1680561322.bundle
creationToken = 1680561322
```

</td>
</tr>
</tbody>
</table>
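
For clients other than Git that want to inspect a bundle list, the example response above can be read with Python's standard `configparser`. This is a sketch under the assumption that the response stays as simple as the example; Git's config format is not identical to INI, so a more complex list may need a real Git config parser. The `parse_bundle_list` helper is hypothetical.

```python
import configparser
from typing import Dict, List, Tuple

EXAMPLE_RESPONSE = """\
[bundle]
version = 1
mode = all
heuristic = creationToken

[bundle "1678494078"]
uri = REPO/base-1678494078.bundle
creationToken = 1678494078

[bundle "1679527263"]
uri = REPO/bundle-1679527263.bundle
creationToken = 1679527263

[bundle "1680561322"]
uri = REPO/bundle-1680561322.bundle
creationToken = 1680561322
"""


def parse_bundle_list(text: str) -> Tuple[Dict[str, str], List[dict]]:
    """Return (list-level settings, bundles sorted by creationToken, high to low)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    header = dict(parser["bundle"])  # note: configparser lowercases key names
    prefix = 'bundle "'
    bundles = []
    for name in parser.sections():
        if name.startswith(prefix) and name.endswith('"'):
            section = parser[name]
            bundles.append({
                "id": name[len(prefix):-1],
                "uri": section["uri"],
                "creationToken": section.getint("creationToken"),
            })
    bundles.sort(key=lambda b: b["creationToken"], reverse=True)
    return header, bundles


header, bundles = parse_bundle_list(EXAMPLE_RESPONSE)
print(header["heuristic"], [b["id"] for b in bundles])
```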

### Path parameters

| Name | Type | Required | Description |
| ------- | ------ | --------- | ----------- |
| `route` | string | Yes | The route of a repository created with `git-bundle-server init` for which the list of active bundles is requested. Route should be in `OWNER/REPO` format. |

### HTTP response status codes

| Code | Description |
| ----- | ----------- |
| `200` | OK |
| `404` | Specified route does not exist or has no bundles configured |

## Download a bundle

Download an individual bundle.

<table>
<tbody>
<tr>
<th>Method</th>
<td><code>GET</code></td>
</tr>
<tr>
<th>Route</th>
<td><code>/{route}/{bundle}</code></td>
</tr>
<tr>
<th>Example Request</th>
<td><code>curl http://localhost:8080/OWNER/REPO/bundle-1679527263.bundle</code></td>
</tr>
<tr>
<th>Example Response</th>
<td><i>Binary </i><a href="https://git-scm.com/docs/git-bundle"><code>git bundle</code></a><i> content.</i></td>
</tr>
</tbody>
</table>

### Path parameters

| Name | Type | Required | Description |
| -------- | ------ | --------- | ----------- |
| `route` | string | Yes | The route of a repository containing the desired bundle. Route should be in `OWNER/REPO` format. |
| `bundle` | string | Yes | The filename of the desired bundle as identified by the `route`'s bundle list. |

### HTTP response status codes

| Code | Description |
| ----- | ----------- |
| `200` | OK |
| `404` | The specified bundle does not exist |
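
The `uri` values in the example bundle list are relative (`REPO/bundle-1679527263.bundle`), while the example download request uses a full URL. The sketch below assumes that relative URIs are resolved against the URL of the bundle list itself, which is consistent with the two example requests in this document; `bundle_download_url` is a hypothetical helper.

```python
from urllib.parse import urljoin


def bundle_download_url(bundle_list_url: str, bundle_uri: str) -> str:
    """Resolve a bundle's (possibly relative) uri against the bundle list URL."""
    return urljoin(bundle_list_url, bundle_uri)


url = bundle_download_url(
    "http://localhost:8080/OWNER/REPO",
    "REPO/bundle-1679527263.bundle",
)
print(url)  # http://localhost:8080/OWNER/REPO/bundle-1679527263.bundle
```

Standard relative-reference resolution replaces the last path segment of the list URL (`REPO`) with the relative path, which is why the `uri` values begin with `REPO/`.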