Commit 3c973ab

committed
oops, add the new todos meant to be in prev commit
1 parent 87871f7 commit 3c973ab

5 files changed: +94 -0 lines changed
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@

The assistant is using NoLiveUpdate, but it should be possible to plumb
a LiveUpdate through it from preferred content checking to location log
updating.

The benefit would be that, when using balanced preferred content expressions,
the assistant would get live updates about repo sizes.

(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]

doc/todo/faster_proxying.mdwn

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@

Not that proxying is super slow, but it does involve bouncing content
through the proxy, and could be made faster. Some ideas:

* A proxy to a local git repository spawns git-annex-shell
  to communicate with it. It would be more efficient to operate
  directly on the Remote, especially when transferring content to/from it.
  But: when a cluster has several nodes that are local git repositories,
  and is sending data to all of them, this would need an alternate
  interface to `storeKey` that supports streaming chunks
  of a ByteString. (A sketch of such an interface follows below.)

* Use `sendfile()` to avoid data copying overhead when
  `receiveBytes` is being fed right into `sendBytes`.
  Library to use:
  <https://hackage.haskell.org/package/hsyscall-0.4/docs/System-Syscall.html>
  (A usage sketch follows below.)

* Getting a key from a cluster currently picks from among
  the lowest cost nodes at random. This could be smarter,
  eg, prefer to avoid using nodes that are doing other transfers at the
  same time.

* The cost of a proxied node that is accessed via an intermediate gateway
  is currently the same as a node accessed via the cluster gateway. So in
  such a situation, git-annex may make a suboptimal choice of path.
  To fix this, there needs to be some way to tell how many hops through
  gateways it takes to get to a node. Currently the only way is to
  guess based on the number of dashes in the node name, which is not satisfying.

  Even counting hops is not very satisfying; one cluster gateway could
  be much more expensive to traverse than another one.

  If seriously tackling this, it might be worth making enough information
  available to use spanning tree protocol for routing inside clusters.

(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]
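For the streaming idea above, here is a minimal sketch of what a chunk-streaming store interface could look like. All names are hypothetical stand-ins, not git-annex's actual Remote API; it only illustrates the shape of an alternative to `storeKey` that is fed ByteString chunks as they arrive, so one incoming stream can be fanned out to several nodes without spooling each copy to disk first.

    -- Hypothetical sketch, not git-annex's real API.
    import qualified Data.ByteString as B

    type Key = String  -- stand-in for git-annex's real Key type

    data StreamingStore s = StreamingStore
        { startStore  :: Key -> IO s               -- begin storing to one node
        , storeChunk  :: s -> B.ByteString -> IO s -- push the next chunk
        , finishStore :: s -> IO Bool              -- close and report success
        }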
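For the `sendfile()` idea, a minimal usage sketch. It uses the simple-sendfile package as a stand-in for the hsyscall binding linked above (an assumption; the todo does not settle on a library), and shows the local-file-to-socket case, where the kernel moves the data without it passing through a userspace buffer.

    -- Minimal sketch: zero-copy send of a local object file over a
    -- connected socket, via the simple-sendfile package.
    import Network.Socket (Socket)
    import Network.Sendfile (sendfile, FileRange(EntireFile))

    sendObject :: Socket -> FilePath -> IO ()
    sendObject sock objectfile =
        -- The last argument is an action run when sendfile would block;
        -- a no-op is fine for this illustration.
        sendfile sock objectfile EntireFile (return ())

Note that on Linux, sendfile(2) needs a source fd that supports mmap-like reads, so the socket-to-socket `receiveBytes` into `sendBytes` path would likely need splice(2) instead.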
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@

Should be possible to use a git-remote-annex annex::$uuid url as
remote.foo.url with remote.foo.annexUrl using annex+http, and so
not need a separate web server to serve the git repository when using
`git-annex p2phttp`.

Doesn't work currently because git-remote-annex urls only support
special remotes.

It would need a new form of git-remote-annex url, eg:

    annex::$uuid?annex+http://example.com/git-annex/

(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]
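To make the goal concrete, this is roughly the .git/config stanza that the new url form would allow (hypothetical: the `?annex+http://...` form does not exist yet, and `foo`, `$uuid`, and `example.com` are placeholders):

    [remote "foo"]
        url = annex::$uuid?annex+http://example.com/git-annex/
        annexUrl = annex+http://example.com/git-annex/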
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@

git-annex can proxy for remotes that are accessed locally or over
ssh, as well as special remotes. But, it cannot proxy for remotes that
themselves have an annex+http annexUrl.

This would need a translation from the P2P protocol to a servant client.
Should not be very hard to implement if someone needs it for some reason.

Also, git-annex could support proxying to remotes whose url is a P2P
address, eg tor-annex remotes. This only needs a way to
generate a RemoteSide for them.

(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]
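As a rough illustration of the "P2P protocol to servant client" translation, here is a minimal servant-client call against a made-up endpoint. The route, types, host, and key are hypothetical stand-ins, not git-annex's actual p2phttp API; it only shows the general shape of turning one P2P-style request into an HTTP call.

    {-# LANGUAGE DataKinds, TypeOperators #-}
    -- Hypothetical endpoint, not git-annex's real p2phttp API.
    import Data.Proxy (Proxy(..))
    import Network.HTTP.Client (newManager, defaultManagerSettings)
    import Servant.API
    import Servant.Client

    type CheckPresent = "checkpresent" :> Capture "key" String :> Get '[JSON] Bool

    checkPresent :: String -> ClientM Bool
    checkPresent = client (Proxy :: Proxy CheckPresent)

    main :: IO ()
    main = do
        mgr <- newManager defaultManagerSettings
        let env = mkClientEnv mgr (BaseUrl Http "example.com" 80 "/git-annex")
        -- A real proxy would map each P2P protocol message to a call like this.
        print =<< runClientM (checkPresent "SHA256E--examplekey") env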
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@

When proxying for a special remote, downloads can stream in from it and out
of the proxy, but that does happen via a temporary file, which grows to the
full size of the file being downloaded. And uploads to a special remote get
buffered to a temporary file.

It would be nice to do full streaming without temp files, but also it's a
hard change to make.

Some improvements that could be made without making such a big change:

* When an upload to a cluster is distributed to multiple special remotes,
  a temporary file is written for each one, which may even happen in
  parallel. This is a lot of extra work and may use excess disk space.
  It should be possible to only write a single temp file. (A sketch
  follows below.)

* Check annex.diskreserve when proxying for special remotes
  to avoid the proxy's disk filling up with the temporary object file
  cached there.

* Resuming an interrupted download from a proxied special remote makes the proxy
  re-download the whole content. It could instead keep some of the
  object files around when the client does not send SUCCESS. This would
  use more disk, but could be limited to, eg, the last 2 or so.
  The [[design/passthrough_proxy]] design doc has some more thoughts about this.

(This is a deferred item from the [[todo/git-annex_proxies]] megatodo.) --[[Joey]]
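For the single-temp-file idea, a minimal sketch. The helper types are hypothetical, not git-annex's real API; it only shows buffering the upload once and letting every node's store action read from that one copy.

    -- Hypothetical sketch, not git-annex's real code: write the incoming
    -- upload to one temp file, then run each node's store action against
    -- that single copy instead of writing a separate temp file per node.
    import System.IO (hClose)
    import System.IO.Temp (withSystemTempFile)
    import qualified Data.ByteString.Lazy as L

    distributeUpload :: [FilePath -> IO Bool] -> L.ByteString -> IO ()
    distributeUpload storeActions content =
        withSystemTempFile "proxied-upload" $ \tmp h -> do
            L.hPut h content                          -- write the content once
            hClose h
            mapM_ (\store -> store tmp) storeActions  -- every node reads the same file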
