Skip to content

Commit eb95ed4

Browse files
committed
fix addurl concurrency issue
addurl: Support adding the same url to multiple files at the same time when using -J with --batch --with-files. Implementation was easier than expected, was able to reuse OnlyActionOn. While it will download the url's content multiple times, that seems like the best thing to do; see my comment for why. Sponsored-by: Dartmouth College's DANDI project
1 parent a3cdff3 commit eb95ed4

File tree

5 files changed

+42
-2
lines changed

5 files changed

+42
-2
lines changed

CHANGELOG

+2
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,8 @@ git-annex (8.20211012) UNRELEASED; urgency=medium
2727
the hook to freeze the file in ways that prevent moving it, such as
2828
removing the Windows delete permission.
2929
Thanks, Reiko Asakura.
30+
* addurl: Support adding the same url to multiple files at the same
31+
time when using -J with --batch --with-files.
3032

3133
-- Joey Hess <[email protected]> Mon, 11 Oct 2021 14:09:13 -0400
3234

Command/AddUrl.hs

+11-1
Original file line numberDiff line numberDiff line change
@@ -372,11 +372,21 @@ checkRaw o a
372372
- a filename. It's not displayed then for output consistency,
373373
- but is added to the json when available. -}
374374
startingAddUrl :: SeekInput -> URLString -> AddUrlOptions -> CommandPerform -> CommandStart
375-
startingAddUrl si url o p = starting "addurl" (ActionItemOther (Just url)) si $ do
375+
startingAddUrl si url o p = starting "addurl" ai si $ do
376376
case fileOption (downloadOptions o) of
377377
Nothing -> noop
378378
Just file -> maybeShowJSON $ JSONChunk [("file", file)]
379379
p
380+
where
381+
-- Avoid failure when the same url is downloaded concurrently
382+
-- to two different files, by using OnlyActionOn with a key
383+
-- based on the url. Note that this may not be the actual key
384+
-- that is used for the download; later size information may be
385+
-- available and get added to it. That's ok, this is only
386+
-- used to prevent two threads running concurrently when that would
387+
-- likely fail.
388+
ai = OnlyActionOn urlkey (ActionItemOther (Just url))
389+
urlkey = Backend.URL.fromUrl url Nothing
380390

381391
showDestinationFile :: FilePath -> Annex ()
382392
showDestinationFile file = do

doc/bugs/addurl_failure_has_empty_error-messages.mdwn

+2
Original file line numberDiff line numberDiff line change
@@ -2,3 +2,5 @@ While running `git-annex addurl --batch --with-files --jobs 10 --json --json-err
22

33
[[!meta author=jwodder]]
44
[[!tag projects/dandi]]
5+
6+
> [[fixed|done]] --[[Joey]]

doc/bugs/addurl_failure_has_empty_error-messages/comment_4_74e37e96d01bfb8b9521000d5faa7e53._comment

+1-1
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ downloading, and the key transfer machinery prevents redundant downloads
88
of the same Key at the same time.
99

1010
Arguably, the problem is not where the message gets put, but that
11-
it fails when adding an url to two different paths.
11+
it fails when adding an url to two different paths at the same time.
1212

1313
I have, though, moved that message so it will appear in error-messages.
1414
"""]]
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
[[!comment format=mdwn
2+
username="joey"
3+
subject="""comment 5"""
4+
date="2021-10-27T18:56:23Z"
5+
content="""
6+
The best solution I can find is for it to notice when another thread is
7+
downloading the same url, and wait until it finishes. Then proceed
8+
with downloading the url for a second time.
9+
10+
It's not very satisfying to re-download. But once the url Key is downloaded,
11+
it does not keep that url Key populated, but hashes the content and moves
12+
the content to the final Key. It would be a real complication to
13+
communicate, across threads, what Key the content ended up at, and have the
14+
waiting thread use that. And addurl is already complicated well beyond a
15+
point I am comfortable with.
16+
17+
Also, the content of an url can of course change over time. If I feed
18+
"$url foo" into git-annex addurl --batch -J10 and then some time
19+
later, I feed "$url bar", I might expect that file bar gets whatever
20+
content the url has now, not the content that the url had back when I added
21+
the same url to file foo. And if I cared about avoiding re-downloading,
22+
I could add the url to the first file, and then copy the annex link to the
23+
second file myself.
24+
25+
Implemented this approach.
26+
"""]]

0 commit comments

Comments
 (0)