-
Notifications
You must be signed in to change notification settings - Fork 278
PoC: Let user download targets #1171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Good job noticing that _get_file
doesn't need the download_safely
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @sechkova, especially the removal of unused code!
I would definitely like to see some tests for this custom_download_handler. Both testing a custom handler works and also ensuring we error sensibly if an incorrect handler is passed.
I've commented inline regarding my desire to encode the expected API of the custom handler and do some sanity checks that the API is adhered to, let me know what you think about those ideas.
Remove the boolean switch to toggle safe or unsafe download in Updater._get_file since the method is private and is always called with the download_safely option enabled. Signed-off-by: Teodora Sechkova <[email protected]>
Give client-side users the option to provide their own custom download function which should comply with the specification. By default tuf.download.safe_download is used. Signed-off-by: Teodora Sechkova <[email protected]>
Add to the download_target docstring the exceptions that custom_download_handler implementation must raise. Signed-off-by: Teodora Sechkova <[email protected]>
Add a test case to test_updater which tests incorrect custom_download_handler implementations. Signed-off-by: Teodora Sechkova <[email protected]>
I pushed two new commits that only add tests for the new About encoding an API, given the dynamic nature of Python, I see two possible approaches:
I actually prefer none of the above 🤷 I don't see a big added value in 1) in comparison with the current behavior. Currently a I agree that option 2) would be the neater one. But given the current code and the plans to abandon it, is it worth formally defining an interface which the client can brake in practice and we can actually only "recommend it by design"? If I am missing a clever python way to achieve this, please, I will be glad to learn about it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks very reasonable. I would like to see an actual custom handler in the test -- it might show any undocumented requirements (like the fact that the handler probably really should return a TemporaryFile() -- other things probably don't make sense)
tests/test_updater.py
Outdated
|
||
# Test: normal case. | ||
self.repository_updater.download_target(targetinfo, destination_directory, | ||
custom_download_handler=tuf.download.safe_download) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it a lot of work to actually make a working custom handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried! Here it is: ca6f280
What I did is a very rudimentary implementation of the logic in tuf.download
(skipping all extra checks, logging, etc).
I am hoping that this will help demonstrating what tuf
does without the need to follow all the details.
Not sure if this was your expectation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is good. I don't think it needs to demonstrate what tuf does: it needs to demonstrate how a custom handler could be written -- and it does that. I have opinions about including SlowRetrievalError but since it relates to the larger issue of what we would require from the handler, I'll write that in a separate comment.
tuf/client/updater.py
Outdated
Raise 'securesystemslib.exceptions.DownloadLengthMismatchError', if | ||
STRICT_REQUIRED_LENGTH is True and total_downloaded is not equal | ||
required_length. | ||
|
||
Raise 'tuf.exceptions.SlowRetrievalError', if the total downloaded was | ||
done in less than the acceptable download speed (as set in | ||
tuf.settings.py). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last paragraph does not make sense IMO: if the whole file has already been downloaded, what's the point of raising?
Anyway, I think slow retrieval and even the above length mismatch error should be optional: the implementation may raise these things in these conditions. Requiring things that even TUF does not successfully do is a bit much I think :)
on the length check: TUF should check the length after the handler in any case (because it can) whether the handlers raising is documented as optional or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure we do check the length after the handler is finished (in verify_target_file()
) but please double check :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said below, this is copied from the current documentation and I am open for discussion on the exact formulation.
Related to the length verification, I checked ... 🤷
def _hard_check_file_length(self, file_object, trusted_file_length):
"""
<Purpose>
Non-public method that ensures the length of 'file_object' is strictly
equal to 'trusted_file_length'. This is a deliberately redundant
implementation designed to complement
tuf.download._check_downloaded_length().
tuf/client/updater.py
Outdated
'Given the 'url' and 'required_length' of the desired file, open a | ||
connection to 'url', download it, and return the contents of the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "contents of the file" means a file object, that should be explicit. It might even be useful to mention that it really should be a TemporaryFile() -- nothing else really makes sense.
Maybe makes sense to document the handler in the same way TUF functions are documented (even if the format is a bit weird)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the docstring as you suggest (aa3a635 ). The text is copied from the current tuf.download
module documentation and I don't insist on this content. It is what currently TUF claims doing.
Add a detailed documentation of custom_download_handler. Signed-off-by: Teodora Sechkova <[email protected]>
Add a test implementation of a custom_download_handler mimicking the tuf.download call stack. Signed-off-by: Teodora Sechkova <[email protected]>
tuf/client/updater.py
Outdated
tuf.settings.py). | ||
|
||
<Returns> | ||
A temporay file object that points to the contents of 'url'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A temporay file object that points to the contents of 'url'. | |
A temporary file object that points to the contents of 'url'. |
I'l let a maintainer make a call but I'll document my opinion on SlowRetrieval with regards to this PR:
*) I've not tried to implement this but here are the numbers that lead me to believe the code is not useful:
The client notices that the server is doing a slow retrieval attack two weeks after the attack started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jku's assertion that we should not require a custom handler to implement slow retrieval protection. It's not currently part of the spec and we most likely won't re-add it until we can specify how we expect implementations to, well, implement it.
Let's go ahead and remove the expectation for the custom handler to throw DownloadLengthMismatchError
(as it's handled elsewhere) and SlowRetrievalError
(as above) and simplify the tests by removing those too.
tuf/client/updater.py
Outdated
|
||
else: | ||
file_object = tuf.download.unsafe_download(file_mirror, file_length) | ||
# Eensure the length of the downloaded file matches 'file_length' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Eensure the length of the downloaded file matches 'file_length' | |
# Ensure the length of the downloaded file matches 'file_length' |
tests/test_updater.py
Outdated
else: | ||
return temp_file | ||
|
||
#Test passing a custom download function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#Test passing a custom download function | |
# Test passing a custom download function |
tuf/client/updater.py
Outdated
DownloadLengthMismatchError, if there was a | ||
mismatch of observed vs expected lengths while downloading the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DownloadLengthMismatchError
is thrown after the download handler, or safe_download()
, call in _get_file()
by _hard_check_file_length()
(we also have an unused, except by tests, _soft_check_file_length()
🤦 ).
I don't think we need handlers to throw this. Better that we check lengths in the framework and leave the check as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the "deliberately redundant implementation designed to complement tuf.download._check_downloaded_length()"
#1171 (comment)
Seems like it can nicely complement custom_download_handler
too without the redundancy part!
Pushed two new commits simplifying the expectation for the custom handler and the tests. I did not amend the old commits to make the discussion easier to follow but I will probably want to squash/reorder some commits before a final merge. Edit: force-pushed the last two commits because of a typo ... |
Remove the requirement from custom_download_handler to download a file with exact length and to check for a slow retrieval attack. The needed file verification is already performed by TUF after the file download step. Signed-off-by: Teodora Sechkova <[email protected]>
Simplify the test implementation of a custom_download_handler to match the latest function description in Updater. Reduce the function parameters which now takes only a 'url' Signed-off-by: Teodora Sechkova <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a reasonable addition, thanks @sechkova. I'm a bit worried about how complex/jumpy the download_target()
call stack is already and this addition doesn't help.
But I think we can clean that up a bit by collapsing some of the functions as a follow-on.
I propose we not merge this until it's clear whether we need it to support pip (see pypa/pip#9041) – does that seem reasonable?
A temprorary file object is created to store the contents of 'url'. | ||
|
||
<Returns> | ||
A temporary file object that points to the contents of 'url'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description of the return value feels a bit awkward, but I see you've copied it from tuf.download.safe_download()
I'm ok to merge it when there is a clear need of such complication. |
No news on this yet but I'll note that I'm collecting issues related to this in my pip fork with label 'download': https://github.com/jku/pip/labels/download: it's maybe worth noting that some of these issues will not be completely fixed by the PR here -- e.g. currently pip supports Some issues will need individual fixes (possibly just changes in pip or things like issue #1188) in addition to something like this PR... |
I've converted this PR to a draft as it's effectively a proof of concept. |
Fixes issue #: #1142
Description of the changes being introduced by the pull request:
Give client-side users the option to provide their own custom download function
performing the actual target files download.
The user has to ensure that the function has the following signature and functionality:
By default
tuf.download.safe_download
is still used.Please verify and check that the pull request fulfills the following
requirements: