Skip to content

PoC: Let user download targets #1171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

sechkova
Copy link
Contributor

@sechkova sechkova commented Oct 8, 2020

Fixes issue #: #1142

Description of the changes being introduced by the pull request:

Give client-side users the option to provide their own custom download function
performing the actual target files download.

The user has to ensure that the function has the following signature and functionality:


def download_handler_func(url, required_length)
""" 
  Given the 'url' and 'required_length' of the desired file, open a
  connection to 'url', download it, and return the contents of the file.
  Also ensure the length of the downloaded file matches 'required_length'
  exactly
"""

By default tuf.download.safe_download is still used.

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Copy link
Collaborator

@MVrachev MVrachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!
Good job noticing that _get_file doesn't need the download_safely argument.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @sechkova, especially the removal of unused code!

I would definitely like to see some tests for this custom_download_handler. Both testing a custom handler works and also ensuring we error sensibly if an incorrect handler is passed.

I've commented inline regarding my desire to encode the expected API of the custom handler and do some sanity checks that the API is adhered to, let me know what you think about those ideas.

Remove the boolean switch to toggle safe or unsafe download
in Updater._get_file since the method is private and is always
called with the download_safely option enabled.

Signed-off-by: Teodora Sechkova <[email protected]>
Give client-side users the option to provide their own custom
download function which should comply with the specification.

By default tuf.download.safe_download is used.

Signed-off-by: Teodora Sechkova <[email protected]>
Add to the download_target docstring the exceptions
that custom_download_handler implementation must raise.

Signed-off-by: Teodora Sechkova <[email protected]>
Add a test case to test_updater which tests incorrect
custom_download_handler implementations.

Signed-off-by: Teodora Sechkova <[email protected]>
@sechkova
Copy link
Contributor Author

I pushed two new commits that only add tests for the new custom_download_handler and extend the docs with expected exceptions (the rebase was rather unintentional ... there is no change in the old commits)

About encoding an API, given the dynamic nature of Python, I see two possible approaches:

  1. Using the inspect module to dynamically check the handler signature against a predefined one or against some expected parameters.
  • In practice this will allow us to raise an exception with a custom description when calling download_target if the handler has incorrect parameters.
  • This option does not require big changes of the code.
  1. Defining an interface by and Abstract Base Class.
  • In this case the user must implement a custom download method by inheriting from a base class. This will enforce the function signature or an exception will be raised during the class instantiation. At least I don't know a way to enforce rather than recommend by design the types of exceptions raised.
  • Requires some changes of the current client code: reformatting the download module as a class and defining an ABC as interface.

I actually prefer none of the above 🤷

I don't see a big added value in 1) in comparison with the current behavior. Currently a TypeError will be raised when calling the handler with incorrect parameters and will be re-raised as tuf.exceptions.NoWorkingMirrorError: with the original TypeError added to mirror_errors.

I agree that option 2) would be the neater one. But given the current code and the plans to abandon it, is it worth formally defining an interface which the client can brake in practice and we can actually only "recommend it by design"?

If I am missing a clever python way to achieve this, please, I will be glad to learn about it!
In any case, let me know your opinion.

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks very reasonable. I would like to see an actual custom handler in the test -- it might show any undocumented requirements (like the fact that the handler probably really should return a TemporaryFile() -- other things probably don't make sense)


# Test: normal case.
self.repository_updater.download_target(targetinfo, destination_directory,
custom_download_handler=tuf.download.safe_download)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a lot of work to actually make a working custom handler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried! Here it is: ca6f280
What I did is a very rudimentary implementation of the logic in tuf.download (skipping all extra checks, logging, etc).
I am hoping that this will help demonstrating what tuf does without the need to follow all the details.
Not sure if this was your expectation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is good. I don't think it needs to demonstrate what tuf does: it needs to demonstrate how a custom handler could be written -- and it does that. I have opinions about including SlowRetrievalError but since it relates to the larger issue of what we would require from the handler, I'll write that in a separate comment.

Comment on lines 3264 to 3270
Raise 'securesystemslib.exceptions.DownloadLengthMismatchError', if
STRICT_REQUIRED_LENGTH is True and total_downloaded is not equal
required_length.

Raise 'tuf.exceptions.SlowRetrievalError', if the total downloaded was
done in less than the acceptable download speed (as set in
tuf.settings.py).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last paragraph does not make sense IMO: if the whole file has already been downloaded, what's the point of raising?

Anyway, I think slow retrieval and even the above length mismatch error should be optional: the implementation may raise these things in these conditions. Requiring things that even TUF does not successfully do is a bit much I think :)

on the length check: TUF should check the length after the handler in any case (because it can) whether the handlers raising is documented as optional or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we do check the length after the handler is finished (in verify_target_file()) but please double check :)

Copy link
Contributor Author

@sechkova sechkova Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said below, this is copied from the current documentation and I am open for discussion on the exact formulation.

Related to the length verification, I checked ... 🤷


  def _hard_check_file_length(self, file_object, trusted_file_length):
    """
    <Purpose>
      Non-public method that ensures the length of 'file_object' is strictly
      equal to 'trusted_file_length'.  This is a deliberately redundant
      implementation designed to complement
      tuf.download._check_downloaded_length().

Comment on lines 3259 to 3260
'Given the 'url' and 'required_length' of the desired file, open a
connection to 'url', download it, and return the contents of the file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "contents of the file" means a file object, that should be explicit. It might even be useful to mention that it really should be a TemporaryFile() -- nothing else really makes sense.

Maybe makes sense to document the handler in the same way TUF functions are documented (even if the format is a bit weird)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the docstring as you suggest (aa3a635 ). The text is copied from the current tuf.download module documentation and I don't insist on this content. It is what currently TUF claims doing.

Add a detailed documentation of custom_download_handler.

Signed-off-by: Teodora Sechkova <[email protected]>
Add a test implementation of a custom_download_handler
mimicking the tuf.download call stack.

Signed-off-by: Teodora Sechkova <[email protected]>
tuf.settings.py).

<Returns>
A temporay file object that points to the contents of 'url'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A temporay file object that points to the contents of 'url'.
A temporary file object that points to the contents of 'url'.

@jku
Copy link
Member

jku commented Oct 26, 2020

I'l let a maintainer make a call but I'll document my opinion on SlowRetrieval with regards to this PR:

  • I believe the slow retrieval attack prevention in TUF does not really work (*)
  • We should not require a custom handler to implement something we don't know how to do properly ourselves
  • We should not provide a bad example of the slow retrieval prevention in the test -- it's complexity without benefit

*) I've not tried to implement this but here are the numbers that lead me to believe the code is not useful:

  • chunk size is a massive 400000 bytes: this means any checks are only done every 400000 bytes -- even if the expected file size is 10 bytes
  • sending 1 byte every 3 seconds would avoid triggering requests socket timeout
  • sending 1 byte every 3 seconds 400000 times takes almost 14 days

The client notices that the server is doing a slow retrieval attack two weeks after the attack started.

@joshuagl joshuagl self-requested a review October 26, 2020 20:56
Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jku's assertion that we should not require a custom handler to implement slow retrieval protection. It's not currently part of the spec and we most likely won't re-add it until we can specify how we expect implementations to, well, implement it.

Let's go ahead and remove the expectation for the custom handler to throw DownloadLengthMismatchError (as it's handled elsewhere) and SlowRetrievalError (as above) and simplify the tests by removing those too.


else:
file_object = tuf.download.unsafe_download(file_mirror, file_length)
# Eensure the length of the downloaded file matches 'file_length'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Eensure the length of the downloaded file matches 'file_length'
# Ensure the length of the downloaded file matches 'file_length'

else:
return temp_file

#Test passing a custom download function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#Test passing a custom download function
# Test passing a custom download function

Comment on lines 3273 to 3274
DownloadLengthMismatchError, if there was a
mismatch of observed vs expected lengths while downloading the file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DownloadLengthMismatchError is thrown after the download handler, or safe_download(), call in _get_file() by _hard_check_file_length() (we also have an unused, except by tests, _soft_check_file_length() 🤦 ).

I don't think we need handlers to throw this. Better that we check lengths in the framework and leave the check as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the "deliberately redundant implementation designed to complement tuf.download._check_downloaded_length()"
#1171 (comment)

Seems like it can nicely complement custom_download_handler too without the redundancy part!

@sechkova
Copy link
Contributor Author

sechkova commented Nov 3, 2020

Pushed two new commits simplifying the expectation for the custom handler and the tests.
Now an external download handler should perform only the download and the file length is of the returned temporary file is verified by TUF in _hard_check_file_length().

I did not amend the old commits to make the discussion easier to follow but I will probably want to squash/reorder some commits before a final merge.

Edit: force-pushed the last two commits because of a typo ...

Remove the requirement from custom_download_handler to
download a file with exact length and to check for a
slow retrieval attack.

The needed file verification is already performed by
TUF after the file download step.

Signed-off-by: Teodora Sechkova <[email protected]>
Simplify the test implementation of a custom_download_handler
to match the latest function description in Updater.

Reduce the function parameters which now takes only a 'url'

Signed-off-by: Teodora Sechkova <[email protected]>
Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a reasonable addition, thanks @sechkova. I'm a bit worried about how complex/jumpy the download_target() call stack is already and this addition doesn't help.

But I think we can clean that up a bit by collapsing some of the functions as a follow-on.

I propose we not merge this until it's clear whether we need it to support pip (see pypa/pip#9041) – does that seem reasonable?

A temprorary file object is created to store the contents of 'url'.

<Returns>
A temporary file object that points to the contents of 'url'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description of the return value feels a bit awkward, but I see you've copied it from tuf.download.safe_download()

@sechkova
Copy link
Contributor Author

sechkova commented Nov 5, 2020

This looks like a reasonable addition, thanks @sechkova. I'm a bit worried about how complex/jumpy the download_target() call stack is already and this addition doesn't help.

But I think we can clean that up a bit by collapsing some of the functions as a follow-on.

I propose we not merge this until it's clear whether we need it to support pip (see pypa/pip#9041) – does that seem reasonable?

I'm ok to merge it when there is a clear need of such complication.

@jku
Copy link
Member

jku commented Nov 9, 2020

I propose we not merge this until it's clear whether we need it to support pip (see pypa/pip#9041) – does that seem reasonable?

No news on this yet but I'll note that I'm collecting issues related to this in my pip fork with label 'download': https://github.com/jku/pip/labels/download: it's maybe worth noting that some of these issues will not be completely fixed by the PR here -- e.g. currently pip supports --proxy to set a proxy server and since current PR still leaves tuf in control of the metadata downloads, the proxy setting would not be used for the metadata downloads.

Some issues will need individual fixes (possibly just changes in pip or things like issue #1188) in addition to something like this PR...

@joshuagl joshuagl marked this pull request as draft November 26, 2020 16:25
@joshuagl
Copy link
Member

I've converted this PR to a draft as it's effectively a proof of concept.

@joshuagl joshuagl changed the title Let user download targets PoC: Let user download targets Nov 26, 2020
@sechkova
Copy link
Contributor Author

sechkova commented Dec 22, 2020

Closing after the discussions in #1213 and opening of #1250.

@sechkova sechkova closed this Dec 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants