-
Notifications
You must be signed in to change notification settings - Fork 278
client refactor: simplify mirrors loop #1352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client refactor: simplify mirrors loop #1352
Conversation
One of the reasons for using a file object (a streaming object if you will) is that we have not needed to read the file in memory in all cases. With metadata that's a little moot as we do have to read it to verify and process... but with targets not loading the whole file is a very good idea. So I think the the file object commit is not good. This also means the updater_rework _check_hashes() is wrong. |
This is not really a review (sorry) but maybe background for future discussion: Moving the download loop back to updater looks like the right move (in other words my original idea was bad): I can understand this code :) . However I'm not sure the idea of separate loops is better than what the current/old updater does. The download loop is not trivial (we've had multiple bugs with file objects etc) so is multiplying it going to help? I think the underlying idea with your design is that
This is valid... but I think the cost of duplicating the download code is too high -- this is visible in root download where you already had to refactor that one to another function. I think having a generic metadata download function is fine we just have to figure out how to make it clear what is verified when: I'm hoping #1313 will help there -- my vision of what would happen is something like this:
Another observation:
|
if self._metadata["root"].keys( | ||
"timestamp" | ||
) != intermediate_root.keys("timestamp"): | ||
# FIXME: use abstract storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use TODO instead of FIXME?
I did a quick search in the tuf repo and found only 1 place in the code, besides this file, where we are using those.
It seems we aren't using FIXME in our code and if (hopefully) one day we decide to fix our TODOs this will be left out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really want to use abstract storage in the client (or make it a priority, even if Metadata supports it) but as this part of code is hopefully going to move to bundle anyway, I'm not looking at this sort of things too closely now
I was going to write the same thing and then I saw your comment. |
6561f7a
to
660eb25
Compare
Agree that now when mirror download code is moved back to Updater the code is more readable but also makes the repetition obvious. However replacing it with a single function with the current state of the Updater code does not result in the desired clean implementation. I propose keeping this changes as a base to test the work proposed in #1313, see how it fits in and at the same time reduce the code repetition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So previous comment still stands
I think the cost of duplicating the download code is too high
... but we have a plan:
- get this code into a readable form again (this PR) even if it keeps duplicating a lot of code
- reduce support for mirrors (as it really is what generates all of this complexity, and no-one needs most of it) : Teodora is going to document that plan in Split Updater into logical components: redesign mirrors.py and download.py #1307
- after that we will hopefully be able to simplify this code drastically
- at this point also metadata bundle can be simplified experimental client: Add MetadataBundle #1355
After these steps we will be able to evaluate whether the branch has value or if we should just consider it lessons learned and start over...
So with this idea in mind I've reviewed again: I think I found one actual issue in target_download and then some comments...
if temp_obj: | ||
temp_obj.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this applies to every case with close(): current client does temp_obj = None
here (inside the if-clause) and it makes sense I think: otherwise temp_obj might be defined on next iteration even if it should not be, leading to us calling close twice...
Calling close() twice should not break things so this probably does not fail anything but it looks fishy
In fact now that you've made the metadata handling bytes-based (and not file object), all of the metadata download loops could just use a download method that returns bytes -- but it does not need to happen in this PR. Maybe adding temp_obj = None is more straight forward for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, thanks for noticing. I added temp_obj = None
for now, since changing the download function would mean a difference between metadata and targets. I'll keep this in mind.
tuf/client_rework/updater_rework.py
Outdated
except Exception as exception: | ||
if temp_obj: | ||
temp_obj.close() | ||
raise exceptions.NoWorkingMirrorError({file_mirror: exception}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should store the exception like the other functions, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed, thanks
tuf/client_rework/updater_rework.py
Outdated
|
||
# Raise an exception if any of the hashes are incorrect. | ||
if trusted_hash != computed_hash: | ||
raise sslib_exceptions.BadHashError(trusted_hash, computed_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a BadHashError of our own and should not raise sslib errors
Oops my "fixup" commit is not signed off ... but I apparently want to rebase and squash this one after your review. |
Looks good enough for me -- let's get this merged so we can see if streamlining mirrors helps... I'll merge develop into experimental-client now (#1365): After that if you'd like to rebase this on experimental-client (to get the linter fix) and also rebase --autosquash (to squash the fixup commit), that works for me. |
mirror_target_download and mirror_meta_download functions that return iterators are causing more troubles than good. Since there is a need to download and verify each file inside the mirrors loop (before continuing to the next mirror), feels like the correct component to do this is the Updater itself. This means slightly repetitive code inside the Updater class but with the benefit of better readability and less bug-prone implementation. Signed-off-by: Teodora Sechkova <[email protected]>
Current implementation does not bring much benefit and can be replaced with a simple f-string. Signed-off-by: Teodora Sechkova <[email protected]>
By replacing file objects passed as function arguments with the read file content, we simplify temporary file objects life cycle management. Temporary files are handled in a single function. This is done for metadata files, which are fully read into memory right after download, anyway. Same is not true for target files which preferably should be treated in chunks so targets download and verification still deal with file objects. _check_hashes is split in two functions, one dealing correctly with file objects and one using directly file content. Signed-off-by: Teodora Sechkova <[email protected]>
Replace sslib exceptions raised in Updater with the corresponding ones defined in tuf. Signed-off-by: Teodora Sechkova <[email protected]>
Thanks, it should be ready now. |
Addresses: #1318, #1339
Description of the changes being introduced by the pull request:
This set of changes tries to simplify the code and make it easy to follow as a result it:
Please verify and check that the pull request fulfills the following
requirements: