Provide file hashes in the URLs to avoid unnecessary file downloads (bandwidth saver) #1433
```diff
@@ -11,6 +11,7 @@
 from packaging.version import parse

 import boto3
+from botocore.exceptions import NoCredentialsError

 S3 = boto3.resource('s3')
@@ -212,6 +213,23 @@ def normalize_package_version(self: S3IndexType, obj: str) -> str:
     def obj_to_package_name(self, obj: str) -> str:
         return path.basename(obj).split('-', 1)[0]

+    def fetch_checksum_from_s3(self, s3_key):
+        s3_key = s3_key.replace("%2B", "+")
+        try:
+            response = CLIENT.get_object_attributes(
+                Bucket=BUCKET,
+                Key=s3_key,
+                ObjectAttributes=['Checksum']
+            )
+            checksum = response['Checksum']['ChecksumSHA256']
+            return checksum
+        except NoCredentialsError:
+            print("No AWS credentials found")
+            return None
+        except Exception as e:
+            print(f"Unable to retrieve checksum due to {e}")
+            return None
+
     def to_legacy_html(
         self,
         subdir: Optional[str]=None
@@ -255,7 +273,11 @@ def to_simple_package_html(
         out.append(' <body>')
         out.append(' <h1>Links for {}</h1>'.format(package_name.lower().replace("_","-")))
         for obj in sorted(self.gen_file_list(subdir, package_name)):
-            out.append(f' <a href="/{obj}">{path.basename(obj).replace("%2B","+")}</a><br/>')
+            checksum = self.fetch_checksum_from_s3(obj)
+            if checksum:
+                out.append(f' <a href="/{obj}#sha256={checksum}">{path.basename(obj).replace("%2B","+")}</a><br/>')
+            else:
+                out.append(f' <a href="/{obj}">{path.basename(obj).replace("%2B","+")}</a><br/>')
         # Adding html footer
         out.append(' </body>')
         out.append('</html>')
```
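For context on the link format used above: the `#sha256=...` fragment is how a PEP 503 simple index advertises a file's hash, which is what lets pip skip downloading files it already has. One caveat worth flagging (my observation, not something stated in the diff): S3's `GetObjectAttributes` documents `ChecksumSHA256` as a base64-encoded digest, while the PEP 503 fragment carries a hex digest, so a conversion along these lines may be needed. The object key and file contents below are made up for illustration:

```python
import base64
import hashlib

# Hypothetical stand-ins for an S3 object key and its contents.
obj = "whl/torch-2.0.0-cp310-none-linux_x86_64.whl"
data = b"example wheel contents"

# S3's GetObjectAttributes reports ChecksumSHA256 base64-encoded...
s3_checksum = base64.b64encode(hashlib.sha256(data).digest()).decode()

# ...while a PEP 503 "#sha256=" fragment carries the hex digest,
# so decode the base64 and re-encode as hex before building the link.
hex_digest = base64.b64decode(s3_checksum).hex()
link = f' <a href="/{obj}#sha256={hex_digest}">torch-2.0.0-cp310-none-linux_x86_64.whl</a><br/>'

assert hex_digest == hashlib.sha256(data).hexdigest()
print(link)
```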
Is there any way to leverage the call in `from_S3` to also include this information? If not, perhaps this should be done there, so the returned objects already have the checksum attached and the downstream code can choose whether to use it. That would isolate the S3 retrieval code to one logical place.
From what I can tell, they are different clients/endpoints that serve different purposes. From my perspective, `from_S3` appears to just be tracking a list of strings (`List[str]`), where each string is an object's `s3_key`, so I am not sure it makes sense to tackle a larger refactor just to add the extra calls to pre-look-up these checksums and hold them in memory until their point of use. Isn't it already isolated to this one file, manage.py, where the clients are in scope for the whole file?

Maybe I am being dense and don't see an obvious way to accomplish this without spending a couple more hours on it; plus, the more complexity I add, the harder it gets, since I have no real way of testing it. I'm looking for guidance on the right exceptions to catch (or not catch) above, but I think we need to catch something to cover the case where S3 isn't ready or able to provide the hashes.
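For what it's worth, one shape the `from_S3` refactor being discussed could take is returning small records instead of bare key strings, so an optionally pre-fetched checksum travels with each object and the HTML generation never touches S3 itself. This is a hedged sketch only; `IndexedObject` and `to_anchor` are hypothetical names, not the PR's actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexedObject:
    """Hypothetical record from_S3 could return instead of a bare s3_key."""
    key: str
    checksum: Optional[str] = None  # SHA-256 hex digest; None if S3 couldn't provide it

def to_anchor(obj: IndexedObject) -> str:
    # Build the index link, falling back to a plain URL when no checksum is attached.
    name = obj.key.rsplit("/", 1)[-1].replace("%2B", "+")
    href = f"/{obj.key}#sha256={obj.checksum}" if obj.checksum else f"/{obj.key}"
    return f' <a href="{href}">{name}</a><br/>'

objs = [
    IndexedObject("whl/torch-2.0.0-cp310-none-linux_x86_64.whl", "abc123"),
    IndexedObject("whl/torch-1.13.1%2Bcpu-cp310-none-linux_x86_64.whl"),
]
for o in objs:
    print(to_anchor(o))
```

With this shape, the decision of whether to emit the `#sha256=` fragment becomes a pure string operation that is easy to unit-test without AWS credentials.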
OK, so it looks like you should be able to do this in `from_s3` (which is ideal, because we'll want the checksums for metadata files in #1457 as well):

And then for an object without a checksum:
Resolved.
Build is failing with:
#1548
Tagging @malfet again, as these inner comments are notoriously hard to discover. Hope it's no big deal 😇