This repository has been archived by the owner on Apr 21, 2021. It is now read-only.

port to boto3 #11

Open · wants to merge 5 commits into master
Conversation

@grahame commented Aug 8, 2016

This patch addresses some issues encountered using ckanext-s3filestore in production.

  1. Extremely large uploads (in ap-southeast-2, larger than around 8 GB) would time out. This was tracked down to a bug in boto that has been fixed in boto3 but never backported (see boto/boto#2207, "uploading file to s3 results in error 104 (connection reset)"). boto3 supports multipart, streaming uploads, so I've moved the codebase to boto3 (a rough sketch follows this list).
  2. Downloads used an amount of RAM equal to the size of the file: each file was downloaded from S3 into a string buffer, then served back via paste's DataApp. For large files this was obviously not a good approach. I've replaced it with boto3's streaming download and written a new paste app, based on FileApp, to serve that data out; it supports range requests and resumed downloads (a sketch follows the commit summary below).
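
As a rough illustration of item 1 (a sketch only, not the PR's actual code; the credentials, bucket, key, and path here are made up), boto3's upload_fileobj streams the file and switches to a multipart upload past a configurable threshold:

import boto3
from boto3.s3.transfer import TransferConfig

session = boto3.Session(aws_access_key_id='AKIA...',      # placeholder credentials
                        aws_secret_access_key='...')
s3 = session.resource('s3')

# Above multipart_threshold, upload_fileobj splits the file into parts and
# streams them, so memory use stays bounded and very large uploads don't
# hit the boto2 connection-reset bug.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024)

with open('/tmp/big-resource.bin', 'rb') as fileobj:      # hypothetical path
    s3.Bucket('my-ckan-bucket').upload_fileobj(
        fileobj, 'resources/big-resource.bin', Config=config)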

I haven't updated the tests yet, as I can't figure out how to run them. If you could let me know whether this PR is likely to be merged, I'll update the tests.

 - streaming downloads, including from multi-part files
 - streaming uploads, without file size limit (multi-part upload supported)
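
For the download side mentioned above (again a sketch under assumptions, not the PR's FileApp replacement; the bucket, key, and byte range are hypothetical), botocore's StreamingBody lets the response be served out in fixed-size chunks, with Range support for partial and resumed requests:

import boto3

s3 = boto3.resource('s3')                      # credentials from the environment
obj = s3.Object('my-ckan-bucket', 'resources/big-resource.bin')

# Honour an HTTP Range header by asking S3 for only those bytes.
response = obj.get(Range='bytes=0-1048575')    # first 1 MiB, hypothetical range
body = response['Body']                        # botocore StreamingBody

def stream(body, chunk_size=64 * 1024):
    # Yield fixed-size chunks so memory use is independent of object size;
    # a WSGI app can return this generator directly as the response body.
    for chunk in body.iter_chunks(chunk_size=chunk_size):
        yield chunk
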
@nibecker

@grahame great work! Thank you very much for fixing these issues. I hope we will have time to review and test it soon. I'll let you know!

@grahame (Author) commented Mar 21, 2017

@nibecker hey, just prodding for a review on this (I know I need to fix the tests up, but a review of just the basic concept would be good). I have been running this in prod, uploading and downloading terabytes of data, so I'm pretty sure it's solid and could be merged.

            return bucket

        session = boto3.Session(aws_access_key_id=p_key,
                                aws_secret_access_key=s_key)
        s3 = session.resource('s3')
        return s3.Bucket(bucket_name)

    def upload_to_key(self, filepath, upload_file, make_public=False):
@dumyan (Contributor) commented Jun 14, 2017

I don't see any separation between public and private files (e.g. group image files should be public, and at least private resources should be made private).

From my initial reading, I'm under the impression that this will depend on the bucket policies, and all uploaded files will get those permissions. @grahame, am I correct?

If so, everything being publicly accessible via the S3 links could be a security issue.
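
For what it's worth, one way the make_public flag in the signature above could be honoured per object (a sketch under assumptions, not a proposal for this exact codebase; the bucket name is made up) is via S3's canned ACLs in ExtraArgs:

import boto3

def upload_to_key(filepath, key, make_public=False):
    # Sketch: map make_public onto a per-object canned ACL instead of
    # relying on the bucket-wide policy. 'public-read' and 'private'
    # are standard S3 canned ACLs supported by upload_fileobj.
    s3 = boto3.resource('s3')
    extra_args = {'ACL': 'public-read' if make_public else 'private'}
    with open(filepath, 'rb') as fileobj:
        s3.Bucket('my-ckan-bucket').upload_fileobj(fileobj, key,
                                                   ExtraArgs=extra_args)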
