-
Notifications
You must be signed in to change notification settings - Fork 669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat S3 Transfer Manager v2 GetObject/DownloadObject #2996
base: main
Are you sure you want to change the base?
Conversation
* recommit transfer manager v2 files * change pool to store slice pointer * add integ test for putobject * update go mod * minor changes for v0.1.0 * update tags * update tags * update integ test dependency version * change err var name * update go mod * change input/output type comment * minor change --------- Co-authored-by: Tianyi Wang <[email protected]> rebase from main rebase branch from main
move transfer manager v2 integration tests to within module move putobject integ test to transfer manager module
add getobject integ test
@@ -20,6 +20,8 @@ const defaultMultipartUploadThreshold = 1024 * 1024 * 16 | |||
// using PutObject(). | |||
const defaultTransferConcurrency = 5 | |||
|
|||
const defaultPartBodyMaxRetries = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this copied over from existing implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the latest SEP doesn't have such retry limitation in detail
} | ||
|
||
// GetObjectInput represents a request to the GetObject() or DownloadObject() call. It contains common fields | ||
// of s3 GetObject input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems incomplete, of this and what else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing else, GetObjectInput
is the only s3 client input used here
for _, opt := range opts { | ||
opt(&i.options) | ||
} | ||
i.options.Concurrency = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we setting Concurrency to 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetObject
here mimics s3.GetObject
to get an object's chunks and write to output body in sequence, thus there should be only one goroutine to write chunks to output in order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to benchmark it but I don't think we can do it this way. Doing multiple range GETs w/ concurrency 1 is going to be actively worse than just plain GetObject, in theory, because you're adding the overhead of multiple round-trips.
Ideally we dispatch concurrent range GETs and then we just arrange their output streams in sequence behind a single io.ReadCloser
to the user.
// Additional functional options can be provided to configure the individual | ||
// download. These options are copies of the original Options instance, the client of which DownloadObject is called from. | ||
// Modifying the options will not impact the original Client and Options instance. | ||
func (c *Client) DownloadObject(ctx context.Context, w io.WriterAt, input *GetObjectInput, opts ...func(*Options)) (*GetObjectOutput, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give this its own file.
|
||
// GetObjectOutput represents a response from GetObject() or DownloadObject() call. It contains common fields | ||
// of s3 GetObject output | ||
type GetObjectOutput struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't share these types between get and download. There's already divergence between what each one needs - as written, you have Body
in the output for DownloadObject, which is functionally useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the same is true for the input)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we need io.WriterAt
in DownloadObjectInput
as dst, but not for GetObject
// Amazon S3 User Guide. | ||
// | ||
// [Checking object integrity]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html | ||
ChecksumSHA256 string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you actually verifying checksums in your GetObject implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the checksum validation is enabled by default unless user specify that in their input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation skip log also presented when multi-part download an large object uploaded without checksum
Implement v2 s3 transfer manager's GetObject/DownloadObject api bound to single union client which mimics normal service client's initialization and api call. User could now download object sequentially with GetObject or asynchronously with DownloadObject.
Test: passed unit test for GetObject and DownloadObject, passed integration test of uploading object via s3 client and validating its content after downloading through v2 transfer manager.