PEP 694: Abstract file upload mechanisms #4431
I spent some time to sketch this out as a Finite State Machine in pypi/warehouse#18174, and will be using what I learned to refine this a bit!
I'm planning on taking a closer look later today.
Couple comments, but overall the changes look good to me.
I just wanted to note a few things I came across when reading the entire PEP (with these changes incorporated-- and probably a number of these came from my original PEP so that's on me :D ).
- The PEP specifies that the endpoint for PyPI will be `https://upload.pypi.org/2.0`. I would probably remove that from the PEP and let PyPI decide what its endpoint will be (in particular the `/2.0` part is somewhat confusing given the PEP also uses conneg).
- The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an upload. I think that's fine, but we should update the wording.
- The content type handling is kind of wonky I think. Much like PEP 691 the client can use the `Accept` header to request a particular content type from the server, and the server includes the full version number in the `meta.api-version` key in the response. However, the requests appear to only be using the `meta.api-version` key. I think ideally we want to have requests using a correct `Content-Type` for their request (and the `latest` wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that the `meta.api-version` matches the `Content-Type` for major version.
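To make the last point concrete, here is a minimal sketch of the check a server could run on an incoming request. It assumes the request media type has the shape of the `application/vnd.pypi.upload.v2+json` value quoted in this PR and that `meta.api-version` is a dotted string such as `"2.0"`; the function names are hypothetical and nothing here is taken from the PEP text.

```python
import re

# Assumed media type shape, modeled on "application/vnd.pypi.upload.v2+json".
# A "latest" alias deliberately does not match, per the comment above.
_MEDIA_TYPE = re.compile(r"^application/vnd\.pypi\.upload\.v(?P<major>\d+)\+json$")


def content_type_major(content_type: str) -> int:
    """Return the major version named in a request Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before matching.
    base = content_type.split(";", 1)[0].strip().lower()
    match = _MEDIA_TYPE.match(base)
    if match is None:
        raise ValueError(f"unsupported request Content-Type: {content_type!r}")
    return int(match.group("major"))


def versions_agree(content_type: str, body: dict) -> bool:
    """Check that meta.api-version has the same major version as the Content-Type."""
    api_version = body["meta"]["api-version"]
    return int(api_version.split(".", 1)[0]) == content_type_major(content_type)
```

With this shape, a request sent as `application/vnd.pypi.upload.v2+json` whose body claims `"api-version": "3.1"` would be rejected, and a request using a `latest` media type would fail before the body is even inspected.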
While this PEP would no longer directly implement resumable or parallel uploads, it does solve the problem of how to address them. Individual file-upload sessions may occur in parallel if a server chooses to implement a mechanism that can support it, and similarly resumable uploads can be implemented as a mechanism. I'll clarify it in the "The new upload API...", 05d7fc2
Realization while specifying
See: 924a27d
See: f8469cf
Co-authored-by: Donald Stufft <[email protected]>
I'm bouncing between reading the PR preview and the diff. I'll capture some thoughts here based on text not touched in the PR (which I don't think GH gives me the UI to add inline comments to).
- Instead of "a standard API" I think we're now talking about "an extensible API" with some standard behavior.
Okay, I've pretty much run out of gas. I think some of my comments may not make a ton of sense since I reviewed it sequentially rather than reading the whole thing and then composing my feedback. Apologies for that.
Overall, I think this is a really good simplification, and I really like the direction it's going in. I know I have a lot of musings, feedback, thoughts, comments, and suggestions sprinkled throughout, and I hope they're moderately helpful.
If it would be helpful, I can try to edit the PR locally and push changes, or I could branch your PR and push a new branch/PR, or we can just try to make it all work here. Happy to also chat about it separately!
@@ -24,7 +24,7 @@ with standardization, the upload API provides additional useful features such as
* artifacts which can be overwritten and replaced, until a session is published;
* asynchronous and "chunked", resumable file uploads, for more efficient use of network bandwidth;
* flexible file upload mechanisms for index operators;
* flexible file upload mechanisms for index operators;
- a protocol to extend the supported upload mechanisms in the future without requiring a full PEP; these can be standardized and recommended for all indexes, or be index-specific.
The new upload API proposed in this PEP solves all of these problems, providing for a much more
flexible, bandwidth friendly approach, with better error reporting, a better release testing
experience, and atomic and simultaneous publishing of all release artifacts.
The new upload API proposed in this PEP provides a solution to all of these problems,
The new upload API proposed in this PEP provides a solution to all of these problems,
The new upload API proposed in this PEP provides an immediate solution to many of these problems, and defines a flexible mechanism for future support of the other problems by extension.
flexible, bandwidth friendly approach, with better error reporting, a better release testing
experience, and atomic and simultaneous publishing of all release artifacts.
The new upload API proposed in this PEP provides a solution to all of these problems,
providing for a much more flexible approach, with support for servers to
providing for a much more flexible approach, with support for servers to
In the future, indexes can
experience, and atomic and simultaneous publishing of all release artifacts.
The new upload API proposed in this PEP provides a solution to all of these problems,
providing for a much more flexible approach, with support for servers to
implement resumable and parallel uploads via mechanisms,
implement resumable and parallel uploads via mechanisms,
implement resumable and parallel uploads via extensions,
File Upload Session Completion
This is another case where the response may be specific to the mechanism being used. I think I understand what you're getting at though. You want the client to signal to the server that it's done exchanging the file to the server. But that means that completing a file upload is a two step process:
1. File is completely uploaded using whatever protocol is defined by the mechanism
2. Client also has to signal to the server that the upload is completed.
This is rather than the mechanism itself communicating to the server that 2) has been completed. I'm guessing that a specific case you might be thinking about is S3 exchange.
In that case, the server says, hey you can use the S3 pre-signed URL protocol to upload your file. I don't know anything about those details, and in fact I'm out of the loop. You handle it, and then when you're done, you tell me you're done and I can do any post-upload processing I need to do. This is rather than setting up a way for S3 to tell PyPI that the upload is finished (e.g. through a webhook or some such).
Have I got that right, or are you thinking about something else?
.. code-block:: email
Content-Type: application/vnd.pypi.upload.v2+json
As above.
Once the client has retrieved the offset that they need to start from, they can upload the rest of
the file as described above, either in a single request containing all of the remaining bytes, or in
multiple chunks as per the above protocol.
After receiving this requests the server **MAY** perform additional asynchronous processing on the file,
If I'm on the right track with my thinking about the intent above, then I think if we keep this "File Upload Session Completion" request, the server MUST respond with either a `200` or `202` in the success case (and of course appropriate error codes if a failure occurs). The server would respond with a `200` if the post-completion processing is done synchronously to the request. It would respond with a `202` if that processing must be done asynchronously (e.g. it would take a long time to verify a checksum or such). In the latter case, there would have to be an endpoint that the client could poll to get the status of the post-processing.
One option there would be to just post the same request to the same endpoint but use `"action": "status"` because I think if we make the file upload endpoint mechanism specific, we can't put it here.
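A rough sketch of what that flow might look like from the client side, purely to illustrate the suggestion. Everything here is an assumption rather than PEP text: `post` is a hypothetical callable that sends the given action to the completion endpoint and returns the HTTP status code, the action name `"complete"` is made up for the example, and a `"status"` request is assumed to keep returning `202` while processing is still pending.

```python
from typing import Callable


def finish_upload(post: Callable[[str], int], max_polls: int = 5) -> bool:
    """Signal completion, then poll while the server reports 202 (still processing)."""
    status = post("complete")  # hypothetical action name
    polls = 0
    while status == 202 and polls < max_polls:
        # A real client would sleep or back off between polls.
        status = post("status")
        polls += 1
    if status == 200:
        return True
    if status == 202:
        raise TimeoutError("server is still processing the uploaded file")
    raise RuntimeError(f"file upload session completion failed with HTTP {status}")
```

A synchronous server answers the first request with `200` and the loop never runs; an asynchronous one answers `202` and the same endpoint is reused for polling, so no mechanism-specific status URL is needed.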
File Upload Mechanisms
----------------------
Servers **MUST** implement :ref:`required file upload mechansisms <required-file-upload-mechanisms>`.
Servers **MUST** implement :ref:`required file upload mechansisms <required-file-upload-mechanisms>`.
Servers **MUST** implement :ref:`required file upload mechanisms <required-file-upload-mechanisms>`.
A given server **MAY** implement an arbitrary number of server specific mechanisms
and is responsible for documenting their usage.
Server specific implementations **MUST** be prefixed with ``vnd-``
Ah, now that I'm here (and admittedly, running out of gas -- it's EOW) I see that my earlier comment about the mechanism name of `pypi-atomic` wouldn't be appropriate because it would be a required protocol and not a vendor protocol. So maybe that would be changed to `atomic` instead.
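For illustration, the naming rule being discussed could be checked roughly like this. It is a sketch under assumptions: the `vnd-` prefix comes from the quoted PEP text, while `atomic` as a standard name is only the suggestion made above, and `vnd-example-s3` and the function names are invented for the example.

```python
VENDOR_PREFIX = "vnd-"


def is_server_specific(name: str) -> bool:
    """Server specific mechanisms carry the vnd- prefix."""
    return name.startswith(VENDOR_PREFIX)


def misnamed_mechanisms(advertised: list[str], standard: set[str]) -> list[str]:
    """Return advertised names that are neither standard nor vnd- prefixed."""
    return [
        name
        for name in advertised
        if name not in standard and not is_server_specific(name)
    ]
```

Under these assumptions, `misnamed_mechanisms(["atomic", "vnd-example-s3", "pypi-atomic"], {"atomic"})` flags only `pypi-atomic`, which is the naming conflict described in the comment.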
This attempts to defer the implementation details of getting the bits of a given artifact from the client to the server.
The primary motivation here is to decouple this PEP from resumable/multi-part uploads, provide flexibility to implementers of PEP 694, and allow for new upload mechanisms without a PEP cycle.
📚 Documentation preview 📚: https://pep-previews--4431.org.readthedocs.build/