PEP 694: Abstract file upload mechanisms #4431

ewdurbin · 2025-05-21T20:02:19Z

This attempts to defer the implementation details of getting the bits of a given artifact from the client to the server.

The primary motivation here is to decouple this PEP from resumable/multi-part uploads, provide flexibility to implementers of PEP 694, and allow for new upload mechanisms without a PEP cycle.

📚 Documentation preview 📚: https://pep-previews--4431.org.readthedocs.build/

ewdurbin · 2025-05-22T11:49:07Z

I spent some time sketching this out as a finite state machine in pypi/warehouse#18174, and will use what I learned to refine this a bit!

ewdurbin · 2025-05-28T14:14:22Z

@dstufft @warsaw I think this is ready for a proper review.

peps/pep-0694.rst

warsaw · 2025-05-28T16:44:50Z

I'm planning on taking a closer look later today.

peps/pep-0694.rst

dstufft

Couple comments, but overall the changes look good to me.

I just wanted to note a few things I came across when reading the entire PEP (with these changes incorporated, and probably a number of these came from my original PEP, so that's on me :D).

The PEP specifies that the endpoint for PyPI will be https://upload.pypi.org/2.0. I would probably remove that from the PEP and let PyPI decide what its endpoint will be (in particular the /2.0 part is somewhat confusing given the PEP also uses conneg).
The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an upload. I think that's fine, but we should update the wording.
The content type handling is kind of wonky I think. Much like PEP 691 the client can use the Accept header to request a particular content type from the server, and the server includes the full version number in the meta.api-version key in the response. However, the requests appear to only be using the meta.api-version key. I think ideally we want to have requests using a correct Content-Type for their request (and the latest wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that the meta.api-version matches the Content-Type for major version.
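
The major-version check described in the last point could look roughly like the sketch below. It assumes the request media type has the `application/vnd.pypi.upload.v2+json` shape that appears in the diff, and that `meta.api-version` is a `major.minor` string; the function names are hypothetical and not taken from the PEP or from any existing implementation.

```python
import re

# Assumed media type shape, e.g. "application/vnd.pypi.upload.v2+json".
# A "latest" alias deliberately does not match, so it is rejected for requests.
_MEDIA_TYPE = re.compile(r"^application/vnd\.pypi\.upload\.v(?P<major>\d+)\+json$")


def content_type_major(content_type: str) -> int:
    """Return the major version named by a versioned Content-Type header."""
    # Drop any parameters such as "; charset=utf-8" before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    match = _MEDIA_TYPE.match(media_type)
    if match is None:
        raise ValueError(f"unsupported Content-Type: {content_type!r}")
    return int(match.group("major"))


def versions_agree(content_type: str, body: dict) -> bool:
    """Check that meta.api-version has the same major version as Content-Type."""
    api_version = body.get("meta", {}).get("api-version", "")
    major = api_version.split(".", 1)[0]
    return major.isdigit() and int(major) == content_type_major(content_type)
```

A server following this approach would reject a request when `versions_agree` returns `False`, rather than guessing which of the two version indicators to trust.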

peps/pep-0694.rst

ewdurbin · 2025-05-30T10:37:49Z

The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an upload. I think that's fine, but we should update the wording.

While this PEP would no longer directly implement resumable or parallel uploads, it does solve the problem of how to address them. Individual file-upload sessions may occur in parallel if a server chooses to implement a mechanism that supports it, and resumable uploads can similarly be implemented as a mechanism. I'll clarify this in the "The new upload API..." paragraph; see 05d7fc2.

ewdurbin · 2025-05-30T11:05:00Z

Realization while specifying http-post-application-octet-stream: we need to support PEP 740 style attestations... tagging @woodruffw for thoughts :) See: 2ef077c

ewdurbin · 2025-05-30T12:09:57Z

The content type handling is kind of wonky I think. Much like PEP 691 the client can use the Accept header to request a particular content type from the server, and the server includes the full version number in the meta.api-version key in the response. However, the requests appear to only be using the meta.api-version key. I think ideally we want to have requests using a correct Content-Type for their request (and the latest wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that the meta.api-version matches the Content-Type for major version.

See: 924a27d

ewdurbin · 2025-05-30T12:11:53Z

The PEP specifies that the endpoint for PyPI will be https://upload.pypi.org/2.0. I would probably remove that from the PEP and let PyPI decide what its endpoint will be (in particular the /2.0 part is somewhat confusing given the PEP also uses conneg).

See: f8469cf

peps/pep-0694.rst

Co-authored-by: Donald Stufft <[email protected]>

warsaw · 2025-05-30T22:15:52Z

peps/pep-0694.rst

I'm bouncing between reading the PR preview and the diff. I'll capture some thoughts here based on text not touched in the PR (which I don't think GH gives me the UI to add inline comments to).

Instead of "a standard API" I think we're now talking about "an extensible API" with some standard behavior.

Okay, I've pretty much run out of gas. I think some of my comments may not make a ton of sense since I reviewed it sequentially rather than reading the whole thing and then composing my feedback. Apologies for that.

Overall, I think this is a really good simplification, and I really like the direction it's going in. I know I have a lot of musings, feedback, thoughts, comments, and suggestions sprinkled throughout, and I hope they're moderately helpful.

If it would be helpful, I can try to edit the PR locally and push changes, or I could branch your PR and push a new branch/PR, or we can just try to make it all work here. Happy to also chat about it separately!

warsaw · 2025-05-30T22:18:01Z

peps/pep-0694.rst

@@ -24,7 +24,7 @@ with standardization, the upload API provides additional useful features such as

 * artifacts which can be overwritten and replaced, until a session is published;

-* asynchronous and "chunked", resumable file uploads, for more efficient use of network bandwidth;
+* flexible file upload mechanisms for index operators;


Suggested change

* flexible file upload mechanisms for index operators;

a protocol to extend the supported upload mechanisms in the future without requiring a full PEP; these can be standardized and recommended for all indexes, or be index-specific.

warsaw · 2025-05-30T22:24:14Z

peps/pep-0694.rst

-The new upload API proposed in this PEP solves all of these problems, providing for a much more
-flexible, bandwidth friendly approach, with better error reporting, a better release testing
-experience, and atomic and simultaneous publishing of all release artifacts.
+The new upload API proposed in this PEP provides a solution to all of these problems,


Suggested change

The new upload API proposed in this PEP provides a solution to all of these problems,

The new upload API proposed in this PEP provides an immediate solution to many of these problems, and defines a flexible mechanism for future support of the other problems by extension.

warsaw · 2025-05-30T22:25:09Z

peps/pep-0694.rst

-flexible, bandwidth friendly approach, with better error reporting, a better release testing
-experience, and atomic and simultaneous publishing of all release artifacts.
+The new upload API proposed in this PEP provides a solution to all of these problems,
+providing for a much more flexible approach, with support for servers to


Suggested change

providing for a much more flexible approach, with support for servers to

In the future, indexes can

warsaw · 2025-05-30T22:25:34Z

peps/pep-0694.rst

-experience, and atomic and simultaneous publishing of all release artifacts.
+The new upload API proposed in this PEP provides a solution to all of these problems,
+providing for a much more flexible approach, with support for servers to
+implement resumable and parallel uploads via mechanisms,


Suggested change

implement resumable and parallel uploads via mechanisms,

implement resumable and parallel uploads via extensions,

warsaw · 2025-05-31T00:58:25Z

peps/pep-0694.rst


+File Upload Session Completion


This is another case where the response may be specific to the mechanism being used. I think I understand what you're getting at though. You want the client to signal to the server that it's done transferring the file. But that means that completing a file upload is a two-step process:

1. The file is completely uploaded using whatever protocol is defined by the mechanism.

2. The client also has to signal to the server that the upload is completed.

This is rather than the mechanism itself communicating to the server that step 2 has been completed. I'm guessing that a specific case you might be thinking about is S3 exchange.

In that case, the server says, hey you can use the S3 pre-signed URL protocol to upload your file. I don't know anything about those details, and in fact I'm out of the loop. You handle it, and then when you're done, you tell me you're done and I can do any post-upload processing I need to do. This is rather than setting up a way for S3 to tell PyPI that the upload is finished (e.g. through a webhook or some such).

Have I got that right, or are you thinking about something else?

warsaw · 2025-05-31T01:04:04Z

peps/pep-0694.rst


-.. code-block:: email
+    Content-Type: application/vnd.pypi.upload.v2+json


warsaw · 2025-05-31T01:08:28Z

peps/pep-0694.rst

-Once the client has retrieved the offset that they need to start from, they can upload the rest of
-the file as described above, either in a single request containing all of the remaining bytes, or in
-multiple chunks as per the above protocol.
+After receiving this requests the server **MAY** perform additional asynchronous processing on the file,


If I'm on the right track with my thinking about the intent above, then I think if we keep this "File Upload Session Completion" request, the server MUST respond with either a 200 or 202 in the success case (and of course appropriate error codes if a failure occurs). The server would respond with a 200 if the post-completion processing is done synchronously with the request. It would respond with a 202 if that processing must be done asynchronously (e.g. it would take a long time to verify a checksum or such). In the latter case, there would have to be an endpoint that the client could poll to get the status of the post-processing.

One option there would be to just post the same request to the same endpoint but use "action": "status", because I think if we make the file upload endpoint mechanism-specific, we can't put it here.
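
A minimal client-side sketch of the 200/202 flow described above. The two callables stand in for the HTTP requests (the completion request and a status poll) and return only the status code. It is an assumption of this sketch, not something the PEP states, that the status poll answers 202 while processing is pending and 200 once it is done. All names here are hypothetical.

```python
import time
from typing import Callable


def wait_for_completion(
    complete: Callable[[], int],
    poll_status: Callable[[], int],
    interval: float = 1.0,
    max_polls: int = 30,
) -> bool:
    """Send the completion request, then poll while the server answers 202.

    Returns True once the server reports 200, False if max_polls is exhausted.
    """
    status = complete()
    if status == 200:
        # Post-completion processing was done synchronously.
        return True
    if status != 202:
        raise RuntimeError(f"completion failed with status {status}")
    for _ in range(max_polls):
        # Processing is asynchronous: wait, then ask for the current status.
        time.sleep(interval)
        status = poll_status()
        if status == 200:
            return True
        if status != 202:
            raise RuntimeError(f"processing failed with status {status}")
    return False
```

Passing the requests in as callables keeps the sketch independent of any HTTP library and of the eventual endpoint layout, which is still under discussion in this thread.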

warsaw · 2025-05-31T01:11:34Z

peps/pep-0694.rst

+File Upload Mechanisms
+----------------------
+
+Servers **MUST** implement :ref:`required file upload mechansisms <required-file-upload-mechanisms>`.


Suggested change

Servers **MUST** implement :ref:`required file upload mechansisms <required-file-upload-mechanisms>`.

Servers **MUST** implement :ref:`required file upload mechanisms <required-file-upload-mechanisms>`.

warsaw · 2025-05-31T01:19:54Z

peps/pep-0694.rst

+
+A given server **MAY** implement an arbitrary number of server specific mechanisms
+and is responsible for documenting their usage.
+Server specific implementations **MUST** be prefixed with ``vnd-``


Ah, now that I'm here (and admittedly, running out of gas -- it's EOW) I see that my earlier comment about the mechanism name of pypi-atomic wouldn't be appropriate because it would be a required protocol and not a vendor protocol. So maybe that would be changed to atomic instead.
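
To illustrate the naming rule in the diff above, here is a rough sketch of how a client might tell standard mechanisms from server-specific ones by the ``vnd-`` prefix. It assumes the server advertises its mechanisms as a list of names. The name `vnd-example-s3` and both function names are made up for illustration, while `http-post-application-octet-stream` is the mechanism mentioned earlier in this thread.

```python
from typing import List, Optional, Set

VENDOR_PREFIX = "vnd-"


def is_server_specific(mechanism: str) -> bool:
    """Server-specific mechanism names carry the ``vnd-`` prefix."""
    return mechanism.startswith(VENDOR_PREFIX)


def choose_mechanism(offered: List[str], supported: Set[str]) -> Optional[str]:
    """Pick an offered mechanism the client supports, preferring standard ones."""
    usable = [name for name in offered if name in supported]
    # Stable sort: standard names (key False) come before vendor names (key True).
    usable.sort(key=is_server_specific)
    return usable[0] if usable else None
```

Preferring standard mechanisms is a design choice of this sketch only; a client could just as well prefer a vendor mechanism it knows to be faster for a given index.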

ewdurbin added 3 commits May 21, 2025 16:00

PEP 694: Abstract file upload mechanisms

d569693

lint

8e98c9b

no warnings allowed

daf06f4

ewdurbin mentioned this pull request May 22, 2025

Demonstration finite state machine for PEP 694 pypi/warehouse#18174

Draft

refinements

5ef177e

ewdurbin marked this pull request as ready for review May 28, 2025 14:13

ewdurbin requested review from dstufft and warsaw as code owners May 28, 2025 14:13

konstin reviewed May 28, 2025

ewdurbin commented May 28, 2025

Update peps/pep-0694.rst

7f1e741

dstufft approved these changes May 29, 2025

ewdurbin added 2 commits May 30, 2025 06:39

remove reference to a specific URL for PyPI

f8469cf

clarify that resumable/parallel uploads are supported but not defined…

05d7fc2

… in this pep

ewdurbin added 4 commits May 30, 2025 07:33

attempt to specify http-post-application-octet-stream mechanism

2ef077c

if attestations are going to be uploaded, do it before completion of …

a2bfb56

…file upload session

i'm in it now

442d37f

try to un-wonk content-type per feedback

924a27d

lint

7f0798b

clarify file upload mechanism details

3effee2

dstufft approved these changes May 30, 2025

Fix typo

c053242

Co-authored-by: Donald Stufft <[email protected]>

warsaw reviewed May 31, 2025
