Expose Content-Encoding to ResourceTiming #381

Open
nicjansma opened this issue Sep 16, 2023 · 28 comments · May be fixed by #385

Comments

@nicjansma
Contributor

Hi,

Similar to request #203 to expose Content-Type, I would like to request exposing the Content-Encoding of each resource to ResourceTiming.

As we're starting to see experimentation with and deployments of new content encodings such as Zstandard (zstd) and compression dictionary transports (zstd-d and br-d), we are moving toward content being delivered in any of a large set of possible encodings, even to the same client on different page loads or (sub)requests to the same domain.

When the set of content encodings was small ((none), gzip and brotli), one could often infer the encoding from the encoded/decoded body sizes, though that generally only works if one "owns" the content (i.e., knows what the size would be for each encoding type).

An explicit .contentEncoding attribute would help with several use cases I can think of:

  • As origins and CDNs experiment with new content encodings, they could use phased rollouts of the above technologies and segment (via RUM) the performance of clients using those new technologies.
    • For example, this can help with understanding real-world uptake %, how those new technologies perform, and debugging them.
  • Once fully deployed, origins and CDNs could segment browsing sessions (or A/B test) with those technologies to provide value confirmation and/or debugging via RUM.
    • For example, one could verify with RUM (alongside server-side logs) how often dictionaries are being used.
  • Third-party RUM vendors could segment resource fetches by their encoding with certainty (rather than guessing based on file sizes).
    • For example, RUM vendors could offer insights and suggestions to enable more advanced encodings and/or dictionaries.
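To make the segmentation use case concrete, here is a minimal sketch of grouping resource entries by the proposed attribute. It assumes entries expose `contentEncoding` as a plain string and treats a missing attribute as its own "(unsupported)" bucket; `ResourceEntryLike`, `segmentByEncoding`, and `meanDurationByEncoding` are hypothetical names, not part of any spec.

```typescript
// Hypothetical minimal shape of a resource timing entry. `contentEncoding`
// is the attribute proposed in this issue; it is optional because browsers
// that don't implement it won't expose it at all.
interface ResourceEntryLike {
  name: string;
  duration: number;
  contentEncoding?: string;
}

interface EncodingSegment {
  count: number;
  totalDuration: number;
}

// Group entries by reported encoding so RUM data can be segmented,
// e.g. comparing zstd fetches against br fetches during a phased rollout.
function segmentByEncoding(
  entries: readonly ResourceEntryLike[],
): Map<string, EncodingSegment> {
  const segments = new Map<string, EncodingSegment>();
  for (const entry of entries) {
    // Entries from browsers without the attribute get their own bucket.
    const key = entry.contentEncoding ?? "(unsupported)";
    const segment = segments.get(key) ?? { count: 0, totalDuration: 0 };
    segment.count += 1;
    segment.totalDuration += entry.duration;
    segments.set(key, segment);
  }
  return segments;
}

// Mean duration per encoding, derived from the segments above.
function meanDurationByEncoding(
  segments: Map<string, EncodingSegment>,
): Map<string, number> {
  const means = new Map<string, number>();
  segments.forEach((segment, key) => {
    means.set(key, segment.totalDuration / segment.count);
  });
  return means;
}
```

In a browser, the input would presumably come from `performance.getEntriesByType("resource")`, narrowed to this shape.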

CC @pmeenan @horo-t

@yoavweiss
Contributor

Similar to Content-Type, this probably needs to be restricted to content that is CORS-enabled or same-origin. That's not a limitation for the dictionary compression encodings (as they have similar restrictions). This might be a limitation for non-dictionary zstd/brotli.
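A rough sketch of that restriction, modeled on the Fetch standard's response tainting values. The helper name `gateContentEncoding` and the choice of an empty string as the hidden value are assumptions for illustration only.

```typescript
// Fetch's "response tainting" values: "basic" for same-origin responses,
// "cors" for CORS-enabled cross-origin responses, "opaque" otherwise.
type ResponseTainting = "basic" | "cors" | "opaque";

// Hypothetical gate: expose the encoding only for same-origin or
// CORS-enabled responses, and an empty string otherwise (the empty-string
// fallback is an assumption, not something decided in this comment).
function gateContentEncoding(
  encoding: string,
  tainting: ResponseTainting,
): string {
  return tainting === "basic" || tainting === "cors" ? encoding : "";
}
```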

@nicjansma
Contributor Author

Discussed on the Feb 29, 2024 W3C WebPerf call:

Summary:

  • General agreement this would be useful for RUM providers
  • This is similar to work that has been merged in for Content-Type (behind a flag in Chromium)
  • Chrome net team will be picking up the spec work (in Fetch and ResourceTiming) to complete this and get it into Chromium

@horo-t
Member

horo-t commented Mar 4, 2024

Filed a Chromium-side bug: https://crbug.com/327941462

+CC: @Jxck who is interested in this.

@Jxck

Jxck commented Mar 14, 2024

Thanks @horo-t, I'm happy to work on it!
Let me check.

Jxck added a commit to Jxck/resource-timing that referenced this issue Mar 25, 2024
add `contentEncoding` to Resource Timing.
closed w3c#381.

@guohuideng2024

guohuideng2024 commented Dec 6, 2024

I plan to filter the contentEncoding value before exposing it in resource timing (the unfiltered value in the response header remains available for other uses), but how exactly should the value be filtered?

Continuing the discussion in whatwg/fetch#1742.

Below are the possible values that are registered. I am putting them in the following categories:

-- definitely allowed (confirmed as supported by Chromium in WPT tests)
br
dcb
dcz
deflate
gzip
zstd

-- maybe allowed?
compress

-- What about these? Should they be disallowed? Why?

exi
identity
pack200-gzip
aes128gcm

-- The below are deprecated. Since we are building a new feature, should we just disallow them (forcing the app to change to the up-to-date name)? Or should we be forgiving and map them to compress and gzip?

x-compress Deprecated (alias for compress)
x-gzip Deprecated (alias for gzip)

-- What about these? They are not registered content-coding values; should we disallow them?
br-d
zstd-d

++++++++++++++++
A disallowed value will be exposed as unknown
++++++++++++++++
Thanks!

@pmeenan

pmeenan commented Dec 6, 2024 via email

@guohuideng2024

guohuideng2024 commented Dec 9, 2024

Based on what Patrick said above and some digging, I think the following could be filtered out:

aes128gcm: it appears to be used only for push messages, not for receiving resources via fetch
compress, exi, pack200-gzip: not supported in Chromium
br-d, zstd-d, x-compress and x-gzip: they are deprecated names
identity: it's just a reserved name

And the following will be supported:
br, dcb, dcz, deflate, gzip, zstd

(last updated 2024/12/11)
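The proposed allowlist could be applied to a single coding roughly as follows. This is a sketch, not the Chromium implementation: the lowercasing step assumes content-coding names are compared case-insensitively (as RFC 9110 specifies), and `filterSingleCoding` is a hypothetical name that only handles one non-empty coding.

```typescript
// The codings listed above as supported; everything else becomes "unknown".
const SUPPORTED_CODINGS: ReadonlySet<string> = new Set([
  "br", "dcb", "dcz", "deflate", "gzip", "zstd",
]);

// Hypothetical filter for a single, non-empty coding.
function filterSingleCoding(raw: string): string {
  // RFC 9110 treats content-coding names as case-insensitive, so normalize.
  const coding = raw.trim().toLowerCase();
  return SUPPORTED_CODINGS.has(coding) ? coding : "unknown";
}
```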

@nhelfman

nhelfman commented Dec 15, 2024

I think we must also specify what the value would be when there is no content encoding (i.e. an uncompressed payload).
It should be distinguishable from the unknown case so we can track these separately. An example use case is when a CDN process decides not to compress the payload due to some issue (a real scenario I had). It is important to be able to identify these cases.

The easiest option seems to be to just not set the value in that case, so that reading the property returns JS undefined. However, this would harm feature detection, since it would not be possible to tell whether there was no encoding or the feature is not supported (or not allowed) by the UA.

My suggestion is a specific string value for that case:
undefined / uncompressed / no-encoding

Does anyone have a preference or other suggestions?
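Whichever sentinel is chosen, the point is that RUM code can then keep three cases apart. A minimal sketch, assuming the empty string is the "no encoding" value (any of the strings suggested above would work the same way) and "unknown" marks a filtered value; `classifyEncoding` and `EncodingStatus` are hypothetical names.

```typescript
type EncodingStatus =
  | { kind: "unsupported" } // attribute not exposed by this browser
  | { kind: "none" } // no content coding was applied
  | { kind: "unknown" } // a coding the browser filtered out
  | { kind: "encoded"; coding: string };

// Hypothetical classifier assuming "" is the no-encoding sentinel and
// "unknown" is the filtered-value sentinel.
function classifyEncoding(value: string | undefined): EncodingStatus {
  if (value === undefined) return { kind: "unsupported" };
  if (value === "") return { kind: "none" };
  if (value === "unknown") return { kind: "unknown" };
  return { kind: "encoded", coding: value };
}
```

Feature detection could alternatively check `"contentEncoding" in PerformanceResourceTiming.prototype`, assuming the attribute ends up defined on the prototype like other ResourceTiming attributes.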

@yoavweiss
Contributor

yoavweiss commented Dec 15, 2024

"identity" seems appropriate IMO.

@guohuideng2024

guohuideng2024 commented Dec 16, 2024

Thanks for pointing out the meaning of the "identity" value, Yoav. I missed it. :)

And I think I could mention the "identity" value in the fetch standard modification too.
I am going to modify the CLs and PRs accordingly, and will start the review process soon after that.

@noamr
Contributor

noamr commented Jan 22, 2025

Do we need to distinguish between "identity" and "no content-encoding header" (an empty string in contentEncoding)? Unlike unknown, those sound like the same thing to me from a client perspective?

@guohuideng2024

guohuideng2024 commented Jan 22, 2025

@yoavweiss

Noam pointed out that this spec:
https://httpwg.org/specs/rfc9110.html#rfc.section.8.4
actually says that identity shouldn't be sent in a response header:

"Note that the coding named "identity" is reserved for its special role in Accept-Encoding and thus SHOULD NOT be included."

If we cannot report the identity value in resource timing, the client is not able to distinguish between:
a) no compression is used, and the server explicitly wants to tell the client so
(in this case, returning the identity value is ideal)
b) the server hasn't been updated to the new fetch standard, so although it's using some compression, it doesn't tell the client
(in this case, resourceTiming returns an empty string)

So it appears to me that it's desirable that:
(This is actually what was submitted to Chromium)

  1. identity is allowed in the fetch response, and such a value is reported in resourceTiming
  2. if the server doesn't send a Content-Encoding value, the field is empty in resourceTiming. It means the value is missing, and CDN statistics and analysis will treat it accordingly.

==>
Then would we need to make a change to https://httpwg.org/specs/rfc9110.html#rfc.section.8.4?

@noamr
Contributor

noamr commented Jan 22, 2025

> @yoavweiss
>
> Noam pointed out that this spec: https://httpwg.org/specs/rfc9110.html#rfc.section.8.4 actually says that identity shouldn't be sent in a response header:
>
> "Note that the coding named "identity" is reserved for its special role in Accept-Encoding and thus SHOULD NOT be included."
>
> If we cannot report the identity value in resource timing, the client is not able to distinguish between: a) no compression is used, and the server explicitly wants to tell the client so (in this case, returning the identity value is ideal) b) the server hasn't been updated to the new fetch standard, so although it's using some compression, it doesn't tell the client (in this case, resourceTiming returns an empty string)

What does "the server didn't update to the new fetch standard" mean? Fetch is a client-side standard that works on top of HTTP. If the server didn't send Content-Encoding, doesn't that effectively mean there is no compression, as in "it's sending the identity content"?

@LPardue
Contributor

LPardue commented Jan 22, 2025

I'm working on other things related to HTTP content coding. The feedback I've received is to NOT use identity as a label for message content that does not have a coding.

RFC 9110 reflects the reality of use in the wild: implementations SHOULD NOT include "identity" in Content-Encoding, but they might.

Therefore, my suggestion would be to use a label like unencoded (or no-encoding or empty string, but NOT uncompressed) when there is no Content-Encoding response header. Use identity if there really is a Content-Encoding: identity response header field.

@noamr
Contributor

noamr commented Jan 22, 2025

I believe empty-string is consistent with other no-value cases in the resource timing spec (e.g. deliveryType, nextHopProtocol). Perhaps that's good enough here? unknown can still be used as a catch-all for an encoding type that is not in the supported list of encodings by the UA.

@guohuideng2024

Thanks for the input! So, the reality is that most servers just send an empty value instead of the literal string identity.

Let me list the possible situations here; could you please verify:

  1. if no header is sent at all --> return empty string
  2. if an empty header is sent --> return empty string
  3. Otherwise, at least one coding is sent. A coding is mapped to unknown if it's not recognized.

{gzip, apple, identity, pear } will be filtered to
{gzip, unknown, identity, unknown}
(because @LPardue said if the server did send identity we can keep it?)

or we filter it to
{gzip, unknown, unknown, unknown} with identity filtered too?

@noamr
Contributor

noamr commented Jan 23, 2025

> Thanks for the input! So, the reality is that most servers just send an empty value instead of the literal string identity.
>
> Let me list the possible situations here; could you please verify:
>
>   1. if no header is sent at all --> return empty string
>   2. if an empty header is sent --> return empty string
>   3. Otherwise, at least one coding is sent. A coding is mapped to unknown if it's not recognized.
>
> {gzip, apple, identity, pear } will be filtered to {gzip, unknown, identity, unknown} (because @LPardue said if the server did send identity we can keep it?)
>
> or we filter it to {gzip, unknown, unknown, unknown} with identity filtered too?

We don't want this to become a little side-channel where servers can encode information by sending multiple encodings over a 1-byte fetch or so.

So I think it should be:

  • limited to a maximum of 2 codings; more than that becomes multiple. (Could be 3 if there is a common use case?)
  • a single delimited string rather than an array
  • if one of them is unknown, the whole string become unknown
  • identity turns to unknown
  • No-header or empty-string turns into the empty string

Examples: gzip, gzip deflate, br, unknown, multiple, "" etc.
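One possible reading of these rules, sketched in TypeScript. Assumptions not stated in the comment: entries are trimmed and lowercased, empty entries are dropped, the "multiple" check runs before the "unknown" check, and codings are joined with ", ". `filterDraft` and the `RECOGNIZED` set are hypothetical.

```typescript
// Codings treated as recognized; "identity" is deliberately absent so it
// maps to "unknown", per the rule above.
const RECOGNIZED: ReadonlySet<string> = new Set([
  "br", "dcb", "dcz", "deflate", "gzip", "zstd",
]);
const MAX_CODINGS = 2;

// Hypothetical sketch of the rules listed above. `header` is the raw
// Content-Encoding value, or null when the header is absent.
function filterDraft(header: string | null): string {
  const codings = (header ?? "")
    .split(",")
    .map((c) => c.trim().toLowerCase())
    .filter((c) => c.length > 0);
  if (codings.length === 0) return ""; // no header or empty header
  if (codings.length > MAX_CODINGS) return "multiple";
  // A single unrecognized coding makes the whole value "unknown".
  if (codings.some((c) => !RECOGNIZED.has(c))) return "unknown";
  return codings.join(", ");
}
```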

@guohuideng2024

guohuideng2024 commented Jan 23, 2025

Adding to what Noam said above:

  1. The filter produces a consistently formatted string, with codings separated by a comma and a space. Examples:

""
"gzip" // only one coding
"gzip, deflate" // two codings
"multiple" // 3 or more
"unknown" // there is something unrecognized

  2. Empty entries sent by the server alongside other coding values are ignored.

Below is the conversion from the raw header value sent by the server to the string reported in resourceTiming:

", gzip" ==> "gzip"
", gzip, deflate" ==> "gzip, deflate"

  3. What if the server sends duplicate values?
(The server shouldn't do that, but I guess we still need to define the rule here?)

"gzip, Gzip" => "gzip"?
"gzip, Gzip, deflate" => "gzip, deflate"?
"gzip, deflate, gzip" => "gzip, deflate" or "multiple"?

@noamr
Contributor

noamr commented Jan 24, 2025

> Adding to what Noam said above:
> 3. What if the server sends duplicate values? (The server shouldn't do that, but I guess we still need to define the rule here?)
>
> "gzip, Gzip" => "gzip"? "gzip, Gzip, deflate" => "gzip, deflate"? "gzip, deflate, gzip" => "gzip, deflate" or "multiple"?

To be more concrete, these are the values seen in this header in November 2024 that appear in more than 100 responses (HTTP Archive, mobile, out of ~1.5M):

COUNT | encoding
-- | --
819,032,384 | <empty>
369,685,911 | gzip
359,144,061 | br
29,541,044 | zstd
107,020 | deflate
75,470 | none
29,740 | base64
19,778 | UTF-8
13,586 | identity
11,715 | utf-8
8,247 | webp
7,788 | dcb
3,907 | text
1,955 | GZIP
1,768 | 7bit
1,323 | nosniff
925 | ISO-8859-1
873 | x-gzip
732 | null
689 | utf8
639 | None
488 | binary
454 | image/webp
408 | gzip, gzip
295 | GP5
192 | br, gzip
187 | image/jpeg
134 | application/javascript
134 | compress
107 | Stream


It seems that using multiple values is very rare in practice. Perhaps we should expose just unknown, a single value, or multiple?
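A quick check of how rare multi-coding values are in the table above. The table only lists values seen in more than 100 responses, so the denominator here is the sum of the listed rows rather than the whole crawl; `OBSERVED` and `multiCodingShare` are illustrative names.

```typescript
// [Content-Encoding value, response count], transcribed from the table above.
// "" stands for the <empty> row.
const OBSERVED: ReadonlyArray<readonly [string, number]> = [
  ["", 819032384], ["gzip", 369685911], ["br", 359144061],
  ["zstd", 29541044], ["deflate", 107020], ["none", 75470],
  ["base64", 29740], ["UTF-8", 19778], ["identity", 13586],
  ["utf-8", 11715], ["webp", 8247], ["dcb", 7788], ["text", 3907],
  ["GZIP", 1955], ["7bit", 1768], ["nosniff", 1323],
  ["ISO-8859-1", 925], ["x-gzip", 873], ["null", 732], ["utf8", 689],
  ["None", 639], ["binary", 488], ["image/webp", 454],
  ["gzip, gzip", 408], ["GP5", 295], ["br, gzip", 192],
  ["image/jpeg", 187], ["application/javascript", 134],
  ["compress", 134], ["Stream", 107],
];

// Share of responses (among the listed rows only) whose value lists more
// than one coding, i.e. contains a comma.
function multiCodingShare(rows: ReadonlyArray<readonly [string, number]>) {
  let multi = 0;
  let total = 0;
  for (const [value, count] of rows) {
    total += count;
    if (value.includes(",")) multi += count;
  }
  return { multi, total, share: multi / total };
}
```

Only the "gzip, gzip" and "br, gzip" rows contain a comma (600 responses in total), which is well under one in a million of the listed responses.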

@yoavweiss
Contributor

I can't think of a valid use case for multiple values. I now wonder what the browser actually does with, e.g., br, gzip.
The content is actually encoded with either one or the other.

Ideally, we'd report the value that was actually respected by the browser. Is that the first?

@noamr
Contributor

noamr commented Jan 24, 2025

> I can't think of a valid use case for multiple values. I actually now wonder what does the browser do with e.g. br, gzip. The content is actually encoded with either one or the other.

Doesn't it mean - compress with br, and then compress the result with gzip?
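That matches RFC 9110, which requires the sender to list codings in the order in which they were applied, so a client undoes them in reverse. A trivial sketch of that ordering (`decodingOrder` is a hypothetical name):

```typescript
// Content-Encoding lists codings in the order they were applied, so the
// decoder must undo them last-applied first.
function decodingOrder(contentEncoding: string): string[] {
  return contentEncoding
    .split(",")
    .map((c) => c.trim().toLowerCase())
    .filter((c) => c.length > 0)
    .reverse();
}
```

For "br, gzip" this yields gzip first, then br: the body was brotli-compressed and the result was then gzipped.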

@yoavweiss
Contributor

You're right! TIL

I still don't think it's a valid use case. A single value of "multiple" is sufficient.

@guohuideng2024

guohuideng2024 commented Jan 24, 2025

That would be much easier to define for now.
And I will work on a CL to revise the current Chromium implementation accordingly.

On the other hand, I could imagine that in the future "two compressions" might become more common: for example, a very specific compression followed by a general-purpose one (maybe some dictionary compression followed by gzip or br) could be beneficial.
I am curious: when putting this into the ResourceTiming specification, is it possible to leave some room for future extension? If so, do we just change the specification later (keeping it compatible with the older version), or do we leave some room in the text we write now?

@pmeenan

pmeenan commented Jan 24, 2025

> On the other hand, I could speculate that in future "two compressions" could become more common, like a very specific compression used with a general compression technique (Maybe some dictionary compress followed by gzip or br) could be beneficial.

I think even that would be an extreme edge case. For example, the current dictionary compressions are built directly into brotli and zstd (using dcb and dcz content encodings). Specialized compression is likely to be complete and layering them is usually a mistake or at least not helpful.

I'd prefer to keep it simple and just use multiple for that case, even going forward. That would provide enough information from the RUM case where someone could further dig-in and test if they aren't expecting to see it.

@guohuideng2024

Thanks, I think I got the principle. And I think this "multiple" rule can take the highest precedence?
i.e., as long as there is a comma, the result is "multiple".

All of the examples below are converted to "multiple":
", " (not "")
"gzip, gzip" (not "gzip")
"unrecognized_compression, gzip" (not "unknown")

(Because this is the simplest rule, and these very unlikely cases don't matter much.)

@pmeenan

pmeenan commented Jan 24, 2025

Agreed. The comma means there are multiple passes at encoding, even if all of the passes are using the same encoding.
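Putting the agreed rules together, the filter could look roughly like this. It is a sketch under stated assumptions, not the Chromium code: the allowlist is the one proposed earlier in the thread, a missing header is passed as `null`, and lowercasing the coding is an assumption. `filterContentEncoding` is a hypothetical name.

```typescript
// Allowlist proposed earlier in the thread; "identity" is not on it.
const SUPPORTED: ReadonlySet<string> = new Set([
  "br", "dcb", "dcz", "deflate", "gzip", "zstd",
]);

// Hypothetical filter: any comma means "multiple" (highest precedence, even
// for ", " or "gzip, gzip"); a missing or empty header becomes ""; a
// supported coding is reported lowercased; anything else is "unknown".
function filterContentEncoding(header: string | null): string {
  if (header === null) return "";
  if (header.includes(",")) return "multiple";
  const coding = header.trim().toLowerCase();
  if (coding === "") return "";
  return SUPPORTED.has(coding) ? coding : "unknown";
}
```

Checking for the comma before anything else keeps the rule simple: the string is never split, so malformed lists cannot produce any value other than "multiple".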

@yoavweiss
Contributor

> is it possible that we leave some room so it's possible to extend in future?

If in some unforeseeable future we'd find ourselves with a genuine case where it makes sense to apply multiple encodings, we can always decide then to add a value that represents those multiple encodings. I don't think that deciding to go with "multiple" now would have negative future-compat implications.

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Feb 3, 2025
The value from the http header can be of multiple codings, or not
properly formatted.

Per discussion on w3c/resource-timing#381,
multiple codings should be transformed to "multiple"; "identity" is
not allowed in response header; and the coding value should be
formatted if it's not in http header.

Bug: 327941462
Change-Id: I9048423c5ad562d8001562324cb35f72ef8ac5da
aarongable pushed a commit to chromium/chromium that referenced this issue Feb 3, 2025
The value from the http header can be of multiple codings, or not
properly formatted.

Per discussion on w3c/resource-timing#381,
multiple codings should be transformed to "multiple"; "identity" is
not allowed in response header; and the coding value should be
formatted if it's not in http header.

Bug: 327941462
Change-Id: I9048423c5ad562d8001562324cb35f72ef8ac5da
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6215331
Reviewed-by: Noam Rosenthal <[email protected]>
Reviewed-by: Yoav Weiss (@Shopify) <[email protected]>
Commit-Queue: Guohui Deng <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1415037}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Feb 5, 2025
… ResourceTiming, a=testonly

Automatic update from web-platform-tests
Revise filtering for Content-Encoding in ResourceTiming

The value from the http header can be of multiple codings, or not
properly formatted.

Per discussion on w3c/resource-timing#381,
multiple codings should be transformed to "multiple"; "identity" is
not allowed in response header; and the coding value should be
formatted if it's not in http header.

Bug: 327941462
Change-Id: I9048423c5ad562d8001562324cb35f72ef8ac5da
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6215331
Reviewed-by: Noam Rosenthal <[email protected]>
Reviewed-by: Yoav Weiss (@Shopify) <[email protected]>
Commit-Queue: Guohui Deng <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1415037}

--

wpt-commits: 1691f567df269bca146cb8d25c43d644b2b42c63
wpt-pr: 50456