-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle errors in different phases differently #83
Conversation
This patch clarifies the different phases that happen while servicing a request. A network error occurs during one of those phases, and we now handle errors differently depending on which phase they occur in: - `include_subdomains` policies can only be used to generate reports about DNS resolution errors, since the policy author can only confirm ownership of the DNS tree, and not of all of the servers that those domain names resolve to. - If the resolved IP address for an origin changes between when a policy is received, and when its used to generate a report, we don't report any details about the connection and application phases, and only report that the IP address changed. This prevents DNS rebinding attacks.
<p> | ||
Regardless of which fetch algorithm and which underlying application and | ||
transport protocols are used, servicing a <a>network request</a> consists | ||
of the following <dfn data-lt="phase">phases</dfn>: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love if these phases were defined in a more central place, like Fetch. That'd make it more likely that specs like the Web Packaging specs remember to define which phase they're operating in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an editor's note that we'd like to move these into Fetch. I'd like to keep the definitions here so that they're available somewhere until that happens.
<dt><a>received IP address</a></dt> | ||
<dd> | ||
the IP address of the <a>server</a> that the user agent received | ||
<var>response</var> from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that response is defined by reference to RFC7230 instead of Fetch. That's probably the wrong choice, particularly because you can pretty easily extend Fetch's data type if you need extra data stored in it.
In this case, I think you need to add the server IP address to https://fetch.spec.whatwg.org/#concept-connection and a way to get the connection for a https://fetch.spec.whatwg.org/#concept-response. @annevk, does that sound right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the lower-level references so that it's hopefully more clear how a non-browser user agent would use NEL — for instance, a native app using OkHttp to make API requests. That's also the rationale behind the particular wording I chose when I introduced phases earlier on. I see what you're saying though — jumping through hoops to make this more generic might not be worth the trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto — I've updated the reference so that it points to Fetch intstead of RFC 7230, but otherwise I've just added an editor's note that we want to plumb this through more explicitly in Fetch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
The HTTP RFCs are currently up for revision in https://github.com/httpwg/http-core#draft-http-core-documents, so it is possible to plumb things through there instead. I suspect that non-browser agents should try to sit on Fetch, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
index.html
Outdated
@@ -812,6 +891,33 @@ <h2>Generate a network error report</h2> | |||
</dl> | |||
</li> | |||
|
|||
<li> | |||
If <var>request body</var>'s <code>server_ip</code> property is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/request body/report body/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
index.html
Outdated
|
||
<ol> | ||
<li> | ||
Set <var>request body</var>'s <code>phase</code> to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might express this as completely overwriting report body instead of clearing particular fields. That way, if someone in the future adds sensitive fields but forgets to update this step, the algorithm will fail closed instead of open. (This might just be temporary paranoia on my part though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's now some text down in Privacy Considerations that calls out what kind of information can be included in a report, so instead of replacing report body I added a step that user agents must clear out any fields that are derived from information that isn't available during DNS resolution.
@igrigorik @yoavweiss, this could use another set of eyes. I've tried to summarize all of the relevant discussion from #74 in the Examples and Privacy Considerations sections. |
<dt><a>received IP address</a></dt> | ||
<dd> | ||
the IP address of the <a>server</a> that the user agent received | ||
<var>response</var> from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
The HTTP RFCs are currently up for revision in https://github.com/httpwg/http-core#draft-http-core-documents, so it is possible to plumb things through there instead. I suspect that non-browser agents should try to sit on Fetch, though.
index.html
Outdated
<code>elapsed_time</code> properties. | ||
</li> | ||
<li> | ||
If the user agent has added any additional fields to <var>report |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your inclusion of a rationale for this trimming fixes the problem I was worried about where we might update the spec and forget to list new fields here. However, I don't think this step can replace the explicit list of fields to clear: it just signals to someone reading the spec that they ought to file a spec bug if we miss something.
For that purpose, maybe an Assert is the best approach? You'd write something like
Assert: All fields in report body that are derived from information not available during DNS resolution have been cleared.
Domenic, Anne, or Ryan should feel free to override me on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, changed this to an assert. (The step just before this one already still lists status_code
and elapsed_time
explicitly as fields that need to be cleared.)
index.html
Outdated
</p> | ||
|
||
<p> | ||
To prevent information leakage, NEL reports about a <a>request</a> MUST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware "MUST"s in security and privacy implications. Since they don't appear in the algorithms that have to actually make them happen, it's easy for implementers to miss them. That said, I don't have a good guideline for what to do instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these statements should be implied by the algorithms earlier in the document, so I changed the wording so that this doesn't imply that it's adding additional requirements, just elaborating on the rationale for the existing ones.
* gh-pages: Update WICG references to W3C (w3c#87) Adding baseline CODE_OF_CONDUCT.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second @jyasskin's comments about Fetch integration. Not sure this should be a blocker though.
* gh-pages: Fix typo
</li> | ||
|
||
<li> | ||
<dfn>Secure connection establishment</dfn>: The user agent opens a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we distinguish TCP from TLS here? Would also be good to properly define what happen with those phases in protocols which don't include them (TCP-Fast-Open + TLS/1.3 0-RTT, QUIC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to make this not too granular, exactly because of the different protocols and fast paths that there are. I didn't want this to be "a taxonomy of connection methods as of the time of this writing"; instead I wanted "in the broadest possible strokes, here's what happens when the user agent makes a request over the network". So I tried to merge phases together as much as I could.
These three seemed like the minimum — I couldn't glom any more of them together. dns
has to be separate because it's treated differently, both for downgraded reports and for the handling of include_subdomains
. application
has to be separate because it's the only mandatory phase. It didn't seem necessary to split apart the connection
phase — at least here in NEL — because it wouldn't have any impact on whether we're allowed to collect a report, or on what information that report could contain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I can see use cases for distinguishing between them on the collector side (e.g. distinguishing between "host is down" to "cert issues"). At the same time, not sure how important that distinction is, so fine with leaving it out for now. I also agree that in a world where everything is QUIC, that distinction may not make much sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still all of the tcp.*
and tls.*
error codes, so the collector will still be able to see a fine-grained description of what error occurred. It's just that all of those error codes would have the same value (connection
) for their phase
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, OK. missed that
<li> | ||
If <var>report body</var>'s <code>server_ip</code> property is | ||
non-empty, and not equal to <var>policy</var>'s <a>received IP | ||
address</a>: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reasoning behind this section is to avoid sending reports to addresses that didn't send the original policy? (so "downgrading" the reports, in a sense)
If so, can you add a note on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly! There is some text down in the privacy § describing the rationale but I added a note here, too, so that there's an explanation close to the algorithm step.
index.html
Outdated
@@ -713,7 +803,7 @@ <h2>Generate a network error report</h2> | |||
policy</a>, and queues it for delivery. | |||
</p> | |||
|
|||
<ol> | |||
<ol class="algorithm"> | |||
|
|||
<li> | |||
If the result of executing the <a>is-origin-trustworthy</a> algorithm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"potentially trustworthy" would've been more appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step requires the result to be "Potentially Trustworthy". Do you mean simplifying the text to just:
If request's origin is not potentially trustworthy
and not mentioning the name of the algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the algorithm is "is origin potentially trustworthy?", despite the fact that its link drops the "potentially" part: https://w3c.github.io/webappsec-secure-contexts/#is-origin-trustworthy
I think it'd be better to change the <dfn>
to indicate the full name. Might be better to do that as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went ahead and did it here.
index.html
Outdated
|
||
<p> | ||
To mitigate some of the above risks, NEL registration is restricted to | ||
<a>trustworthy origins</a>, and delivery of network error reports is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"potentially trustworthy" would've been more appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
index.html
Outdated
</li> | ||
|
||
<li> | ||
If <var>policy</var>'s <a>subdomains</a> flag is <code>include</code>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"included"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include
is consistent with the rest of the text. I'm happy to change this if you feel strongly, but I'd prefer to do that in a separate PR to ensure that I get all of them.
index.html
Outdated
<a>trustworthy origins</a>, and delivery of network error reports is | ||
similarly restricted to <a>trustworthy origins</a>. This disallows a | ||
transient HTTP MITM from trivially abusing NEL as a persistent | ||
tracker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, that means that NEL policies can effectively act as third party cookies, when delivered on third party responses. Is that correct?
If so, might be worth while to mention that and indicate that:
- The policy cache should be purged when cookies are purged
- Third party policies should be treated as third party cookies (and e.g. not be saved when the user preferences prevent third party cookies)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a NEL policy can act as a first-party cookie, but not as a third-party one. (Unless I'm misunderstanding the threat you're suggesting.) NEL policies can only be set by the origin that the requests will be sent to; this ¶ describes how we try to enforce that. i.e. a NEL policy can't be delivered on a third-party response — the policy doesn't contain an explicit origin describing what it should apply to, it implicitly applies to the origin of the response itself.
Re clearing the policy cache, there's an existing ¶ at the bottom of the section that requires that.
Thanks for following up on the various comments. Good to merge from my perspective :) |
Thanks Yoav! |
* gh-pages: Clean up network error reports section (w3c#89) NEL reports are not observable (w3c#77) Handle errors in different phases differently (w3c#83) update repo config Fix typo Update WICG references to W3C (w3c#87) Adding baseline CODE_OF_CONDUCT.md Factor out a Concepts section (w3c#82) Add request method to report body (w3c#80) Update examples to latest JSON schema (w3c#81)
This patch clarifies the different phases that happen while servicing a request. A network error occurs during one of those phases, and we now handle errors differently depending on which phase they occur in:
include_subdomains
policies can only be used to generate reports about DNS resolution errors, since the policy author can only confirm ownership of the DNS tree, and not of all of the servers that those domain names resolve to.If the resolved IP address for an origin changes between when a policy is received, and when its used to generate a report, we don't report any details about the
connection
andapplication
phases, and only report that the IP address changed. This prevents DNS rebinding attacks.