Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle errors in different phases differently #83

Merged
merged 9 commits into from
Jul 10, 2018

Conversation

dcreager
Copy link
Member

This patch clarifies the different phases that happen while servicing a request. A network error occurs during one of those phases, and we now handle errors differently depending on which phase they occur in:

  • include_subdomains policies can only be used to generate reports about DNS resolution errors, since the policy author can only confirm ownership of the DNS tree, and not of all of the servers that those domain names resolve to.

  • If the resolved IP address for an origin changes between when a policy is received, and when its used to generate a report, we don't report any details about the connection and application phases, and only report that the IP address changed. This prevents DNS rebinding attacks.

This patch clarifies the different phases that happen while servicing a
request.  A network error occurs during one of those phases, and we now
handle errors differently depending on which phase they occur in:

  - `include_subdomains` policies can only be used to generate reports
    about DNS resolution errors, since the policy author can only
    confirm ownership of the DNS tree, and not of all of the servers
    that those domain names resolve to.

  - If the resolved IP address for an origin changes between when a
    policy is received, and when its used to generate a report, we don't
    report any details about the connection and application phases, and
    only report that the IP address changed.  This prevents DNS
    rebinding attacks.
@dcreager dcreager changed the base branch from concepts to gh-pages July 2, 2018 16:19
<p>
Regardless of which fetch algorithm and which underlying application and
transport protocols are used, servicing a <a>network request</a> consists
of the following <dfn data-lt="phase">phases</dfn>:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love if these phases were defined in a more central place, like Fetch. That'd make it more likely that specs like the Web Packaging specs remember to define which phase they're operating in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an editor's note that we'd like to move these into Fetch. I'd like to keep the definitions here so that they're available somewhere until that happens.

<dt><a>received IP address</a></dt>
<dd>
the IP address of the <a>server</a> that the user agent received
<var>response</var> from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that response is defined by reference to RFC7230 instead of Fetch. That's probably the wrong choice, particularly because you can pretty easily extend Fetch's data type if you need extra data stored in it.

In this case, I think you need to add the server IP address to https://fetch.spec.whatwg.org/#concept-connection and a way to get the connection for a https://fetch.spec.whatwg.org/#concept-response. @annevk, does that sound right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the lower-level references so that it's hopefully more clear how a non-browser user agent would use NEL — for instance, a native app using OkHttp to make API requests. That's also the rationale behind the particular wording I chose when I introduced phases earlier on. I see what you're saying though — jumping through hoops to make this more generic might not be worth the trouble.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto — I've updated the reference so that it points to Fetch intstead of RFC 7230, but otherwise I've just added an editor's note that we want to plumb this through more explicitly in Fetch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

The HTTP RFCs are currently up for revision in https://github.com/httpwg/http-core#draft-http-core-documents, so it is possible to plumb things through there instead. I suspect that non-browser agents should try to sit on Fetch, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

index.html Outdated
@@ -812,6 +891,33 @@ <h2>Generate a network error report</h2>
</dl>
</li>

<li>
If <var>request body</var>'s <code>server_ip</code> property is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/request body/report body/?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

index.html Outdated

<ol>
<li>
Set <var>request body</var>'s <code>phase</code> to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might express this as completely overwriting report body instead of clearing particular fields. That way, if someone in the future adds sensitive fields but forgets to update this step, the algorithm will fail closed instead of open. (This might just be temporary paranoia on my part though.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's now some text down in Privacy Considerations that calls out what kind of information can be included in a report, so instead of replacing report body I added a step that user agents must clear out any fields that are derived from information that isn't available during DNS resolution.

@dcreager dcreager requested review from igrigorik and yoavweiss July 3, 2018 15:30
@dcreager
Copy link
Member Author

dcreager commented Jul 3, 2018

@igrigorik @yoavweiss, this could use another set of eyes. I've tried to summarize all of the relevant discussion from #74 in the Examples and Privacy Considerations sections.

<dt><a>received IP address</a></dt>
<dd>
the IP address of the <a>server</a> that the user agent received
<var>response</var> from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

The HTTP RFCs are currently up for revision in https://github.com/httpwg/http-core#draft-http-core-documents, so it is possible to plumb things through there instead. I suspect that non-browser agents should try to sit on Fetch, though.

index.html Outdated
<code>elapsed_time</code> properties.
</li>
<li>
If the user agent has added any additional fields to <var>report
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your inclusion of a rationale for this trimming fixes the problem I was worried about where we might update the spec and forget to list new fields here. However, I don't think this step can replace the explicit list of fields to clear: it just signals to someone reading the spec that they ought to file a spec bug if we miss something.

For that purpose, maybe an Assert is the best approach? You'd write something like

Assert: All fields in report body that are derived from information not available during DNS resolution have been cleared.

Domenic, Anne, or Ryan should feel free to override me on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed this to an assert. (The step just before this one already still lists status_code and elapsed_time explicitly as fields that need to be cleared.)

index.html Outdated
</p>

<p>
To prevent information leakage, NEL reports about a <a>request</a> MUST
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beware "MUST"s in security and privacy implications. Since they don't appear in the algorithms that have to actually make them happen, it's easy for implementers to miss them. That said, I don't have a good guideline for what to do instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these statements should be implied by the algorithms earlier in the document, so I changed the wording so that this doesn't imply that it's adding additional requirements, just elaborating on the rationale for the existing ones.

Douglas Creager added 2 commits July 7, 2018 09:41
* gh-pages:
  Update WICG references to W3C (w3c#87)
  Adding baseline CODE_OF_CONDUCT.md
Copy link
Contributor

@yoavweiss yoavweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second @jyasskin's comments about Fetch integration. Not sure this should be a blocker though.

* gh-pages:
  Fix typo
</li>

<li>
<dfn>Secure connection establishment</dfn>: The user agent opens a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we distinguish TCP from TLS here? Would also be good to properly define what happen with those phases in protocols which don't include them (TCP-Fast-Open + TLS/1.3 0-RTT, QUIC).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make this not too granular, exactly because of the different protocols and fast paths that there are. I didn't want this to be "a taxonomy of connection methods as of the time of this writing"; instead I wanted "in the broadest possible strokes, here's what happens when the user agent makes a request over the network". So I tried to merge phases together as much as I could.

These three seemed like the minimum — I couldn't glom any more of them together. dns has to be separate because it's treated differently, both for downgraded reports and for the handling of include_subdomains. application has to be separate because it's the only mandatory phase. It didn't seem necessary to split apart the connection phase — at least here in NEL — because it wouldn't have any impact on whether we're allowed to collect a report, or on what information that report could contain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I can see use cases for distinguishing between them on the collector side (e.g. distinguishing between "host is down" to "cert issues"). At the same time, not sure how important that distinction is, so fine with leaving it out for now. I also agree that in a world where everything is QUIC, that distinction may not make much sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still all of the tcp.* and tls.* error codes, so the collector will still be able to see a fine-grained description of what error occurred. It's just that all of those error codes would have the same value (connection) for their phase field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, OK. missed that

<li>
If <var>report body</var>'s <code>server_ip</code> property is
non-empty, and not equal to <var>policy</var>'s <a>received IP
address</a>:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reasoning behind this section is to avoid sending reports to addresses that didn't send the original policy? (so "downgrading" the reports, in a sense)
If so, can you add a note on that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly! There is some text down in the privacy § describing the rationale but I added a note here, too, so that there's an explanation close to the algorithm step.

index.html Outdated
@@ -713,7 +803,7 @@ <h2>Generate a network error report</h2>
policy</a>, and queues it for delivery.
</p>

<ol>
<ol class="algorithm">

<li>
If the result of executing the <a>is-origin-trustworthy</a> algorithm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"potentially trustworthy" would've been more appropriate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step requires the result to be "Potentially Trustworthy". Do you mean simplifying the text to just:

If request's origin is not potentially trustworthy

and not mentioning the name of the algorithm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the algorithm is "is origin potentially trustworthy?", despite the fact that its link drops the "potentially" part: https://w3c.github.io/webappsec-secure-contexts/#is-origin-trustworthy

I think it'd be better to change the <dfn> to indicate the full name. Might be better to do that as a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and did it here.

index.html Outdated

<p>
To mitigate some of the above risks, NEL registration is restricted to
<a>trustworthy origins</a>, and delivery of network error reports is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"potentially trustworthy" would've been more appropriate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

index.html Outdated
</li>

<li>
If <var>policy</var>'s <a>subdomains</a> flag is <code>include</code>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"included"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include is consistent with the rest of the text. I'm happy to change this if you feel strongly, but I'd prefer to do that in a separate PR to ensure that I get all of them.

index.html Outdated
<a>trustworthy origins</a>, and delivery of network error reports is
similarly restricted to <a>trustworthy origins</a>. This disallows a
transient HTTP MITM from trivially abusing NEL as a persistent
tracker.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, that means that NEL policies can effectively act as third party cookies, when delivered on third party responses. Is that correct?

If so, might be worth while to mention that and indicate that:

  • The policy cache should be purged when cookies are purged
  • Third party policies should be treated as third party cookies (and e.g. not be saved when the user preferences prevent third party cookies)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a NEL policy can act as a first-party cookie, but not as a third-party one. (Unless I'm misunderstanding the threat you're suggesting.) NEL policies can only be set by the origin that the requests will be sent to; this ¶ describes how we try to enforce that. i.e. a NEL policy can't be delivered on a third-party response — the policy doesn't contain an explicit origin describing what it should apply to, it implicitly applies to the origin of the response itself.

Re clearing the policy cache, there's an existing ¶ at the bottom of the section that requires that.

@yoavweiss
Copy link
Contributor

Thanks for following up on the various comments. Good to merge from my perspective :)

@dcreager
Copy link
Member Author

Thanks Yoav!

@dcreager dcreager merged commit e707a78 into w3c:gh-pages Jul 10, 2018
@dcreager dcreager deleted the phases branch July 10, 2018 13:55
@dcreager dcreager mentioned this pull request Jul 10, 2018
dcreager pushed a commit to dcreager/network-error-logging that referenced this pull request Jul 17, 2018
* gh-pages:
  Clean up network error reports section (w3c#89)
  NEL reports are not observable (w3c#77)
  Handle errors in different phases differently (w3c#83)
  update repo config
  Fix typo
  Update WICG references to W3C (w3c#87)
  Adding baseline CODE_OF_CONDUCT.md
  Factor out a Concepts section (w3c#82)
  Add request method to report body (w3c#80)
  Update examples to latest JSON schema (w3c#81)
This was referenced Jul 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants