-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Are user initiated aborts a network failure? #108
Comments
I think the answer right now is "no" on the implementation side. My understanding of how this currently works in Chromium is that a "successful" report is generated (but most likely not sent, depending on the success sampling rate) if valid headers have been received and the network transaction is in an idle state [1]. If not, then the network transaction is silently destroyed without generating a NEL report. At the level at which we're generating NEL reports, I don't think there's a way to tell why exactly the transaction was cancelled, but we may be able to add code to cover more cases. (I may be misunderstanding how URLRequests/HttpNetworkTransactions work. @MattMenke2 is the expert here -- please correct me if I'm wrong.) [1] https://cs.chromium.org/chromium/src/net/http/http_network_transaction.cc?l=131 |
Thanks for the pointer and confirming the behavior Lily. @dcreager curious, have these scenarios come up in previous discussions? An example use case I'm thinking of here is user clicking on a link that's taking way too long to load (reasons unknown) and manually aborting the navigation — e.g. DNS resolved, TCP handshake completed but the server is non responsive and user runs out of patience, at which point they initiate a manual abort. In this case, from a site owners perspective.. I would argue that's a failed navigation and unless NEL emits a report, they're blind to it. |
It looks to me like we may or may not generate a success report on cancellation, depending on state of the HttpNetworkTransaction. If there's a pending body read, we don't generate a report. If there's no pending body read, we do generate a successful report. I assume there's generally a pending body read when a request is cancelled, unless we're actually stalled on something (auth challenge, client cert request, body IPC buffer is full, pending redirect not yet followed, etc). |
@MattMenke2 what are the prerequisites for transitioning to "pending body read" state? Does this imply we received the response headers? The types of failures I would be interested in here are...
|
It's only the case where we're streaming the body where we have a pending body read, so at that point we've read the final headers (after all redirects). Cancelling at earlier times will also not result in reporting error information, since we have a check in a destructor for having valid headers before putting together a report, if we haven't put one together already. |
Another fun case: Redirects. When we see a redirect, we follow the redirect, without trying to read the response body. If we're using H2/QUIC, we just cancel the stream we received the redirect response on. If it's HTTP/1.x, we try and drain the response body out of band, to try and reuse the socket. If draining the response body takes too long, we time out the connection. We can also get a network error while draining the body. In either case, we've already reported the HTTP headers, and followed the redirect. How should that fit into error reporting? Is it a success or a failure? |
Matt, thanks that's really helpful. The behavior you outlined addresses most of the cases I highlighted..
It's possible that a server flushes headers and then trickles out the response (some CDN's do and promote this as an optimization feature), but generally in my experience it's (unfortunately) not common for servers to flush headers well ahead of the response... Well, except we do use that strategy for Search and other Google products. Thinking out loud, would it be crazy to emit a success report IFF response body finished? Re, redirects: that's a great question. For same origin redirect chain I could see a case for either each redirect being treated as separate navigation, or as a single 'transaction' — i.e. if any fails we emit an error report. For cross-origin redirects, it seems that we ought to treat each origin separately. E.g. if user navigates to foo.com which then redirects them to bar.com, both foo.com and bar.com have different NEL policies and we should respect them. |
Not sure we want to introduce a concept of origin at that layer of the stack. Another thing to think about is proxies. If we have an error looking up the domain name of the proxy to use, establishing a connection to the proxy, or a proxy returns a funky result to a CONNECT request (auth challenge, or something other than the response indicating it has established a tunnel), should we report that? If there are multiple proxies in the proxy list, and the first fails, but the next in the list succeeds, should we report the initial failure? What if preconnects fail - there's no request in that case. What if we reject a pushed HTTP stream because we don't like it? (We reject pushed auth challenges, for instance). What if the server requests client auth? Is that an error? HTTP auth we seem to include in the error set, so I guess so? There's such a thing as "expected" errors whenever a socket is reused. If we reuse a socket (Or use a preconnected but stale socket), we sometimes get a connection close even because the server timed out the socket while we were keeping it around for reuse. In this case, we automatically resend the request. Should we log the original request, which might be an expected error, as an error? We do this both on close and on reset, actually - neither of these may actually reflect an error on the part of the server, and if the retry succeeds, we never surface the error page to the user. Another case: What if we see a redirect, and the request is aborted rather than following the redirect. In particuar, this can happen on CORS failures (Though the layer we're logging at doesn't know about CORS failures, and we can't really teach it about them). Another issue: We normally isolate CORS and uncredentialed / non-CORS requests. Is there any reason to separate out network errors by this classification? This is mostly to protect the site the request is being made to from third party sites making the request, and I don't really see any attacks that combining the two sorts of requests could surface, but I'm not really an expert in that space. |
And actually answering your question: "Thinking out loud, would it be crazy to emit a success report IFF response body finished?" I think this is the full list of caveats to that (most of these already mentioned):
|
Just landed https://crrev.com/c/1480195, so the behavior now is to generate an "abandoned" report if the request is cancelled without having previously generated either an error or success report. Unfortunately these will all show up as "application" phase, regardless of what phase the network transaction was in when it was cancelled. |
@chlily1 nice work! I think that resolves all the scenarios and edge cases @MattMenke2 highlighted above. However, it looks like it got reverted yesterday (doh), are you planning to take another run at it? |
Yes, once I fix a bug that it caused. |
@chlily1 any updates on this one? Looks like the original CL is still reverted, or is there a new one I should follow? |
Woops, sorry for forgetting to update this thread. It was relanded as https://crrev.com/c/1491905. |
\o/ ... awesome work, thanks Lily! Based on the branch point, we should see this rollout in M74 stable? On a related note, do we have any WPT tests to verify this behavior? /cc @yoavweiss |
Yes, this is in M74. We don't currently have any wpt tests for this. |
Would be good to add WPTs, and I think WebDriver can help us simulate user initiated aborts. |
This is great because it brings the implementation closer to the spec. Question though: what about cancelled network requests that are not user initiated aborts? For example, some JS code adds an element to the DOM, sets its src and the browser initiates fetch of the resource. Next, JS code removes this element from the DOM and the browser cancels the resource fetch that is still in flight. Also, @chlily1 , I'm confused by the following I read at https://crrev.com/c/1480195 :
Does this mean that the browser will send a success report after receiving valid headers with response code 4xx or 5xx? If so: should a resource fetch with response code 4xx or 5xx not always be network error of type http.error ? |
Sorry for unclear wording. It is a success in the sense that the error code reported is net::OK. That is converted later into the type http.error after looking at the status code. |
Application initiated aborts are out of scope for NEL — they're not errors. |
Got it, although I believe one can argue that user initiated aborts are also not errors. This said, the NEL report will clearly show what caused the 'error' (it was a user initiated abort) so the user of the NEL report cannot be confused about what happened, and the report will include Just my 2 cents here. |
We shouldn't fire errors for subresource fetches aborted by the application or as a result of user action — if that's not clear in the spec, we should clarify it. Both of these cases are WAI and can be tracked by the application itself. We make one distinction for navigations+user abort because the page cannot, by definition, observe such aborted navigations and remains blind to potential issues. |
We don't have APIs to tell the network service the reason a request was cancelled. If we want to distinguish cancellation reasons, I don't think we want to tell the URLRequest itself the reason for cancellation, but rather have a separate API to explicitly tell NEL about the abort. Also, the renderer doesn't know the reason for the cancellation of a partially received main frame body, so I'm not sure we can really distinguish between reasons for cancelling a partially rendered body (user abort vs web-initiated, like document.location in the middle of a page load - I assume we only cancel the body load when the new navigation commits, though I could be wrong). |
More clarity in the spec is welcome ! |
Is it possible for the NEL report to make clear during which phase the abort happened? E.g. my server is overloaded and can't handle new TCP connections. |
The user clicks on a blue link, which kicks off the navigation request. After some time delta, the user's patience runs out and they..
If the origin being navigated to has an associated NEL policy, would the origin see a report for either or both of these cases? Per our current definition in the spec, assuming the UA did not get a 2XX or 3XX response and user aborts the navigation.. I believe the answer is "yes", but want to sanity check both from spec and implementation perspective.
I guess the meta question here is: how do we treat user initiated aborts?
The text was updated successfully, but these errors were encountered: