-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Close http client connection if handler doesn't consume all chunked-encoded body #3063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Close http client connection if handler doesn't consume all chunked-encoded body #3063
Conversation
This source currently can be asked to track the total number of consumed bytes. This is for the client to later compare this number with the response content_length and decide how to coninue with the connection. This patch changes this math to use only remaining bytes left for the source to read. It makes the source_impl simpler, the client deciding code simpler and prepares the ground for the chunked-encoded source to do the same check. Signed-off-by: Pavel Emelyanov <[email protected]>
Client uses this number to decide what to do with the connection after reply handler callback finihsed working with the response. For content-legnth encoded body, if the handler leaves some unread bytes (the remaining bytes counter is non-zero) the client either skips those bytes from the socket, if that number is low, os marks the connection for closing. For chunked-encoded body the very same check cannot be done, because the size of the body is not known up until the whole body is read. So to detect the case when handler leaves the reply body unread is done in yes/no manner -- the source sets the "remaining bytes" to non-zero value and only resets it to zero after all response is read. Similarly, to prevent the client from skipping the remaining bytes from the conneciton, its initial value is set to be infinite to be greater than any threshold that can be configured on the client. Signed-off-by: Pavel Emelyanov <[email protected]>
This is to make next patching a bit simpler Signed-off-by: Pavel Emelyanov <[email protected]>
When closing server-side connection it may happen that client had already closed its end and there's nothing to flush into. Neither there's anything that can be done about it. Ignoring exception here is safe, all the test wants to check is done on the client side, server just supplies the reply at its best. refs: 303d0cb Signed-off-by: Pavel Emelyanov <[email protected]>
Extend the test with partially-read body to verify the same for chunked-encoded replies. This consists of three changes: 1. In client response handler do not read response content_length 2. In server response writer write the response chunked encoded 3. In server always assume that client closes the partially-read connection Signed-off-by: Pavel Emelyanov <[email protected]>
Signed-off-by: Pavel Emelyanov <[email protected]>
3918566
to
f8e2d0f
Compare
upd:
|
I'm reviewing the actual code in your patch, but I want to make a more general comment: Clearly, the problem you are trying to solve is not specific only to the http client, and also affects, or affected in the past, the http server, so it's worth looking at how it was solved (or not) in the server. And we did solve it in the server using the following code in src/http/httpd.cc: // If the handler did not read the entire request
// content, this connection cannot be reused so we
// need to close it (via "_done = true"). But we can't
// just check content_stream.eof(): It may only become
// true after read(). Issue #907.
return content_stream.read().then([this] (temporary_buffer<char> buf) {
if (!buf.empty()) {
_done = true;
}
}); The way this works is that we just try to continue reading from the stream. If we read the entire content-length or chunked-encoding-length or whatever (it supports both!), we get an empty buffer, and everything is fine. If, however, it's still possible to read from the stream, it means the application didn't read all of it, and we need to close the connection. I don't know if this solution is perfect, but it's certainly much simpler and didn't require any changes to the input stream implementation and so on. |
input_stream<char>& _inp; | ||
size_t _remaining_bytes = 0; | ||
size_t* _consumed_bytes; | ||
size_t _builtin_remaining_bytes = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible to understand, or even guess, how _builtin_remaining_bytes differ from _remaining_bytes, what "builtin" refers to. Please either reconsider the names, or add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code below, it seems you don't even use this _builtin_remaining_bytes. Can it just be removed entirely?
If not, I have no idea why not so a comment or rename is definitely needed. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is needed to make size_t& _remaining_bytes
reference something in case the class is constucted without "external" remaining bytes counter
// references to fields in the request structure | ||
std::unordered_map<sstring, sstring>& _chunk_extensions; | ||
std::unordered_map<sstring, sstring>& _trailing_headers; | ||
size_t& _remaining_bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above commit message suggests this isn't remaining_bytes, it's more of a zero or one. So please add a comment explaining what it is, or even rename it to something like _has_remaining_bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caller (http client) passes size_t reply::left_content_length
as the target for that reference. The source needs to store here how many bytes are left unread, but since in chunked source it's not possible to find it out, it stores here either 0 (all read) or "infinity" (something is left to be read, but the exact amount is not known). I agree it's not extremely elegant, but how else can it be unified with content-length source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the client either skips those bytes from the socket
should be the client either drains those bytes from the socket
}; | ||
input_stream<char>& _inp; | ||
chunk_parser _chunk; | ||
size_t _builtin_remaining_bytes = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I have no idea what is "builtin" remaining bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When chunked download source is constructed without "external" size_t variable, the size_t& chunk_parser::remaining_bytes
needs to reference some variable, and this is the one. Reference can't be left uninitialized
|
||
SEASTAR_TEST_CASE(test_client_close_connection_content_length) { | ||
return test_client_close_connection(false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that our client tests are in the file httpd_test.cc
(where the "d" means daemon - i.e., server). That's strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's strage, yes :( The excuse for that is -- when http client appeared its tests re-used some code form http server test thus they were added to httpD file. Since then new tests just followed it. I will split this test file (or just rename it into http_test) as a followup
Yes, you're right. However, in client this handling works differently. In server the connection is closed even if there's a single byte left unread from the body. In client we wanted to keep connections alive harder, so in case body source reports that there's less than certain amount of bytes left to be read from the reply, the client just does so (reads those remaining bytes) but throws the buffers instantly and then returns the connection back to pool (#2762) |
content_length_source_impl(input_stream<char>& inp, size_t length, size_t* consumed_bytes) | ||
: _inp(inp), _remaining_bytes(length), _consumed_bytes(consumed_bytes) { | ||
content_length_source_impl(input_stream<char>& inp, size_t length, size_t& remaining) | ||
: _inp(inp), _remaining_bytes(remaining) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont follow here, why you are first initialize the reference in member initialization list and the update the value of the remaining
argument effectively overwriting whatever was stored there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the patch we have _remaining_bytes(length)
, now we need to effectively do the same
|
||
content_length_source_impl(input_stream<char>& inp, size_t length) | ||
: content_length_source_impl(inp, length, nullptr) { | ||
: _inp(inp), _builtin_remaining_bytes(length), _remaining_bytes(_builtin_remaining_bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just call the above constructor overload? something like content_length_source_impl(inp, length, _builtin_remaining_bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but debug clang build detected this as an error
return handle(rep, std::move(in)).then([this, reply = std::move(reply), &con] { | ||
if (reply->content_length > reply->consumed_content) { | ||
auto bytes_left = reply->content_length - reply->consumed_content; | ||
if (reply->left_content_length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implicit conversion from numeric to 'bool'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// references to fields in the request structure | ||
std::unordered_map<sstring, sstring>& _chunk_extensions; | ||
std::unordered_map<sstring, sstring>& _trailing_headers; | ||
size_t& _remaining_bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the client either skips those bytes from the socket
should be the client either drains those bytes from the socket
} | ||
} | ||
out.close().get(); | ||
out.close().handle_exception([](auto ex){}).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"it may happen that client had already closed its end and there's nothing to flush into" if it happens we will get some concrete error type/message, right? why not to check for it to prevent cases when something else is going wrong and we just ignore it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe :( I forgot what exactly went wrong here, maybe this patch can be dropped. I'll revisit
c.close().get(); | ||
} | ||
|
||
future<> test_client_close_connection(bool chunked) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "response writer write" should be "response writer writes" I guess
If when handling an http reply the handler doesn't consume all the body, the subsequent request shouldn't receive leftover data from the previous response and fail. For content-length replies this case is handled by client itself and there's a test that validates it. For chunked-encoded bodies, the problem is fixed here.
refs #2767
fixes #2768