Commit bab6a55

Summary of changes in this PR:
* Added a new global `set_connection_limit!` function for controlling the global connection limit that will be applied to all requests This is one way to resolve #1033. I added a deprecation warning when passing `connect_limit` to individual requests. So usage would be: calling `HTTP.set_connection_limit!` and any time this is called, it changes the global value. * Add a try-finally in keepalive! around our global IO lock usage just for good house-keeping * Refactored `try_with_timeout` to use a `Channel` instead of the non-threaded `@async`; it's much simpler, seems cleaner, and allows us to avoid the usage of `@async` when not needed. Note that I included a change in StreamRequest.jl however that wraps all the actual write/read IO operations in a `fetch(@async dostuff())` because this will currently prevent code in this task from migrating across threads, which is important for OpenSSL usage where error handling is done per-thread. I don't love the solution, but it seems ok for now. * I refactored a few of the stream IO functions so that we always know the number of bytes downloaded, whether in memory or written to an IO, so we can log them and use them in verbose logging to give bit-rate calculations * Ok, the big one: I rewrote the internal implementation of ConnectionPool.ConnectionPools.Pod `acquire`/`release` functions; under really heavy workloads, there was a ton of contention on the Pod lock. I also observed at least one "hang" where GDB backtraces seemed to indicate that somehow a task failed/died/hung while trying to make a new connection _while holding the Pod lock_, which then meant that no other requests could ever make progress. The new implementation includes a lock-free "fastpath" where an existing connection that can be re-used doesn't require any lock-taking. It uses a lock-free concurrent Stack implementation copied from JuliaConcurrent/ConcurrentCollections.jl ( doesn't seem actively maintained and it's not much code, so just copied). 
The rest of the `acquire`/`release` code is now modeled after Base.Event in how releasing always acquires the lock and slow-path acquires also take the lock to ensure fairness and no deadlocks. I've included some benchmark results on a variety of heavy workloads [here](https://everlasting-mahogany-a5f.notion.site/Issue-heavy-load-perf-degradation-1cd275c75037481a9cd6378b8303cfb3) that show some great improvements, a bulk of which are attributable to reducing contention when acquiring/releasing connections during requests. The other key change included in this rewrite is that we ensure we _do not_ hold any locks while _making new connections_ to avoid the possibility of the lock ever getting "stuck", and because it's not necessary: the pod is in charge of just keeping track of numbers and doesn't need to worry about whether the connection was actually made yet or not (if it fails, it will be immediately released back and retried). Overall, the code is also _much_ simpler, which I think is a huge win, because the old code was always pretty scary to have to dig into. * Added a new `logerrors::Bool=false` keyword arg that allows doing `@error` logs on errors that may otherwise be "swallowed" when doing retries; it can be helpful to sometimes be able to at least see what kinds of errors are happening * Added lots of metrics around various time spent in various layers, read vs. write durations, etc.
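The lock-free fastpath rests on a Treiber-style concurrent stack. As a rough illustration of how such a stack works (names and details are mine, not the actual code vendored from ConcurrentCollections.jl; the ABA problem is ignored for brevity):

```julia
# Minimal sketch of a Treiber (lock-free) stack using Julia's atomic fields.
mutable struct Node{T}
    value::T
    next::Union{Node{T},Nothing}
end

mutable struct TreiberStack{T}
    @atomic head::Union{Node{T},Nothing}
end
TreiberStack{T}() where {T} = TreiberStack{T}(nothing)

function Base.push!(s::TreiberStack{T}, v) where {T}
    node = Node{T}(v, nothing)
    while true
        old = @atomic s.head
        node.next = old
        # CAS: only install the new head if nobody beat us to it.
        _, ok = @atomicreplace s.head old => node
        ok && return s
    end
end

function Base.pop!(s::TreiberStack)
    while true
        old = @atomic s.head
        old === nothing && return nothing   # empty stack
        _, ok = @atomicreplace s.head old => old.next
        ok && return old.value
    end
end
```

No lock is ever taken: contending tasks simply retry the compare-and-swap, which is what makes the connection-reuse fastpath cheap under load.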
1 parent: 75c1b25

32 files changed: +429, -644 lines

Project.toml

Lines changed: 1 addition & 0 deletions

```diff
@@ -6,6 +6,7 @@ version = "1.7.4"
 [deps]
 Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
 CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
+ConcurrentUtilities = "f0e56b4a-5159-44fe-b623-3e5288b988bb"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
 LoggingExtras = "e6f89c97-d47a-5376-807f-9c37f3926c36"
```

docs/src/client.md

Lines changed: 21 additions & 3 deletions

```diff
@@ -91,18 +91,26 @@ If you're running into a real head-scratcher, don't hesitate to [open an issue](
 
 When a connection is attempted to a remote host, sometimes the connection is unable to be established for whatever reason. Passing a non-zero `connect_timeout` value will cause `HTTP.request` to wait that many seconds before giving up and throwing an error.
 
-#### `connection_limit`
+#### `pool`
 
-Many remote web services/APIs have rate limits or throttling in place to avoid bad actors from abusing their service. They may prevent too many requests over a time period or they may prevent too many connections being simultaneously open from the same client. By default, when `HTTP.request` opens a remote connection, it remembers the exact host:port combination and will keep the connection open to be reused by subsequent requests to the same host:port. The `connection_limit` keyword argument controls how many concurrent connections are allowed to a single remote host.
+Many remote web services/APIs have rate limits or throttling in place to avoid bad actors from abusing their service. They may prevent too many requests over a time period or they may prevent too many connections being simultaneously open from the same client. By default, when `HTTP.request` opens a remote connection, it remembers the exact host:port combination and will keep the connection open to be reused by subsequent requests to the same host:port. The `pool` keyword argument specifies a specific `HTTP.Pool` object to be used for controlling the maximum number of concurrent connections allowed across the pool. It's constructed via `HTTP.Pool(; max::Int)`. Requests attempted when the maximum is already reached will block until previous requests finish. The `idle_timeout` keyword argument can be passed to `HTTP.request` to control how long it may have been since a connection was last used for it to still be considered 'valid'; otherwise, "stale" connections will be discarded.
 
 #### `readtimeout`
 
-After a connection is established and a request is sent, a response is expected. If a non-zero value is passed to the `readtimeout` keyword argument, `HTTP.request` will wait to receive a response that many seconds before throwing an error. Note that for chunked or streaming responses, each chunk/packet of bytes received causes the timeout to reset. Passing `readtimeout = 0` disables any timeout checking and is the default.
+After a connection is established and a request is sent, a response is expected. If a non-zero value is passed to the `readtimeout` keyword argument, `HTTP.request` will wait to receive a response that many seconds before throwing an error. Passing `readtimeout = 0` disables any timeout checking and is the default.
 
 ### `status_exception`
 
 When a non-2XX HTTP status code is received in a response, this is meant to convey some error condition. 3XX responses typically deal with "redirects" where the request should actually try a different url (these are followed automatically by default in `HTTP.request`, though up to a limit; see [`redirect`](@ref)). 4XX status codes typically mean the remote server thinks something is wrong in how the request is made. 5XX typically mean something went wrong on the server-side when responding. By default, as mentioned previously, `HTTP.request` will attempt to follow redirect responses, and retry "retryable" requests (where the status code and original request method allow). If, after redirects/retries, a response still has a non-2XX response code, the default behavior is to throw an `HTTP.StatusError` exception to signal that the request didn't succeed. This behavior can be disabled by passing `status_exception=false`, where the `HTTP.Response` object will be returned with the non-2XX status code intact.
 
+### `logerrors`
+
+If `true`, `HTTP.StatusError`, `HTTP.TimeoutError`, `HTTP.IOError`, and `HTTP.ConnectError` will be logged via `@error` as they happen, regardless of whether the request is then retried or not. Useful for debugging or monitoring requests where certain errors are a concern but would otherwise be hidden by retries.
+
+### `observelayers`
+
+If `true`, enables `HTTP.observe_layer` to wrap each client-side "layer" to track the amount of time spent in each layer as a request is processed. This can be useful for debugging performance issues. Note that when retries or redirects happen, the time spent in each layer is cumulative, as noted by the `[layer]_count`. The metrics are stored in the `Request.context` dictionary, and can be accessed like `HTTP.get(...).request.context`.
+
 ### `basicauth`
 
 By default, if "user info" is detected in the request url, like `http://user:password@host`, the `Authorization: Basic` header will be added to the request headers before the request is sent. While not very common, some APIs use this form of authentication to verify requests. This automatic adding of the header can be disabled by passing `basicauth=false`.
```
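For reference, the `pool` and `idle_timeout` keywords documented above might be used like this (a hedged sketch: the URL is a placeholder, and the `HTTP.Pool(; max)` constructor form is taken from the docs text above; no test is attached since this performs live network requests):

```julia
using HTTP

# One shared pool capped at 10 concurrent connections.
pool = HTTP.Pool(; max=10)

# Requests beyond the cap block until earlier ones release their connection.
resp = HTTP.get("https://example.com"; pool=pool)

# Connections idle longer than 30 seconds are treated as stale and discarded.
resp = HTTP.get("https://example.com"; pool=pool, idle_timeout=30)
```

Sharing one `Pool` object across call sites is what makes the cap global to those requests; separate pools are limited independently.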
```diff
@@ -150,6 +158,16 @@ Controls the total number of retries that will be attempted. Can also disable al
 
 By default, this keyword argument is `false`, which controls whether non-idempotent requests will be retried (POST or PATCH requests).
 
+#### `retry_delays`
+
+Allows providing a custom `ExponentialBackOff` object to control the delay between retries.
+Default is `ExponentialBackOff(n = retries)`.
+
+#### `retry_check`
+
+Allows providing a custom function to control whether a retry should be attempted.
+The function should accept 5 arguments: the delay state, exception, request, response (an `HTTP.Response` object *if* a request was successfully made, otherwise `nothing`), and `resp_body` response body (which may be `nothing` if there is no response yet, otherwise a `Vector{UInt8}`), and return `true` if a retry should be attempted. So in traditional nomenclature, the function would have the form `f(s, ex, req, resp, resp_body) -> Bool`.
+
 ### Redirect Arguments
 
 #### `redirect`
```
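To make the `retry_check` contract concrete, here is a minimal predicate matching the documented `f(s, ex, req, resp, resp_body) -> Bool` shape. Only the 5-argument signature comes from the docs; the policy itself (retry when there is no response yet, or on 5xx) is purely illustrative:

```julia
# Illustrative retry_check predicate; the function name and policy are made up
# for this example, only the argument shape is from the documentation.
function retry_5xx(s, ex, req, resp, resp_body)
    # No response yet (connect/timeout/IO error): worth retrying.
    resp === nothing && return true
    # Retry server-side failures; give up on anything else (e.g. 4xx).
    return resp.status >= 500
end

# Would be passed as: HTTP.get(url; retry_check=retry_5xx)
```

Because the response body is handed in as well, a predicate can also inspect error payloads (e.g. a JSON error code) before deciding to retry.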

docs/src/reference.md

Lines changed: 2 additions & 6 deletions

````diff
@@ -138,7 +138,7 @@ HTTP.Messages.isidempotent
 HTTP.Messages.retryable
 HTTP.Messages.defaultheader!
 HTTP.Messages.readheaders
-HTTP.DefaultHeadersRequest.setuseragent!
+HTTP.HeadersRequest.setuseragent!
 HTTP.Messages.readchunksize
 HTTP.Messages.headerscomplete(::HTTP.Messages.Response)
 HTTP.Messages.writestartline
@@ -167,17 +167,13 @@ HTTP.poplayer!
 HTTP.popfirstlayer!
 HTTP.MessageRequest.messagelayer
 HTTP.RedirectRequest.redirectlayer
-HTTP.DefaultHeadersRequest.defaultheaderslayer
-HTTP.BasicAuthRequest.basicauthlayer
+HTTP.HeadersRequest.headerslayer
 HTTP.CookieRequest.cookielayer
-HTTP.CanonicalizeRequest.canonicalizelayer
 HTTP.TimeoutRequest.timeoutlayer
 HTTP.ExceptionRequest.exceptionlayer
 HTTP.RetryRequest.retrylayer
 HTTP.ConnectionRequest.connectionlayer
-HTTP.DebugRequest.debuglayer
 HTTP.StreamRequest.streamlayer
-HTTP.ContentTypeDetection.contenttypedetectionlayer
 ```
 
 ### Raw Request Connection
````
