Fails to decode a header that requests can handle #57
Comments
Yep, RFC 7230 says unambiguously that you can't have characters below Now the question is... what to do about it. If they're violating the spec like that and getting away with it, then probably others are as well, so we might need to relax h11 to match. I'm not keen on allowing just any character inside headers though; that seems like it will inevitably create security issues. I guess we have to go figure out what characters browsers and curl allow... |
Things I've learned so far:
Ugh. I definitely do not want to allow NUL through by default, even if that's what other clients do. Also, on input I guess Some options:
@Lukasa Would appreciate your thoughts here if you have time... e.g. does h2 validate header values? @kennethreitz @haikuginger @sigmavirus24 You might also have thoughts here, since this might end up affecting urllib3/requests.
BTW, note that I just moved this repo into the |
For future reference, the program I used to check how curl and Firefox handle NUL in header values:

import trio

async def silly_server(stream):
    # Read until the blank line that ends the request headers.
    data = bytearray()
    while True:
        data += await stream.receive_some(4096)
        if b"\r\n\r\n" in data:
            break
    # Reply with a response whose "Test" header value is a single NUL byte.
    await stream.send_all(
        b"HTTP/1.1 200 OK\r\nTest: \x00\r\nContent-Length: 2\r\n\r\nXX"
    )

async def main():
    await trio.serve_tcp(silly_server, 8888)

trio.run(main)
My main opinion is that requests should never have started using environment variables to control things in the stack, especially around security sensitive configuration. An alternative would be to return |
h2 does indeed validate header fields, but its validation is quite cautious. In its case it does allow all the characters you’re mentioning, which could well be considered a bug.
@sigmavirus24 I guess a similar option would be to have some sort of configuration you pass when setting up the That said... I guess we always want strict validation on outgoing headers, and that probably we always want strict validation as servers parsing incoming requests (because on the server side, you can give a proper error message, plus not-always-but-usually the client will be stuck working around whatever you do rather than vice-versa). And if this is breaking in real life and no extant clients actually enforce it, then I guess urllib3 and similar will probably want to disable it always anyway. So one heuristic would be: when parsing the headers from an incoming response, and only in this situation, then outlaw |
Not if the mitmproxy folks start using this library ;)
I understand that they're an easy way for end users to do a thing that maybe fixes it. But that always ends up being a hammer to them that they then use to bash every problem with, unnecessarily. I regret every envvar that Requests supports, even the one that controls certificate files, and that one is arguably the most/only useful one. No amount of documentation will ever fix a user's preconceived notions about what "wins" and precedence gets tricky with enough different sources of configuration. |
@sigmavirus24 should we remove env vars for 3.0?
@kennethreitz I'm not sure the fury would be worth making our lives slightly easier. I think we've already merged changes to make the decision/precedence process considerably more consistent. It's just that I think there shouldn't ever be another one added :)
On further thought, I realized my suggestion above wouldn't actually handle the case that started this, because there the offending byte is inside a cookie, which means that the client has to be able to send it back to the server :-/. So I guess our options are:
So I guess I'm now leaning towards the first option.
I am noticing this issue on a surprising number of sites, so I googled. It would seem that |
That's, umm. Wow.
The RFC says we should reject any header value that contains control characters. But apparently in the real world, you have to both accept and produce these sometimes (e.g. Google Analytics cookies use them). As a compromise, we now accept most control characters, but continue to disallow NUL (\x00) and all whitespace (\t\n\r\f\v and space), except that space and tab are allowed inside header values when surrounded by non-whitespace characters. Closes: python-hypergh-57, python-hypergh-58
Ok, #68 implements the hack I suggested above, except that it makes the list of forbidden characters be NUL + all whitespace (including the weird ones like vertical-tab). Hopefully that's loose enough to work in the real world while still being strict enough to provide some defense-in-depth. Review would be appreciated if anyone has time.
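For readers who want to see the relaxed rule concretely, here is a minimal sketch of it as a bytes regex. This is not h11's actual code: the names `RELAXED_FIELD_VALUE` and `is_acceptable_value` are hypothetical, and treating an empty value as acceptable is an assumption made for the illustration.

```python
import re

# Hypothetical sketch of the relaxed rule, not h11's real implementation.
# In a bytes pattern, \s matches the ASCII whitespace set " \t\n\r\f\v",
# so [^\x00\s] means "any byte except NUL and whitespace". That still
# admits other control characters such as \x01.
_NON_WS = rb"[^\x00\s]"

# Space and tab may appear only between runs of non-whitespace bytes,
# so a value can neither start nor end with them.
# Assumption: an empty value is treated as acceptable here.
RELAXED_FIELD_VALUE = re.compile(
    rb"(?:" + _NON_WS + rb"+(?:[ \t]+" + _NON_WS + rb"+)*)?"
)


def is_acceptable_value(value: bytes) -> bool:
    """Return True if `value` satisfies the relaxed rule sketched above."""
    return RELAXED_FIELD_VALUE.fullmatch(value) is not None
```

With this sketch, a cookie value containing `\x01` passes, while values containing NUL, a vertical tab, or leading/trailing spaces do not.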
Note for future reference: the WHATWG fetch spec defines an HTTP header value to be:
So they're slightly looser than h11's new rules: h11 and fetch both disallow NUL, |
Note that curl allows all sorts of things specifically because it is a tool used for pentesting and verification. It would be nice if it had some sort of validation mode that highlighted spec errors. |
I am trying to read this url
https://www.bitstamp.net/api/v2/trading-pairs-info/
It fails (see theelous3/asks#60) with an exception:
Looking closer, this is the header that seems to cause trouble:
bytearray(b'Set-Cookie: ___utmvafIumyLc=kUd\x01UpAt; path=/; Max-Age=900')
My guess is it's the \x.
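To pin down which byte a strict parser would object to, the following sketch checks a header value against the RFC 7230 field-value character set (visible ASCII, obs-text bytes 0x80-0xFF, space, and horizontal tab). The helper name `find_disallowed_bytes` is hypothetical and is not part of h11 or asks; it only illustrates the strict reading of the spec.

```python
# Hypothetical helper, not part of h11 or asks.
# RFC 7230 field values may contain VCHAR (0x21-0x7E), obs-text (0x80-0xFF),
# SP (0x20) and HTAB (0x09). Everything else is a disallowed control byte.
_ALLOWED = frozenset(range(0x21, 0x7F)) | frozenset(range(0x80, 0x100)) | {0x20, 0x09}


def find_disallowed_bytes(value: bytes) -> list[tuple[int, int]]:
    """Return (index, byte) pairs for bytes outside the RFC 7230 set."""
    return [(i, b) for i, b in enumerate(value) if b not in _ALLOWED]


# The cookie value from the Set-Cookie header quoted above.
cookie_value = b"___utmvafIumyLc=kUd\x01UpAt; path=/; Max-Age=900"
offenders = find_disallowed_bytes(cookie_value)
for index, byte in offenders:
    print(f"byte 0x{byte:02x} at offset {index} is not allowed by RFC 7230")
```

Run against the header quoted above, this reports the single `\x01` byte inside the cookie value as the only offender.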