Skip to content

discussion: what does mitmproxy need? #47

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
njsmith opened this issue Jun 21, 2017 · 4 comments
Closed

discussion: what does mitmproxy need? #47

njsmith opened this issue Jun 21, 2017 · 4 comments

Comments

@njsmith
Copy link
Member

njsmith commented Jun 21, 2017

I've been talking with @mhils a bit about this on IRC, and there's some discussion in https://github.com/njsmith/h11/issues/31#issuecomment-309774081, but I kind of want to see mitmproxy's requirements all written down in one place. Let's use this issue for that.

@mhils says:

The main pain points for us are usually along these lines:

  1. Capturing/send data exactly as-is (we want to mirror the client accurately, e.g. header capitalization)
  2. Tolerance for slightly misbehaving clients (of course, we reject absurdly wrong data)

I think headers are the only blocking issue for us right now. h11's request/response validation is considerably stricter than what we do right now, but I'm slightly optimistic that we could at least patch this effortlessly.

Here are some places off the top of my head where h11 currently will not do byte-for-byte pass-through:

  • Header capitalization, as noted
  • h11 understands HTTP/1.0, but it never emits it
  • But h11 doesn't understand the non-standard keep-alive extensions sometimes used with HTTP/1.0. Which is fine when it's implementing one side of a connection because it can make sure that it never gets used. But if you're trying to eavesdrop on connections between an arbitrary client/server, it's possible that they'll decide to use it and h11 will get confused.
  • Also, h11 will unconditionally insert or rewrite headers in some cases (in particular Transfer-Encoding and Connection)
  • h11 will eventually tolerate non-compliant line-endings (Support for servers with broken line endings #7), but won't emit them
  • Leading and trailing whitespace in header values is always discarded
  • We understand headers split over multiple continuation lines, but don't emit them
  • We tolerate missing reason phrases (Tolerate missing reason phrases #32), but don't emit them
  • Chunked encoding boundaries: Currently we do expose where the boundaries fall because @Lukasa needed this for urllib3 (Proposal: make it possible to observe chunk delimiters. #19), but I think urllib3 has come to their senses and decided they don't need this after all, so I was kinda hoping we could get rid of it again :-). Also, the way we expose them is sub-optimal if your goal is to regenerate them, because we don't expose the length; you just have to buffer the whole chunk and then re-emit it all at once, which is unnecessary unbounded memory overhead.
  • Speaking of chunked encoding, we also discard chunk extension metadata and provide no way to emit it.

There are probably a few other issues like this that I'm not thinking of right now.

Also, as noted, we're pretty strict about forbidding a lot of things that a pentester might reasonably want to emit. (In fact, "would a pentester find this useful" is one of the criteria that we use to decide what to forbid, because generally you don't want your software to be accidentally turned into a pentester :-).) Some kinds of validation could be disabled without too much work (see e.g. #33), but other kinds are more difficult. For example, h11 absolutely won't tolerate incoming data with whitespace around header names (Host : example.com) or illegal characters in header values – both of which are classic sources of security bugs ­– and this is enforced implicitly by the regexes used to parse the header name and value in the first place, rather than being some extra code that could be switched on or off.

I guess if you want to monkeypatch that's your business, but if you go that route then I feel like I should give you the standard warning just as a matter of practicality I don't think we can guarantee that your monkeypatches will keep working in future releases, or that the hooks you're trying to monkeypatch will even still exist. API guarantees only apply to APIs :-).

So... if you need h11 to be able to handle all of the pass-through cases listed above, then realistically I don't see how to implement that without the codebase turning into something monstrous and full of bugs. From discussions so far I'm guessing you don't actually care about all of these, but currently I don't understand the criteria that determine which ones you care about and which ones you don't, so I can't guess which ones are important. (In particular, it seems like HTTP/1.0 stuff might matter in practice?) Can you elaborate?

@mhils
Copy link
Member

mhils commented Jun 21, 2017

Thanks @njsmith!

To add to the list:

In general, failing for a malformed client request is okay for us. The scenario we definitely want to avoid is having a connection where the behaviour is different if mitmproxy is in between or not. From the list above, our definite must-have requirements are probably header capitalization and order, and emitting HTTP/1.0 messages. It is worth noting that we also have the separate pathod/pathoc project for more creative forms of pentesting/fuzzing.

I see three viable ways forward:

  1. Only re-use the most basic bits (i.e. ReceiveBuffer). See also njsmith/sansio_toolbelt.
  2. Depend on a pinned version of h11 where we monkeypatch some of the normalizations done by h11 (ugh).
  3. Fix the biggest issues for us in h11 and use it as-is.

I'd generally prefer approach 3 - that'd be fantastic real-word validation for h11 and we'd have less code to maintain. However, I'd also understand if @cortesi has a considerably stronger preference for mirroring connections as-is. FWIW, the list above includes quite a few edge cases we don't support currently either (non-standard keepalive was never a problem so far, chunk extension metadata isn't implemented, we also strip whitespace around headers).

/cc @cortesi @Kriechi: thoughts?

@Kriechi
Copy link
Member

Kriechi commented Jun 21, 2017

Sounds all good!
The biggest issue I see is that there are soooo many weirdly broken HTTP/1.x clients and servers out there. Trying to accommodate for all edge cases and quirks will be time consuming, I suppose?

Or what everybody's general feeling regarding strict RFC-compliance vs. being liberal in what is accepted? Maybe I'm overthinking this, and in reality it is no big issue...

@cortesi
Copy link

cortesi commented Jun 26, 2017

Chaps, I'm open to being convinced, but at the moment i'm -1. Mitmproxy has two different constituencies. The first is app/web testers, who are completely fine with data not being relayed strictly accurately between the client and server, as long as gross behaviour doesn't change. The second are people who use mitmproxy as a forensic tool, and in other work where strict fidelity is crucial. When I was consulting, that was about half of the stuff I did. In this context, it's super important that what enters mitmproxy on one end is to the greatest extend possible identical to what exits it on the other.

Because of this second use case, mitmproxy places demands on its protocol stack that aren't reasonable, and arguably makes the stack worse for conventional use. Things that are user conveniences or useful protections against errors in a user-focused protocol are fatal constraints for us. Even if we fixed some of the issues listed by @njsmith now, we can't expect the type of protocol fidelity we need to be a project goal for h11 in the longer term.

@njsmith
Copy link
Member Author

njsmith commented Oct 30, 2018

This seems to have petered out, so closing... we can re-open if the discussion starts back up.

@njsmith njsmith closed this as completed Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants