
Added ability to use LF, not only CRLF delimiter for response Headers and Body #115


Merged

Conversation

cdeler
Contributor

@cdeler cdeler commented Nov 6, 2020

Hello,

I want to submit a PR that closes #7

Why the changes are required

According to this comment in the issue, there are problems with some old servers that do not fully conform to the HTTP/1.1 RFC. The original httpx issue (encode/httpx#1378) describes a situation where embedded-systems developers have to deal with a non-standard server

What has been done?

I tried to reimplement the function that extracts response headers, using a regex

What hasn't been done yet

Performance testing, fuzzing

(updated) How does maybe_extract_lines work now?

  1. First, it extracts the part of the self._data buffer up to "\n\r?\n", which gives data
  2. Then it determines the line delimiter in data using the "\r?\n" regex, which gives delimiter
  3. Then it splits the lines using bytearray's .split(delimiter) (which is much faster than regex.split(data))

With fix

6099.7 requests/sec
6172.2 requests/sec
6094.7 requests/sec
6202.1 requests/sec
6303.3 requests/sec
6358.0 requests/sec
6372.2 requests/sec

Without fix (522b004)

6065.5 requests/sec
6211.9 requests/sec
6211.1 requests/sec
6131.1 requests/sec
6333.1 requests/sec
6103.8 requests/sec
6260.0 requests/sec

@cdeler cdeler marked this pull request as draft November 6, 2020 19:25
@cdeler
Contributor Author

cdeler commented Nov 7, 2020

Benchmark results

with fix

(h11) h11/bench % PYTHONPATH=.. python benchmarks/benchmarks.py
5877.3 requests/sec
5874.9 requests/sec
5529.5 requests/sec
5765.3 requests/sec
5903.8 requests/sec
5624.4 requests/sec
5861.4 requests/sec

without fix (latest master 522b004)

(h11) h11/bench % PYTHONPATH=.. python benchmarks/benchmarks.py
6161.7 requests/sec
6201.7 requests/sec
6268.1 requests/sec
6069.3 requests/sec
6105.4 requests/sec
6256.2 requests/sec
6294.8 requests/sec

@bdraco

bdraco commented Nov 7, 2020

If performance is an issue, maybe look for "\r" or "\n" first.

If you see "\r" first, use "\r\n" as the delimiter; otherwise use "\n".
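The suggested heuristic could be sketched like this (an illustrative sketch of the idea, not code from this PR; the function name is made up):

```python
def detect_delimiter(data: bytes) -> bytes:
    # If the first line terminator we encounter contains "\r",
    # assume CRLF line endings; otherwise assume bare LF.
    cr = data.find(b"\r")
    lf = data.find(b"\n")
    if cr != -1 and cr < lf:
        return b"\r\n"
    return b"\n"
```

This avoids any regex work on the hot path at the cost of assuming the peer uses one delimiter consistently.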

@cdeler
Contributor Author

cdeler commented Nov 8, 2020

@bdraco , yes, b"test-test-test".split(b"-") is much faster than regex.split(...)
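This is easy to check yourself with a micro-benchmark (a sketch; absolute numbers will vary by machine, so no figures are claimed here):

```python
import re
import timeit

# Synthetic header-like data to split on CRLF.
data = b"Host: example.com\r\nConnection: close\r\n" * 50
pattern = re.compile(b"\r\n")

# Time the plain bytes.split against the compiled regex split.
t_split = timeit.timeit(lambda: data.split(b"\r\n"), number=2000)
t_regex = timeit.timeit(lambda: pattern.split(data), number=2000)

# Both produce identical results; bytes.split is typically the faster one.
assert data.split(b"\r\n") == pattern.split(data)
print(f"bytes.split: {t_split:.4f}s  regex split: {t_regex:.4f}s")
```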

@cdeler
Contributor Author

cdeler commented Nov 8, 2020

After some performance investigation I changed the algorithm. For now the changeset looks like this:

How does maybe_extract_lines work now?

  1. First, it extracts the part of the self._data buffer up to "\n\r?\n", which gives data
  2. Then it determines the line delimiter in data using the "\r?\n" regex, which gives delimiter
  3. Then it splits the lines using bytearray's .split(delimiter) (which is much faster than regex.split(data))

With fix

6099.7 requests/sec
6172.2 requests/sec
6094.7 requests/sec
6202.1 requests/sec
6303.3 requests/sec
6358.0 requests/sec
6372.2 requests/sec

Without fix (522b004)

6065.5 requests/sec
6211.9 requests/sec
6211.1 requests/sec
6131.1 requests/sec
6333.1 requests/sec
6103.8 requests/sec
6260.0 requests/sec

Fuzzing results

Running
PYTHONPATH=.. py-afl-fuzz -o results -i afl-server-examples/ -- python ./afl-server.py
for one cycle, I didn't get any crashes or hangs


@cdeler
Contributor Author

cdeler commented Nov 19, 2020

Hello @pgjones
I made the changes you mentioned.

Then I re-ran the benchmark against the changes:

With changes

6290.6 requests/sec
6264.5 requests/sec
6302.6 requests/sec
6328.4 requests/sec
6344.2 requests/sec
6340.0 requests/sec
6347.9 requests/sec

Without changes

6479.7 requests/sec
6480.7 requests/sec
6502.3 requests/sec
6317.8 requests/sec
6458.3 requests/sec
6238.8 requests/sec
6278.7 requests/sec

@cdeler cdeler requested a review from pgjones November 19, 2020 18:59
Member

@pgjones pgjones left a comment

Noticed another potential issue that I sadly missed last time.

if self._data[self._start : self._start + 2] == b"\r\n":
    self._start += 2
start_chunk = self._data[self._start : self._start + 2]
if start_chunk in [b"\r\n", b"\n"]:
Member

I think start_chunk can only equal b"\n" if self._data == b"\n" (given the + 2 on line 129). Maybe this needs to be,

if self._data[self._start : self._start + 2] == b"\r\n":
    self._start += 2
    return []
elif self._data[self._start : self._start + 1] == b"\n":
    self._start += 1
    return []

?

Contributor Author

@cdeler cdeler Nov 20, 2020

You are right, I missed that; for example, self._data might start with b"\nla-la-la-la".

Incorporated your changes:

- changed blank_line_delimiter_regex
- changed maybe_extract_lines start processing
- added pytest param names to test_receivebuffer_for_invalid_delimiter
@cdeler cdeler force-pushed the fix-problems-with-wrong-line-delimiters branch from aa87022 to 489de4d Compare November 20, 2020 11:12
@cdeler cdeler requested a review from pgjones November 20, 2020 11:18
@wonderiuy

You can use the address and credentials I've given to you; the firmware on that device is 5.20.5

@Kane610

Kane610 commented Dec 1, 2020

@wonderiuy I meant getting his changes to my local filesystem :)

@Kane610

Kane610 commented Dec 1, 2020

It was really simple with GitHub desktop to clone this branch and try it out. Works well on both 5.51 and 5.75 as well as on newer firmwares. Good job @cdeler !

@donnib

donnib commented Dec 2, 2020

I can also confirm this is working locally ;) Looking forward to the PR being approved.

@hoorna

hoorna commented Dec 2, 2020

Last week Axis published a new firmware version (5.51.7.2) for my camera (M1034-W). Below you will find the release notes.

I don't know if it has something to do with the issue (LF vs CRLF) in this thread, but the first correction (C01) in the new firmware version mentions "Corrected a newline character".

I thought it would be wise to make a note about it in this thread.


Corrections in 5.51.7.2 since 5.51.7.1

=======================================

5.51.7.2:C01
Corrected a newline character in pwdgrp.cgi, introduced in 5.51.6, that could cause
problems when parsing the response.

5.51.7.2:C02
Corrected an issue that prevented Action Rule Events from sending images via email.

5.51.7.2:C03
Corrected an issue that caused monolith to timeout and respawn during too many
connect/disconnect RTSP streaming requests.

5.51.7.2:C04
Added support to enable/disable X-Frame-Options headers in the plainconfig. By default,
X-Frame-Options is enabled and its value is set to "sameorigin".

@Kane610

Kane610 commented Dec 2, 2020

> Last week Axis published a new firmware version (5.51.7.2) for my camera (M1034-W). Below you will find the release notes.
>
> I don't know if it has something to do with the issue (LF vs CRLF) in this thread, but the first correction (C01) in the new firmware version mentions "Corrected a newline character".

Thanks! I don't know, but regardless there are other firmwares that don't get this update so it is still needed.

@wonderiuy

I've checked with the new firmware 5.51.7.2, but the problem is still there, so the fix from @cdeler and @Kane610 is more than welcome :)

@astrandb

astrandb commented Dec 6, 2020

Can also confirm that this PR solves the problems with older versions of Axis cameras in Home Assistant.

@Kane610

Kane610 commented Dec 8, 2020

Hey guys! Any progress on getting this merged?


# Only search in buffer space that we've not already looked at.
partial_buffer = self._data[self._multiple_lines_search :]
match = blank_line_regex.search(partial_buffer)
Member

Instead of copying the buffer, use the pos argument to search:

match = blank_line_regex.search(self._data, self._multiple_lines_search)
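For reference, compiled patterns accept a start position, so the search can begin at an offset without materializing a slice (a small standalone illustration; the variable names mirror the snippet above):

```python
import re

blank_line_regex = re.compile(b"\n\r?\n")
data = bytearray(b"Header: value\n\nbody")

# Equivalent matches, but the second form skips the bytearray copy.
m1 = blank_line_regex.search(data[5:])
m2 = blank_line_regex.search(data, 5)

# Note: with a start position, match offsets are absolute in the
# original buffer, not relative to the search start.
assert m2.start() == m1.start() + 5
```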

partial_buffer = self._data[search_start_index:]
partial_idx = partial_buffer.find(b"\r\n")
if partial_idx == -1:
    self._next_line_search = len(self._data)
Member

In maybe_extract_next_line, we store the raw buffer length in self._next_line_search and then do the subtraction when we use it. In maybe_extract_lines, we store the "pre-subtracted" value, so we can use it directly. This inconsistency is kind of confusing :-). We should switch one of them so they match.

Contributor Author

As soon as I rewrote both methods using the _extract(...) method, this issue was resolved (both methods now work with offsets)


self._data[:count] = b""
self._next_line_search = 0
self._multiple_lines_search = 0
Member

There's a lot of copy/pastes of this code for extracting an initial slice and then doing internal bookkeeping, which is error prone. Let's factor it out into a method, like:

def _extract(self, count):
    out = self._data[:count]
    del self._data[:count]
    self._next_line_search = 0
    self._multiple_lines_search = 0
    return out

And then in all the other methods, just do:

    return self._extract(whatever_length_value_we_ended_up_with)

Contributor Author

I introduced _extract and replaced all the copy/pastes with this function.
It also helped resolve the comment above (#115 (comment))

b"Content-type: text/plain",
b"Connection: close",
]
assert bytes(b) == b"Some body"
Member

Can you also add a few similar tests to test_readers_unusual in test_io.py? These tests are good for the basic ReceiveBuffer functionality, but the test harness there sets up a more "end-to-end" setup that runs our full http parsing pipeline, so it would give us more confidence that everything is wired up correctly.

Contributor Author

I added similar tests to test_readers_unusual

# Truncate the buffer and return it.
idx = self._multiple_lines_search + match.span(0)[-1]
out = self._data[:idx]
lines = [line.rstrip(b"\r") for line in out.split(b"\n")]
Member

Instead of calling rstrip here (which always has to copy the whole buffer, since these are bytearrays), I think we could leave the trailing \r in, and then in _abnf.py change header_field, request_line, and status_line to match a trailing optional \r, e.g.:

header_field = (
    r"(?P<field_name>{field_name})"
    r":"
    r"{OWS}"
    r"(?P<field_value>{field_value})"
    r"{OWS}\r?".format(**globals())         # <-- notice added \r? here
)

Contributor Author

@cdeler cdeler Dec 11, 2020

I tried to do that.

I added \r? to these regexes (header_field, request_line, and status_line)

Then I set

lines = out.split(b"\n")

but it broke one of the test_readers_unusual test cases in test_io.py

The test source:
def test_readers_unusual():
    ...

    # obsolete line folding
    tr(
        READERS[CLIENT, IDLE],
        b"HEAD /foo HTTP/1.1\r\n"
        b"Host: example.com\r\n"
        b"Some: multi-line\r\n"
        b" header\r\n"
        b"\tnonsense\r\n"
        b"    \t   \t\tI guess\r\n"
        b"Connection: close\r\n"
        b"More-nonsense: in the\r\n"
        b"    last header  \r\n\r\n",
        Request(
            method="HEAD",
            target="/foo",
            headers=[
                ("Host", "example.com"),
                ("Some", "multi-line header nonsense I guess"),
                ("Connection", "close"),
                ("More-nonsense", "in the last header"),
            ],
        ),
    )

The header

        b"Some: multi-line\r\n"
        b" header\r\n"
        b"\tnonsense\r\n"
        b"    \t   \t\tI guess\r\n"

turns into

        b"Some: multi-line\r header\r\tnonsense\r    \t   \t\tI guess\r\n"

I cannot figure out how to carefully cut the \r out of such a line.
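The difficulty can be reproduced in isolation (a minimal standalone demonstration of why splitting on b"\n" alone leaves stray b"\r" bytes inside obs-folded headers):

```python
# An obs-folded header with CRLF line endings, as in the failing test.
data = (b"Some: multi-line\r\n"
        b" header\r\n"
        b"\tnonsense\r\n")

# Splitting on "\n" alone leaves a trailing "\r" on every segment:
lines = data.split(b"\n")
# -> [b'Some: multi-line\r', b' header\r', b'\tnonsense\r', b'']

# When the continuation lines are later unfolded (joined), those "\r"
# bytes end up in the middle of the header value, where a trailing
# \r? in header_field's regex cannot reach them.
```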

Contributor Author

After some discussion on Gitter, it was decided not to change the regexes, but to rewrite the .rstrip(...) call as

        for line in lines:
            if line.endswith(b"\r"):
                del line[-1]

to avoid extra memory allocations

@njsmith
Member

njsmith commented Dec 10, 2020

Oh also, forgot to say: could you also add a news entry for the next release notes? See newsfragments/README.rst for the details on how to do that.

cdeler added a commit to cdeler/h11 that referenced this pull request Dec 11, 2020
1. added new tests to test_io.py
2. introduced ReceiveBuffer::_extract
3. added a newsfragment
@cdeler
Contributor Author

cdeler commented Dec 11, 2020

Hello @njsmith ,

I resolved all the remarks apart from this one.

I cannot figure out how to cleanly and carefully fix the broken test case (we probably cannot just remove it).

1. added new tests to test_io.py
2. introduced ReceiveBuffer::_extract
3. added a newsfragment
@cdeler cdeler force-pushed the fix-problems-with-wrong-line-delimiters branch from f1c4157 to f688615 Compare December 11, 2020 16:22
@cdeler cdeler requested a review from njsmith December 11, 2020 16:50
cdeler and others added 2 commits December 14, 2020 21:12
Replaced line.rstrip(...) with `del line[-1]` to avoid extra allocations
@njsmith
Member

njsmith commented Dec 21, 2020

@cdeler I tweaked your newsfragment to use the correct quoting: in ReST, code literals require double backticks. Super annoying if you're used to markdown, but what can ya do.

@pgjones Note that this PR doesn't quite drop support for py2, but it does change the buffer handling to be O(n**2) on py2, and I'm wondering if we should flag that in the release notes or anything. Or are you planning to drop py2 for real in the next release anyway?

@pgjones
Member

pgjones commented Dec 26, 2020

Let's merge this, and drop Py2 for the next release.

@pgjones pgjones merged commit c88da54 into python-hyper:master Dec 26, 2020
pgjones pushed a commit that referenced this pull request Dec 26, 2020
1. it uses b"\n\r?\n" as the blank-line delimiter regex
2. it splits lines using the b"\r?\n" regex, so it is tolerant of mixed line endings
3. for chunked encoding it rewinds the buffer until b"\r\n"

The changes are based on this comment: #115 (comment)
pgjones pushed a commit that referenced this pull request Dec 26, 2020
pgjones pushed a commit that referenced this pull request Dec 26, 2020
pgjones pushed a commit that referenced this pull request Dec 26, 2020
1. added new tests to test_io.py
2. introduced ReceiveBuffer::_extract
3. added a newsfragment
@wonderiuy

This is like a Christmas gift, a big thank you to everyone involved


Successfully merging this pull request may close these issues.

Support for servers with broken line endings