Added ability to use LF, not only CRLF delimiter for response Headers and Body #115
Conversation
Benchmark results

With fix:

Without fix (latest master 522b004):

If performance is an issue, maybe look for "\r" or "\n" first. If you see "\r" first, use "\r\n" as the delimiter; otherwise use "\n".
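That suggestion can be sketched as a tiny helper (illustrative only; `detect_line_delimiter` is a hypothetical name, not part of h11):

```python
def detect_line_delimiter(buf: bytes) -> bytes:
    """Pick a line delimiter based on the first line terminator seen.

    If the first b"\\n" is preceded by b"\\r", assume CRLF endings;
    otherwise assume bare LF. Defaults to b"\\n" when no terminator
    has arrived yet. (Hypothetical helper, not h11's actual code.)
    """
    idx = buf.find(b"\n")
    if idx > 0 and buf[idx - 1] == 0x0D:  # byte before "\n" is "\r"
        return b"\r\n"
    return b"\n"
```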
@bdraco, yes,
After some performance investigation I changed the algorithm. For now the changeset looks fine.
- reword comment
- reworked maybe_extract_until_next…imiter_regex) and slightly updated docs
Hello @pgjones, I ran the benchmark again against the changes.

With changes:

Without changes:
Noticed another potential issue, I sadly missed last time.
h11/_receivebuffer.py
Outdated
-        if self._data[self._start : self._start + 2] == b"\r\n":
-            self._start += 2
+        start_chunk = self._data[self._start : self._start + 2]
+        if start_chunk in [b"\r\n", b"\n"]:
I think start_chunk can only equal b"\n" if self._data == b"\n" (given the + 2 in line 129). Maybe this needs to be:
if self._data[self._start : self._start + 2] == b"\r\n":
self._start += 2
return []
elif self._data[self._start : self._start + 1] == b"\n":
self._start += 1
return []
?
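For illustration, the suggested logic as a standalone sketch (`skip_leading_blank_line`, `data`, and `start` are stand-ins for the method's `self._data`/`self._start` bookkeeping, not h11's API):

```python
def skip_leading_blank_line(data: bytes, start: int) -> int:
    """Consume one leading blank line, whether CRLF- or LF-terminated.

    Returns the new start offset (unchanged if no blank line is present).
    """
    if data[start : start + 2] == b"\r\n":
        return start + 2
    if data[start : start + 1] == b"\n":
        return start + 1
    return start
```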
You are right, I missed that; for example, self._data might start with b"\nla-la-la-la". Incorporated your changes.
- changed blank_line_delimiter_regex - changed maybe_extract_lines start processing
- added pytest param names to test_receivebuffer_for_invalid_delimiter
Force-pushed from aa87022 to 489de4d
You can use the address and credentials I've given to you; firmware on that device is 5.20.5
@wonderiuy I meant getting his changes to my local filesystem :)
It was really simple to clone this branch with GitHub Desktop and try it out. Works well on both 5.51 and 5.75 as well as on newer firmwares. Good job @cdeler!
I can also confirm this is working locally ;) Looking forward to the PR being approved.
Last week Axis published a new firmware version (5.51.7.2) for my camera (M1034-W). Below you will find the release notes. I don't know if it is related to the issue in this thread (LF vs CRLF), but the first correction (C01) in the new firmware is about "Corrected a newline character", so I thought it would be wise to mention it here.

Corrections in 5.51.7.2 since 5.51.7.1:
- 5.51.7.2:C01
- 5.51.7.2:C02
- 5.51.7.2:C03
- 5.51.7.2:C04
Thanks! I don't know, but regardless there are other firmwares that won't get this update, so the fix is still needed.
Can also confirm that this PR solves the problems with older versions of Axis cameras in Homeassistant. |
Hey guys! Any progress on getting this merged? |
h11/_receivebuffer.py
Outdated
# Only search in buffer space that we've not already looked at.
partial_buffer = self._data[self._multiple_lines_search :]
match = blank_line_regex.search(partial_buffer)
Instead of copying the buffer, use the pos argument to search:
match = blank_line_regex.search(self._data, self._multiple_lines_search)
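A runnable illustration of the `pos`-argument idea (the buffer contents here are made up; `blank_line_regex` mirrors the blank-line pattern discussed in this PR):

```python
import re

# Compiled bytes patterns can search bytearrays; passing a start offset
# scans from that position without slicing (copying) the buffer first.
blank_line_regex = re.compile(b"\n\r?\n")

data = bytearray(b"HTTP/1.1 200 OK\r\nHost: example.com\r\n\r\nbody")
search_start = 17  # pretend the first line was already scanned

match = blank_line_regex.search(data, search_start)
```

Here `match.end()` points just past the blank line, i.e. at the start of the body.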
partial_buffer = self._data[search_start_index:]
partial_idx = partial_buffer.find(b"\r\n")
if partial_idx == -1:
    self._next_line_search = len(self._data)
In maybe_extract_next_line, we store the raw buffer length in self._next_line_search and then do the subtraction when we use it. In maybe_extract_lines, we store the "pre-subtracted" value, so we can use it directly. This inconsistency is kind of confusing :-). We should switch one of them so they match.
As soon as I rewrote both methods using the _extract(...) method, this issue was resolved (both methods now work with offsets).
h11/_receivebuffer.py
Outdated
self._data[:count] = b""
self._next_line_search = 0
self._multiple_lines_search = 0
There's a lot of copy/pastes of this code for extracting an initial slice and then doing internal bookkeeping, which is error prone. Let's factor it out into a method, like:
def _extract(self, count):
    out = self._data[:count]
    del self._data[:count]
    self._next_line_search = 0
    self._multiple_lines_search = 0
    return out
And then in all the other methods, just do:
return self._extract(whatever_length_value_we_ended_up_with)
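Put together, the factored-out helper might look like this minimal sketch (an assumed shape for illustration, not h11's exact class):

```python
class ReceiveBufferSketch:
    """Minimal sketch of a receive buffer with shared extraction bookkeeping."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._next_line_search = 0
        self._multiple_lines_search = 0

    def __iadd__(self, byteslike):
        self._data += byteslike
        return self

    def _extract(self, count: int) -> bytearray:
        # Slice off the first `count` bytes and reset the search offsets,
        # so every caller shares one copy of this bookkeeping.
        out = self._data[:count]
        del self._data[:count]
        self._next_line_search = 0
        self._multiple_lines_search = 0
        return out
```

Each `maybe_extract_*` method then ends with `return self._extract(length)` instead of repeating the slicing and offset resets.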
I introduced the _extract method and replaced all the copy/pastes with it. It also helped to resolve the above comment (#115 (comment)).
    b"Content-type: text/plain",
    b"Connection: close",
]
assert bytes(b) == b"Some body"
Can you also add a few similar tests to test_readers_unusual in test_io.py? These tests are good for the basic ReceiveBuffer functionality, but the test harness there sets up a more "end-to-end" scenario that runs our full HTTP parsing pipeline, so it would give us more confidence that everything is wired up correctly.
I added similar tests to test_readers_unusual.
h11/_receivebuffer.py
Outdated
# Truncate the buffer and return it.
idx = self._multiple_lines_search + match.span(0)[-1]
out = self._data[:idx]
lines = [line.rstrip(b"\r") for line in out.split(b"\n")]
Instead of calling rstrip here (which always has to copy the whole buffer, since these are bytearrays), I think we could leave the trailing \r in, and then in _abnf.py change header_field, request_line, and status_line to match a trailing optional \r, e.g.:
header_field = (
r"(?P<field_name>{field_name})"
r":"
r"{OWS}"
r"(?P<field_value>{field_value})"
r"{OWS}\r?".format(**globals()) # <-- notice added \r? here
)
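To make the idea concrete, here is a heavily simplified stand-in for `header_field` with the optional trailing `\r?` (the real ABNF in `_abnf.py` is stricter; this only illustrates the proposal):

```python
import re

# Simplified stand-in for h11's header_field pattern; the real ABNF is
# stricter about token and field-value syntax. The point here is only
# the trailing optional "\r?".
header_field = re.compile(
    rb"(?P<field_name>[!#$%&'*+\-.^_`|~0-9A-Za-z]+)"
    rb":[ \t]*"
    rb"(?P<field_value>[^\r\n]*?)"
    rb"[ \t]*\r?$"
)
```

The same line then matches whether or not the trailing `\r` was stripped before parsing.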
I tried to do that. I added \r? to these regexes (header_field, request_line, and status_line). Then I set

lines = out.split(b"\n")

but it broke one of the test_readers_unusual test cases in test_io.py. The test source:
def test_readers_unusual():
...
# obsolete line folding
tr(
READERS[CLIENT, IDLE],
b"HEAD /foo HTTP/1.1\r\n"
b"Host: example.com\r\n"
b"Some: multi-line\r\n"
b" header\r\n"
b"\tnonsense\r\n"
b" \t \t\tI guess\r\n"
b"Connection: close\r\n"
b"More-nonsense: in the\r\n"
b" last header \r\n\r\n",
Request(
method="HEAD",
target="/foo",
headers=[
("Host", "example.com"),
("Some", "multi-line header nonsense I guess"),
("Connection", "close"),
("More-nonsense", "in the last header"),
],
),
)
The header

b"Some: multi-line\r\n"
b" header\r\n"
b"\tnonsense\r\n"
b" \t \t\tI guess\r\n"

turns into

b"Some: multi-line\r header\r\tnonsense\r \t \t\tI guess\r\n"

and I cannot figure out how to carefully cut the \r out of such a line.
After some discussion on Gitter, it has been decided not to change the regexes, but to rewrite .rstrip(...) as

for line in lines:
    if line.endswith(b"\r"):
        del line[-1]

to avoid extra memory allocations.
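A runnable version of that approach (the sample data is made up; note that `bytearray.split` returns `bytearray` pieces, so the in-place `del` works without allocating new line objects):

```python
# Split on b"\n" and trim a trailing b"\r" in place, avoiding the
# per-line copies that bytearray.rstrip() would make.
out = bytearray(b"HTTP/1.1 200 OK\r\nHost: example.com\nDone\r\n")
lines = out.split(b"\n")          # bytearray.split yields bytearrays
if lines and lines[-1] == b"":    # drop the empty piece after the final "\n"
    lines.pop()
for line in lines:
    if line.endswith(b"\r"):
        del line[-1]              # in-place delete, no new allocation
```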
Oh also, forgot to say: could you also add a news entry for the next release notes? See
1. added new tests to test_io.py 2. introduced ReceiveBuffer::_extract 3. added a newsfragment
Force-pushed from f1c4157 to f688615
Replaced lines.rstrip(...) with `del line[-1]` to avoid extra allocations
@cdeler I tweaked your newsfragment to use the correct quoting: in ReST, code literals require double backticks. Super annoying if you're used to markdown, but what can ya do. @pgjones Note that this PR doesn't quite drop support for py2, but it does change the buffer handling to be O(n**2) on py2, and I'm wondering if we should flag that in the release notes or anything. Or are you planning to drop py2 for real in the next release anyway?
Let's merge this, and drop Py2 for the next release.
1. it uses b"\n\r?\n" as the blank line delimiter regex
2. it splits lines using the b"\r?\n" regex, so that it's tolerant of mixed line endings
3. for chunked encoding it rewinds the buffer until b"\r\n"

The changes are based on this comment: #115 (comment)
using these test results #115 (comment)
after @tomchristie's proposal from #115 (comment)
1. added new tests to test_io.py 2. introduced ReceiveBuffer::_extract 3. added a newsfragment
This is like a Christmas gift, a big thank you to everyone involved
Hello,
I want to submit a PR which closes #7.

Why the changes are required

According to this comment in the issue, there are problems with some old servers which do not fully conform to the HTTP/1.1 RFC. The original issue in httpx (encode/httpx#1378) describes a situation where some embedded-system developers have to deal with a non-standard server.

What has been done?

I tried to reimplement the function which extracts headers for a response, using a regex.

What hasn't been done yet

Performance testing, fuzzing.

(updates) How maybe_extract_lines works for now:
1. it searches the self._data buffer for the "\n\r?\n" pattern, which gives data
2. it searches data using the "\r?\n" regex, which gives delimiter
3. it splits via data.split(delimiter) (it is much faster than regex.split(data))

With fix:

Without fix (522b004):
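The three steps above can be sketched as a standalone function (simplified for illustration: no offset bookkeeping or buffer consumption, and it assumes the header block uses one consistent line ending, unlike the final mixed-endings handling):

```python
import re

blank_line_delimiter_regex = re.compile(b"\n\r?\n")
line_delimiter_regex = re.compile(b"\r?\n")

def maybe_extract_lines_sketch(buf: bytes):
    # 1. Search the buffer for the blank line that ends the header block.
    match = blank_line_delimiter_regex.search(buf)
    if match is None:
        return None  # headers not complete yet
    data = buf[: match.start() + 1]  # header block incl. its final "\n"
    # 2. The first line ending tells us which delimiter this peer uses.
    delimiter = line_delimiter_regex.search(data).group(0)
    # 3. A plain split on that delimiter is much faster than regex-splitting.
    return [line for line in data.split(delimiter) if line]
```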