See commoncrawl/news-crawl#42: HTTP/2 was enabled by a security upgrade of the JDK, and the HTTP headers were written as they were "stringified" by the protocol layers.
New features:
* Added an HttpRequest.Builder(method, uri) constructor that populates
the Host header.
Bugs fixed:
* WarcWriter.fetch(uri) was omitting the query string
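
For illustration, the query-string fix amounts to building the request target from both the path and the query of the URI. The sketch below is not the library's actual code; the class `RequestTarget` and the method `requestTarget` are hypothetical names, and it relies only on `java.net.URI`.

```java
import java.net.URI;

public class RequestTarget {
    // Hypothetical helper (not part of the library): builds the request
    // target for an HTTP/1.x request line from a URI, keeping the query.
    static String requestTarget(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // an empty path is sent as "/"
        }
        String query = uri.getRawQuery();
        // Dropping the query here is the kind of bug described above.
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.org/search?q=warc&page=2");
        System.out.println(requestTarget(uri)); // prints "/search?q=warc&page=2"
    }
}
```

The raw (still percent-encoded) accessors are used so that the target is sent exactly as it appears in the URI.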
Changes:
* ARC parser now accepts garbage in the MIME field
* HTTP parser in lenient mode now accepts messages without a minor
version number (e.g. "HTTP/2") #70
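
As a rough sketch of what the lenient change means, the status line can be matched with an optional minor version, so that `HTTP/2 200 ` is accepted in lenient mode and rejected in strict mode. This is not the library's actual parser; the class `LenientStatusLine`, its patterns, and the choice to default a missing minor version to 0 are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientStatusLine {
    // Strict form: single-digit major and minor version, e.g. "HTTP/1.1 200 OK".
    private static final Pattern STRICT =
            Pattern.compile("HTTP/(\\d)\\.(\\d) (\\d{3}) (.*)");
    // Lenient form (assumption): the ".minor" part and the reason phrase are optional.
    private static final Pattern LENIENT =
            Pattern.compile("HTTP/(\\d+)(?:\\.(\\d+))? +(\\d{3})(?: (.*))?");

    // Returns {major, minor, status}; a missing minor version is reported as 0.
    static int[] parse(String line, boolean lenient) {
        Matcher m = (lenient ? LENIENT : STRICT).matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid HTTP status line: " + line);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        int status = Integer.parseInt(m.group(3));
        return new int[] {major, minor, status};
    }

    public static void main(String[] args) {
        int[] r = parse("HTTP/2 200 ", true);
        System.out.println(r[0] + "." + r[1] + " " + r[2]); // prints "2.0 200"
    }
}
```

Calling `parse("HTTP/2 200 ", false)` throws an `IllegalArgumentException`, because the strict pattern requires a `.` and a minor version digit after the major version.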
https://data.commoncrawl.org/crawl-data/CC-NEWS/2020/09/CC-NEWS-20200921024254-00130.warc.gz invalid HTTP message at byte position 6: HTTP/2<-- HERE --> 200 \r\nserver: Apache\r\nx-gen-mode: full\r...
Multiple files from this year/month produce errors like this.