invalid HTTP message at byte position 6: HTTP/2<-- HERE --> 200 #70

Closed
nice-redbull opened this issue Mar 6, 2023 · 1 comment

Comments

@nice-redbull

https://data.commoncrawl.org/crawl-data/CC-NEWS/2020/09/CC-NEWS-20200921024254-00130.warc.gz
invalid HTTP message at byte position 6: HTTP/2<-- HERE --> 200 \r\nserver: Apache\r\nx-gen-mode: full\r...

Multiple files from this year/month produce the same error.

@sebastian-nagel
Contributor

See commoncrawl/news-crawl#42: HTTP/2 was enabled by a security upgrade of the JDK, and the HTTP headers were written as they were "stringified" by the protocol layers.

@ato closed this as completed in 9ed3bb7 on Jul 26, 2023
ato added a commit that referenced this issue Jul 26, 2023
New features:

* Added an HttpRequest.Builder(method, uri) constructor that populates
  the Host header.

Bugs fixed:

* WarcWriter.fetch(uri) was omitting the query string

Changes:

* ARC parser now accepts garbage in the MIME field
* HTTP parser in lenient mode now accepts messages without a minor
  version number (e.g. "HTTP/2") #70
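For readers hitting the same error with other tooling, the lenient behaviour described in the last change can be sketched roughly as follows. This is not jwarc's implementation (jwarc is a Java library); it is a minimal Python illustration using regular expressions. The function name `parse_status_line`, the two patterns, and the choice to report a missing minor version as 0 are all assumptions made for this example.

```python
import re

# Strict form: "HTTP/<major>.<minor> <3-digit status> [reason]".
STRICT = re.compile(rb"^HTTP/(\d)\.(\d) (\d{3})(?: (.*))?$")

# Lenient form (assumed for illustration): the ".<minor>" part is optional,
# so a stringified status line such as "HTTP/2 200 " is accepted.
LENIENT = re.compile(rb"^HTTP/(\d+)(?:\.(\d+))? +(\d{3})(?: *(.*))?$")


def parse_status_line(line: bytes, lenient: bool = False):
    """Return (major, minor, status, reason) for an HTTP status line.

    Hypothetical helper: a missing minor version is reported as 0,
    which is an assumption of this sketch, not documented jwarc behaviour.
    """
    line = line.rstrip(b"\r\n")
    match = (LENIENT if lenient else STRICT).match(line)
    if match is None:
        raise ValueError(f"invalid HTTP status line: {line!r}")
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    status = int(match.group(3))
    reason = (match.group(4) or b"").decode("latin-1").strip()
    return major, minor, status, reason


if __name__ == "__main__":
    # Status line taken from the error message in this issue.
    print(parse_status_line(b"HTTP/2 200 \r\n", lenient=True))
```

In strict mode the same input raises `ValueError`, because the pattern expects a `.` right after `HTTP/2`, which corresponds to byte position 6 in the reported error.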