Change User-Agent to common crawler format #1224
Comments
I agree. Relevant reading, perhaps, here: |
If ignoring this: It's just these two locations: Internet.nl/checks/http_client.py Line 62 in 7426760
Internet.nl/checks/tasks/tls_connection.py Line 663 in 7426760
Remaining questions are:
|
Latest RFC on User-Agent header: https://www.rfc-editor.org/rfc/rfc9110.html#name-user-agent |
Question: What User-Agent header are other test tools using? |
Thanks! See also: https://udger.com/resources/ua-list/crawlers |
Oh, cool, we're on that list: https://udger.com/resources/ua-list/bot-detail?bot=internetnl#id131933 |
A governmental agency has asked for priority on this issue: currently the IPv4/IPv6 comparison fails because the User-Agent |
For the record: I'm proposing to put internetnl and the version string in the comment field only. Decided with @baknu:
Note again, internet.nl does not always send a User-Agent, which is a separate bug: |
Currently
internetnl/1.0
is used. This is not ideal: it is not a common format and, with Docker, others can easily spin up their own instance, so the UA should at least include the correct link for contacting the server or person doing the crawling.
As mentioned before in #363 (comment) and #1042 (comment), I would prefer to change this to a common bot User-Agent format, like the ones listed on MDN.
A more standardized and widely accepted User-Agent format is
Mozilla/5.0 (compatible; SoftwareName/0.1.2; +https://internet.nl/)
where the last part, the URL prefixed with +, could point to the deployed instance. For a protected batch server another public page could be used, perhaps with a #user-id-token appended; I've seen monitoring systems that do this. The + part should be configurable, but could default to the current instance domain variable that is already used.
So I suggest for us:
Mozilla/5.0 (compatible; internetnl/1.8.3; +https://internet.nl/about/)
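A minimal sketch of how such a string could be assembled with a configurable contact URL. The names `PRODUCT_NAME` and `build_user_agent` are hypothetical and not taken from the Internet.nl code base; where the version and the URL come from (settings, environment, instance domain variable) is left open here.

```python
from urllib.parse import urlsplit

# Hypothetical constant for illustration; not taken from the Internet.nl code base.
PRODUCT_NAME = "internetnl"


def build_user_agent(version: str, contact_url: str) -> str:
    """Return a crawler-style User-Agent with the contact URL in the comment field."""
    parts = urlsplit(contact_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"contact URL must be an absolute http(s) URL: {contact_url!r}")
    # Parentheses would close or nest the comment field, so reject them.
    if any(ch in "()" for ch in version + contact_url):
        raise ValueError("version and contact URL must not contain parentheses")
    return f"Mozilla/5.0 (compatible; {PRODUCT_NAME}/{version}; +{contact_url})"


print(build_user_agent("1.8.3", "https://internet.nl/about/"))
# prints: Mozilla/5.0 (compatible; internetnl/1.8.3; +https://internet.nl/about/)
```

Building the header in one place would also let every code path that makes HTTP requests use the same value.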
Ideally we would even set up a 'bot' page like
http://www.google.com/bot.html
RFC 1945 - 10.5 User-Agent is not strict:
3.7 Product Tokens defines:
2.2 Basic Rules defines the comment as:
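To illustrate the structure these sections describe, a User-Agent value is a sequence of product tokens (`name` or `name/version`) and parenthesised comments. The sketch below splits a value into those parts. It is a simplified approximation for illustration, not a complete RFC parser: the token character set is an assumption based on the usual HTTP token characters, and `split_user_agent` is a hypothetical helper name.

```python
import re

# Simplified token pattern: visible ASCII minus the HTTP separator characters.
# An approximation for illustration, not a full RFC 1945 parser.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
PRODUCT_RE = re.compile(rf"{TOKEN}(?:/{TOKEN})?")


def split_user_agent(value: str) -> list[tuple[str, str]]:
    """Split a User-Agent value into ("product", text) and ("comment", text) parts."""
    parts = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch in " \t":
            i += 1  # skip whitespace between parts
        elif ch == "(":
            # Comments may nest, so track the parenthesis depth.
            depth = 0
            start = i
            while i < len(value):
                if value[i] == "(":
                    depth += 1
                elif value[i] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            if depth != 0:
                raise ValueError("unbalanced parentheses in comment")
            parts.append(("comment", value[start + 1:i]))
            i += 1  # step past the closing parenthesis
        else:
            match = PRODUCT_RE.match(value, i)
            if match is None:
                raise ValueError(f"unexpected character at position {i}: {ch!r}")
            parts.append(("product", match.group()))
            i = match.end()
    return parts


print(split_user_agent("Mozilla/5.0 (compatible; internetnl/1.8.3; +https://internet.nl/about/)"))
```

Under this reading, both the current `internetnl/1.0` (a single product token) and the proposed value (one product token followed by one comment) fit the grammar; the proposal simply moves the identifying details into the comment.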