Pass proper user-agent in stream_file when host is upload.wikimedia.org #214

Open
benoit74 opened this issue Nov 11, 2024 · 1 comment
Labels
enhancement New feature or request question Further information is requested
Milestone

Comments

@benoit74
Collaborator

For files hosted on upload.wikimedia.org, we must comply with their User-Agent policy at https://meta.wikimedia.org/wiki/User-Agent_policy

Doing so at the scraperlib level, in stream_file (the main method used by many scrapers to download files and assets), would avoid having to do it in every scraper (and forgetting about it over and over).
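To make the idea concrete, here is a minimal sketch of the host check, assuming a helper that prepares request headers before the download starts. `headers_for` and `WIKIMEDIA_UPLOAD_HOST` are hypothetical names for illustration only; they are not part of scraperlib, and the actual stream_file signature is not shown here.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical constant and helper; neither exists in scraperlib.
WIKIMEDIA_UPLOAD_HOST = "upload.wikimedia.org"


def headers_for(
    url: str, user_agent: str, headers: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of `headers`, adding User-Agent for Wikimedia upload URLs."""
    merged = dict(headers or {})
    host = (urlsplit(url).hostname or "").lower()
    # Header names are case-insensitive, so look for any spelling of User-Agent
    # and leave a value the caller set explicitly untouched.
    has_ua = any(name.lower() == "user-agent" for name in merged)
    if host == WIKIMEDIA_UPLOAD_HOST and not has_ua:
        merged["User-Agent"] = user_agent
    return merged
```

A download function could call such a helper once per URL and pass the result to its HTTP client, so scrapers that already set their own User-Agent would keep it.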

@benoit74
Collaborator Author

That being said, I'm not sure this is really straightforward to implement.

The scraper would have to pass its name and version to scraperlib so that we can set the header properly.

We also need a contact, which probably depends more on whoever runs the scraper than on the scraper itself.

So I'm not sure this is that easy to implement in the end.
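For illustration, the three pieces of information mentioned above (scraper name, scraper version, contact) could be combined roughly as follows. This is only a sketch: `build_user_agent` is a hypothetical helper, not a scraperlib function, and the `name/version (contact) library/version` layout is my reading of the generic format suggested by the Wikimedia policy, so it should be checked against the policy page. Where the contact would come from (CLI flag, environment variable, something else) is deliberately left open.

```python
def build_user_agent(name: str, version: str, contact: str, library: str = "") -> str:
    """Build a User-Agent string such as 'name/version (contact) library/version'.

    Hypothetical helper: `contact` is an email address or URL supplied by
    whoever runs the scraper, `library` an optional 'libname/version' token.
    """
    contact = contact.strip()
    if not contact:
        # Without a contact the header would not identify the operator.
        raise ValueError("a contact (email address or URL) is required")
    user_agent = f"{name}/{version} ({contact})"
    if library:
        user_agent += f" {library}"
    return user_agent


# Example values are made up for the demonstration.
print(build_user_agent("demo-scraper", "1.2.3", "ops@example.org", "somelib/0.1"))
# prints: demo-scraper/1.2.3 (ops@example.org) somelib/0.1
```

Raising on an empty contact makes the open question explicit: the scraper can supply its own name and version, but the contact has to be provided by the person running it.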

@benoit74 benoit74 added question Further information is requested and removed good first issue Good for newcomers labels Nov 11, 2024
@benoit74 benoit74 added this to the backlog milestone Nov 14, 2024