Better detect / recover on rate limiting #2151

Open
benoit74 opened this issue Feb 6, 2025 · 0 comments

benoit74 commented Feb 6, 2025

Command used to repro:

mwoffliner --webp --mwUrl="https://www.appropedia.org" --format="novid" --verbose="log" --publisher="openZIM" --adminEmail="[email protected]" --customZimTitle="Test" --customZimLanguage="eng" --customZimDescription="Test"

Version: 1.14.1-dev0

After a few articles, the scraper gets rate-limited (note that it is using the VisualEditor renderer; I do not get such errors with the RestApi one):

[info] [2025-02-06T08:08:58.013Z] Getting JSON from [https://www.appropedia.org/w/api.php?action=parse&format=json&prop=modules%7Cjsconfigvars%7Cheadhtml&formatversion=2&page=World+Shelters]
[error] [2025-02-06T08:08:58.018Z] Error downloading article How_to_install_FLIR_Lepton_Thermal_Camera_and_applications_on_Raspberry_Pi/ja
[error] [2025-02-06T08:08:58.018Z] {
  code: 'rest-rate-limit-exceeded',
  info: 'A rate limit was exceeded. Please try again later.',
  'error-keys': [ 'actionthrottledtext' ],
  docref: 'See https://www.appropedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes.'
}
[error] [2025-02-06T08:08:58.019Z] Failed to run mwoffliner after [29s]: {}
[log] [2025-02-06T08:08:58.019Z] Exiting with code [2]

The error happens in modules retrieval (which is done before retrieving the article itself).

It escapes the logic already in place around HTTP 429 errors, because this is an HTTP 200 response with an error field in the response body (whose content is displayed in the logs).

I think we should have a mechanism to detect these rate limit errors as well. I am struggling to find proper MediaWiki documentation on which codes should be handled.
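To make the idea concrete, here is a rough sketch of what such a detection and recovery mechanism could look like. It is not taken from the mwoffliner code base: the names `isRateLimitBody`, `backoffDelayMs` and `fetchJsonWithRateLimitRetry` are hypothetical, and the only markers it recognizes are the `code` and `error-keys` values seen in the log above. Which other codes MediaWiki can return for throttling is still an open question.

```typescript
// Hypothetical sketch: none of these names exist in mwoffliner.

interface MwApiError {
  code?: string
  info?: string
  'error-keys'?: string[]
}

interface MwApiResponse {
  error?: MwApiError
  [key: string]: unknown
}

// Assumption: only the values observed in the log above are treated as
// rate-limit markers. The full list of relevant codes is not known.
const RATE_LIMIT_CODES = new Set<string>(['rest-rate-limit-exceeded'])
const RATE_LIMIT_ERROR_KEYS = new Set<string>(['actionthrottledtext'])

// Returns true when a parsed JSON body (possibly from an HTTP 200 response)
// carries an error object that looks like a rate-limit error.
function isRateLimitBody(body: unknown): boolean {
  if (typeof body !== 'object' || body === null) return false
  const error = (body as MwApiResponse).error
  if (typeof error !== 'object' || error === null) return false
  if (typeof error.code === 'string' && RATE_LIMIT_CODES.has(error.code)) return true
  const keys = Array.isArray(error['error-keys']) ? error['error-keys'] : []
  return keys.some((key) => RATE_LIMIT_ERROR_KEYS.has(key))
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt))
}

// Calls fetchJson until the body is no longer a rate-limit error, waiting
// between attempts. Both fetchJson and sleep are supplied by the caller.
async function fetchJsonWithRateLimitRetry(
  fetchJson: () => Promise<unknown>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 5,
): Promise<unknown> {
  for (let attempt = 0; ; attempt++) {
    const body = await fetchJson()
    if (!isRateLimitBody(body)) return body
    if (attempt >= maxRetries) {
      throw new Error('Rate limit still exceeded after ' + maxRetries + ' retries')
    }
    await sleep(backoffDelayMs(attempt))
  }
}
```

The check inspects the response body rather than the HTTP status, so it would complement the existing 429 handling instead of replacing it. The caller passes in its own `sleep` (for example one built on `setTimeout`), which also makes the retry loop easy to exercise in tests without real delays.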

@benoit74 benoit74 added the bug label Feb 6, 2025