After a few articles, the scraper gets rate-limited (note that it is using the VisualEditor renderer; I do not get such errors with the RestApi one):
[info] [2025-02-06T08:08:58.013Z] Getting JSON from [https://www.appropedia.org/w/api.php?action=parse&format=json&prop=modules%7Cjsconfigvars%7Cheadhtml&formatversion=2&page=World+Shelters]
[error] [2025-02-06T08:08:58.018Z] Error downloading article How_to_install_FLIR_Lepton_Thermal_Camera_and_applications_on_Raspberry_Pi/ja
[error] [2025-02-06T08:08:58.018Z] {
code: 'rest-rate-limit-exceeded',
info: 'A rate limit was exceeded. Please try again later.',
'error-keys': [ 'actionthrottledtext' ],
docref: 'See https://www.appropedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes.'
}
[error] [2025-02-06T08:08:58.019Z] Failed to run mwoffliner after [29s]: {}
[log] [2025-02-06T08:08:58.019Z] Exiting with code [2]
The error happens during module retrieval (which is done before retrieving the article itself).
It escapes the logic already in place for 429 HTTP errors, because this is a 200 HTTP response with an `error` field in the response body (its content is shown in the logs above).
I think we should have a mechanism to detect these rate-limit errors as well, but I struggle to find proper documentation on MediaWiki's side about which error codes should be handled.
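As a possible direction, here is a minimal sketch in TypeScript of detecting such rate-limit errors in a 200 response body and funnelling them into the same backoff path as HTTP 429. The `isRateLimitError` helper, the list of codes/keys (taken from the log above plus the generic `ratelimited` action-API code), and the retry parameters are assumptions for illustration, not mwoffliner's actual implementation:

```ts
// Shape of a MediaWiki action API error body (returned with HTTP 200).
interface MwErrorBody {
  error?: {
    code?: string
    info?: string
    'error-keys'?: string[]
  }
}

// Codes/keys observed in the log above, plus the generic action API
// 'ratelimited' code; the complete list to handle is the open question.
const RATE_LIMIT_CODES = new Set(['rest-rate-limit-exceeded', 'ratelimited'])
const RATE_LIMIT_KEYS = new Set(['actionthrottledtext'])

function isRateLimitError(body: MwErrorBody): boolean {
  const err = body.error
  if (!err) return false
  if (err.code && RATE_LIMIT_CODES.has(err.code)) return true
  return (err['error-keys'] ?? []).some((k) => RATE_LIMIT_KEYS.has(k))
}

// Hypothetical fetch wrapper: after a 200 response, inspect the body
// before treating it as success, and reuse the same retry/backoff path
// that already exists for HTTP 429.
async function getJson(url: string, attempt = 1): Promise<unknown> {
  const res = await fetch(url)
  const body = (await res.json()) as MwErrorBody
  if (res.status === 429 || isRateLimitError(body)) {
    if (attempt > 5) throw new Error(`Rate-limited too many times: ${url}`)
    await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt)) // exponential backoff
    return getJson(url, attempt + 1)
  }
  return body
}
```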
Command used to reproduce:
Version: 1.14.1-dev0