Better surface page retries to users while crawling #2367

Open
1 of 3 tasks
Shrinks99 opened this issue Feb 5, 2025 · 0 comments
Shrinks99 commented Feb 5, 2025

Context

Currently, errors are listed if a page cannot be loaded, but as of crawler 1.5.0, pages that could not be loaded are automatically retried. While crawling, users rely on the error logs as a record of what ends up in the crawl.

Additionally, retries are currently opaque to users. They occur only after the URL list has been fully depleted, but users aren't given any feedback to indicate that this is the case. The crawl continues to run without any indication of what it is doing or why.

Changes

  • Log retries as a warning instead of an error; only if the final retry fails to capture the page is it logged as an error (see the sketch after this list)
  • Add retries to the end of the queue so they are visible to users
  • Apply live exclusions added after a URL enters the retry queue to the URLs already in that queue, so they are excluded accordingly
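To make the proposed behavior concrete, here is a minimal TypeScript sketch of the retry flow described above. Everything in it (`RetryEntry`, `MAX_RETRIES`, `handleFailure`, `applyExclusions`, `isExcluded`) is a hypothetical illustration under assumed names, not Browsertrix Crawler's actual internals.

```ts
// Hypothetical sketch of the proposed retry visibility behavior.
// All names here are illustrative assumptions, not the crawler's real API.

interface RetryEntry {
  url: string;
  attempt: number;
}

const MAX_RETRIES = 3; // assumed retry limit

// Called when a page load fails.
function handleFailure(entry: RetryEntry, retryQueue: RetryEntry[]) {
  if (entry.attempt < MAX_RETRIES) {
    // Not the final attempt: log as a warning, not an error, and
    // requeue at the end so the retry is visible to users watching
    // the queue.
    console.warn(
      `Retrying page (attempt ${entry.attempt + 1}/${MAX_RETRIES}): ${entry.url}`,
    );
    retryQueue.push({ url: entry.url, attempt: entry.attempt + 1 });
  } else {
    // Final retry failed: only now record an error, so the error log
    // reflects pages that truly did not make it into the crawl.
    console.error(`Page could not be captured after all retries: ${entry.url}`);
  }
}

// Applied whenever live exclusions change: drop now-excluded URLs
// from the retry queue before they are retried.
function applyExclusions(
  retryQueue: RetryEntry[],
  isExcluded: (url: string) => boolean,
): RetryEntry[] {
  return retryQueue.filter((entry) => {
    if (isExcluded(entry.url)) {
      console.warn(`Queued retry excluded by live exclusion: ${entry.url}`);
      return false;
    }
    return true;
  });
}
```

Appending retries to the end of the queue (rather than retrying them silently after depletion) keeps them visible in the same place users already watch for crawl progress.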
@ikreymer moved this from Triage to Todo in Webrecorder Projects on Feb 5, 2025
@ikreymer changed the title from "Surface retries to users while crawling" to "Better surface page retries to users while crawling" on Feb 5, 2025