Better surface page retries to users while crawling #2367

Open
1 of 3 tasks
Shrinks99 opened this issue Feb 5, 2025 · 0 comments
Shrinks99 commented Feb 5, 2025

Context

Currently, errors are listed if a page cannot be loaded, but as of crawler 1.5.0, pages that could not be loaded are automatically retried. While crawling, users rely on the error logs as a record of what ends up in the crawl.

Additionally, retries are currently opaque to users. They occur only after the URL list has been fully depleted, but users aren't given any feedback to indicate that this is the case. The crawl continues to run without any indication of what it is doing or why.

Changes

  • Log retries as a warning instead of an error; only if the final retry fails to capture the page is it logged as an error (see the sketch after this list)
  • Add retries to the end of the queue so they are visible to users
  • Apply live exclusions added after a URL enters the retry queue to the URLs already in that queue, so they are excluded accordingly
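To make the proposed behavior concrete, here is a minimal TypeScript sketch of the retry flow described above. Everything in it (`RetryEntry`, `MAX_RETRIES`, `handleFailure`, `applyExclusions`, `isExcluded`) is a hypothetical illustration under assumed names, not Browsertrix Crawler's actual internals.

```ts
// Hypothetical sketch of the proposed retry visibility behavior.
// All names here are illustrative assumptions, not the crawler's real API.

interface RetryEntry {
  url: string;
  attempt: number;
}

const MAX_RETRIES = 3; // assumed retry limit

// Called when a page load fails.
function handleFailure(entry: RetryEntry, retryQueue: RetryEntry[]) {
  if (entry.attempt < MAX_RETRIES) {
    // Not the final attempt: log as a warning, not an error, and
    // requeue at the end so the retry is visible to users watching
    // the queue.
    console.warn(
      `Retrying page (attempt ${entry.attempt + 1}/${MAX_RETRIES}): ${entry.url}`,
    );
    retryQueue.push({ url: entry.url, attempt: entry.attempt + 1 });
  } else {
    // Final retry failed: only now record an error, so the error log
    // reflects pages that truly did not make it into the crawl.
    console.error(`Page could not be captured after all retries: ${entry.url}`);
  }
}

// Applied whenever live exclusions change: drop now-excluded URLs
// from the retry queue before they are retried.
function applyExclusions(
  retryQueue: RetryEntry[],
  isExcluded: (url: string) => boolean,
): RetryEntry[] {
  return retryQueue.filter((entry) => {
    if (isExcluded(entry.url)) {
      console.warn(`Queued retry excluded by live exclusion: ${entry.url}`);
      return false;
    }
    return true;
  });
}
```

Appending retries to the end of the queue (rather than retrying them silently after depletion) keeps them visible in the same place users already watch for crawl progress.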
@ikreymer moved this from Triage to Todo in Webrecorder Projects on Feb 5, 2025
@ikreymer changed the title from "Surface retries to users while crawling" to "Better surface page retries to users while crawling" on Feb 5, 2025