
Add support for CC-NEWS and validation for crawl reference on the CLI interface#12

Merged
pjox merged 8 commits into main from dev
Mar 18, 2025


pjox commented Mar 7, 2025

Description

This PR introduces support for CC-NEWS and adds validation for crawl or snapshot references. It also updates some libraries, bumps the Rust edition to 2024 and the Rust version to the latest release (1.85), and bumps the library version to 0.6.0.
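
As an illustration of the crawl reference validation described above, here is a minimal sketch in Rust. It is not the code from this PR: the function name `normalize_crawl_reference`, the plain `String` error, and the assumed reference shapes (`CC-MAIN-YYYY-WW` and `CC-NEWS-YYYY-MM`) are assumptions made for the example.

```rust
/// Returns true when `s` consists of exactly `len` ASCII digits.
fn is_digits(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical helper (not the crate's actual API): fixes the casing of a
/// crawl reference and checks that it has one of the assumed shapes,
/// CC-MAIN-YYYY-WW or CC-NEWS-YYYY-MM.
fn normalize_crawl_reference(input: &str) -> Result<String, String> {
    // Upper-casing accepts lazy inputs such as "cc-main-2025-08".
    let upper = input.trim().to_ascii_uppercase();

    // Expect exactly four dash-separated fields and nothing after them.
    let mut fields = upper.split('-');
    let valid = matches!(
        (
            fields.next(),
            fields.next(),
            fields.next(),
            fields.next(),
            fields.next(),
        ),
        (Some("CC"), Some("MAIN" | "NEWS"), Some(year), Some(number), None)
            if is_digits(year, 4) && is_digits(number, 2)
    );

    if valid {
        Ok(upper)
    } else {
        Err(format!(
            "invalid crawl reference '{input}': expected CC-MAIN-YYYY-WW or CC-NEWS-YYYY-MM"
        ))
    }
}
```

Under these assumptions, `normalize_crawl_reference("cc-main-2025-08")` yields `Ok("CC-MAIN-2025-08")`, while an incomplete reference such as `CC-MAIN-2025` is rejected with an error message instead of being passed on to the downloader.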

Breaking Changes

No breaking changes.

Notes & open questions

This PR closes issues #8 and #10.

pjox added 7 commits March 6, 2025 13:53
…sn't exist and cc-downloader downloaded the body of the response

Now this action will produce an error
…the 4XX error message when downloading paths, adds validation to the cli input for the crawl reference
… automatically fix the casing of the crawl reference
CC-NEWS support and validation for crawl reference
…d files and updated the README.md in order to prepare the next release
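
The first commits in this list appear to concern requests for paths that do not exist: previously the body of the error response was downloaded, and now an error is produced. A minimal sketch of that idea, checking the status code before the body is kept, might look like the following. The names `HttpStatusError` and `body_if_success` are hypothetical, and the sketch takes a bare status code rather than a real reqwest response.

```rust
use std::fmt;

/// Hypothetical error type for a non-success HTTP status.
#[derive(Debug, PartialEq)]
struct HttpStatusError {
    status: u16,
}

impl fmt::Display for HttpStatusError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.status {
            // 4XX: most likely a wrong or nonexistent path.
            400..=499 => write!(
                f,
                "client error {}: the requested path may not exist",
                self.status
            ),
            _ => write!(f, "unexpected HTTP status {}", self.status),
        }
    }
}

impl std::error::Error for HttpStatusError {}

/// Hands back the body only for 2XX responses, so that an error page is
/// never saved as if it were the requested file.
fn body_if_success(status: u16, body: Vec<u8>) -> Result<Vec<u8>, HttpStatusError> {
    if (200..300).contains(&status) {
        Ok(body)
    } else {
        Err(HttpStatusError { status })
    }
}
```

With this shape, a 404 response surfaces as an `HttpStatusError` whose message can be shown to the user, and nothing is written to disk.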
pjox added the bug and enhancement labels Mar 7, 2025
pjox requested a review from thunderpoot March 7, 2025 09:11
pjox self-assigned this Mar 7, 2025

pjox commented Mar 12, 2025

@thunderpoot, don't accept or review the PR yet. The reqwest crate introduced a regression in its latest point release that breaks reqwest-middleware and thus breaks cc-downloader:

I think it is going to get fixed soon:

But this is indeed a problem. I'm thinking of including the Cargo.lock file as a solution, since I'm converting this crate to a library anyway.

reqwest-middleware is also working on a fix now:

… the reqwest deprecated API

TODO: We need to monitor the open PRs in reqwest-middleware and bump its version here as soon as they are merged.

pjox commented Mar 13, 2025

@thunderpoot It should be safe to review now. I'll track the PR on reqwest-middleware and fix the problem long-term in a future point update


thunderpoot left a comment


This is great. Nice error messaging, and fantastic that it supports my laziness (not typing CC-MAIN or CC-NEWS in all caps). Approved 💯

pjox merged commit 5fb6ff4 into main Mar 18, 2025
2 checks passed

Development

Successfully merging this pull request may close these issues.

Add support for CC-NEWS
Incorrect Handling of Nonexistent or Mis-cased Paths