Describe the bug
When downloading paths with a segment that doesn't exist (or one that does exist but is written in the wrong case, such as cc-main-2025-05 in lowercase instead of CC-MAIN-2025-05), the tool saves a file with the expected name and .gz extension, but the file contains the S3 error response instead of the paths list:
$ cat warc.paths.gz
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>crawl-data/asdfadsf/warc.paths.gz</Key><RequestId>NYVQWQNC27BD94GE</RequestId><HostId>RoAV4secLh5r9pf8ixAFKbMObnFnJ0tGI0m80X9NzxInsR7RILRIoT/cKekF/y7VlctccRX3CPQ=</HostId></Error>
To Reproduce
Steps to reproduce the behavior:
cc-downloader download-paths asdfadsf warc .
Expected behavior
I would have expected the tool to report that the path was not found. It could check that the path exists before downloading and fail gracefully with a user-friendly error message, instead of saving the error response to disk.
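A minimal sketch of the check suggested above, written in Rust since that is the language cc-downloader is written in. The function `validate_paths_response` and the `PathsError` type are hypothetical and are not taken from cc-downloader's source; the sketch assumes the HTTP status code and the first bytes of the response body are available before anything is written to disk. It rejects any non-200 status, reporting 404 (the status S3 returns with `NoSuchKey`) as "not found", and as a fallback also rejects a body that does not start with the gzip magic bytes `1f 8b`, which would catch an XML error page like the one shown above.

```rust
use std::fmt;

/// Hypothetical error type for illustration; not part of cc-downloader.
#[derive(Debug, PartialEq)]
enum PathsError {
    /// The server reported the key as missing (HTTP 404).
    NotFound { key: String },
    /// Any other non-200 status.
    HttpStatus(u16),
    /// Status was 200 but the body is not gzip data.
    NotGzip,
}

impl fmt::Display for PathsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PathsError::NotFound { key } => write!(
                f,
                "path not found: {key} (check the crawl name and its case, e.g. CC-MAIN-2025-05)"
            ),
            PathsError::HttpStatus(code) => write!(f, "unexpected HTTP status {code}"),
            PathsError::NotGzip => write!(f, "response body is not a gzip file"),
        }
    }
}

/// Every gzip stream begins with these two bytes.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical helper: decide whether a response is safe to save as a .gz file.
fn validate_paths_response(status: u16, key: &str, body: &[u8]) -> Result<(), PathsError> {
    match status {
        200 => {}
        404 => return Err(PathsError::NotFound { key: key.to_string() }),
        other => return Err(PathsError::HttpStatus(other)),
    }
    // Fallback: an error page served with a 200 status would still be caught here.
    if !body.starts_with(&GZIP_MAGIC) {
        return Err(PathsError::NotGzip);
    }
    Ok(())
}

fn main() {
    let key = "crawl-data/asdfadsf/warc.paths.gz";
    let xml_error = b"<?xml version=\"1.0\" encoding=\"UTF-8\"?><Error><Code>NoSuchKey</Code></Error>";
    match validate_paths_response(404, key, xml_error) {
        Ok(()) => println!("ok, safe to write the file"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Applied to the response from the reproduction steps, this prints an error naming the missing key instead of leaving an XML document named warc.paths.gz on disk. Checking the status first avoids creating the file at all; the magic-byte check is only a safety net.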