Describe the bug
When downloading paths for a segment that doesn't exist (or one that exists but is written in the wrong case, such as cc-main-2025-05 in lowercase), the tool saves a file with the expected name and .gz extension, but the file contains the S3 error response instead of gzip data:
$ cat warc.paths.gz
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>crawl-data/asdfadsf/warc.paths.gz</Key><RequestId>NYVQWQNC27BD94GE</RequestId><HostId>RoAV4secLh5r9pf8ixAFKbMObnFnJ0tGI0m80X9NzxInsR7RILRIoT/cKekF/y7VlctccRX3CPQ=</HostId></Error>%
To Reproduce
Steps to reproduce the behavior:
cc-downloader download-paths asdfadsf warc .
Expected behavior
I expected the tool to tell me that the path was not found. It could check that the path exists before attempting the download and fail gracefully with a user-friendly message instead of saving the error response.
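To illustrate the kind of check I have in mind, here is a minimal sketch in Python. It is not taken from the cc-downloader code. The names `fetch_paths`, `validate_paths_body`, and `PathsNotFound` are hypothetical, and the base URL is an assumption, since the output above only shows the `crawl-data/` key prefix. The idea is to treat an HTTP error status as "not found" and, as a second guard, refuse to save a body that does not start with the gzip magic bytes.

```python
import gzip
import urllib.error
import urllib.request

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"

# Assumed base URL, for illustration only.
BASE_URL = "https://data.commoncrawl.org"


class PathsNotFound(Exception):
    """Hypothetical error raised when the requested paths file does not exist."""


def validate_paths_body(body: bytes, key: str) -> bytes:
    """Return the body unchanged if it looks like gzip, otherwise raise."""
    if not body.startswith(GZIP_MAGIC):
        raise PathsNotFound(
            f"'{key}' was not found (the response is not a gzip file)"
        )
    return body


def fetch_paths(crawl: str, data_type: str) -> bytes:
    """Download <data_type>.paths.gz for a crawl, failing cleanly if it is missing."""
    key = f"crawl-data/{crawl}/{data_type}.paths.gz"
    try:
        with urllib.request.urlopen(f"{BASE_URL}/{key}") as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # Assumption: a missing key comes back as 403 or 404.
        if err.code in (403, 404):
            raise PathsNotFound(f"'{key}' was not found (HTTP {err.code})") from err
        raise
    # Only a validated body is returned, so an error page is never written to disk.
    return validate_paths_body(body, key)
```

With a check like this, the caller writes `warc.paths.gz` only after `fetch_paths` returns, and can print the `PathsNotFound` message to the user otherwise.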