-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Open
Labels
enhancementNew feature or requestNew feature or request
Description
Feature request
https://github.com/huggingface/datasets/blob/4.0.0/src/datasets/load.py#L1411-L1418
def load_dataset(
...
dl_manager: Optional[DownloadManager] = None, # add this new argument
**config_kwargs,
) -> Union[DatasetDict, Dataset, IterableDatasetDict, IterableDataset]:
...
# Create a dataset builder
builder_instance = load_dataset_builder(
path=path,
name=name,
data_dir=data_dir,
data_files=data_files,
cache_dir=cache_dir,
features=features,
download_config=download_config,
download_mode=download_mode,
revision=revision,
token=token,
storage_options=storage_options,
**config_kwargs,
)
# Return iterable dataset in case of streaming
if streaming:
return builder_instance.as_streaming_dataset(split=split)
# Note: This is the revised part
if dl_manager is None:
if download_config is None:
download_config = DownloadConfig(
cache_dir=builder_instance._cache_downloaded_dir,
force_download=download_mode == DownloadMode.FORCE_REDOWNLOAD,
force_extract=download_mode == DownloadMode.FORCE_REDOWNLOAD,
use_etag=False,
num_proc=num_proc,
token=builder_instance.token,
storage_options=builder_instance.storage_options,
) # We don't use etag for data files to speed up the process
dl_manager = DownloadManager(
dataset_name=builder_instance.dataset_name,
download_config=download_config,
data_dir=builder_instance.config.data_dir,
record_checksums=(
builder_instance._record_infos or verification_mode == VerificationMode.ALL_CHECKS
),
)
# Download and prepare data
builder_instance.download_and_prepare(
download_config=download_config,
download_mode=download_mode,
verification_mode=verification_mode,
dl_manager=dl_manager, # pass the new argument
num_proc=num_proc,
storage_options=storage_options,
)
...
Motivation
In my case, I'm hoping to deal with the cache files downloading manually (not using hash filenames and save to another location, or using potential existing local files).
Your contribution
It's already implemented above. If maintainers think this should be considered, I'll open a PR.
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request