Add bulk_mkdirs #1592

Draft
wants to merge 2 commits into
base: master

5 changes: 4 additions & 1 deletion fsspec/asyn.py
@@ -577,7 +577,7 @@ async def _put(
         rdirs = [r for l, r in zip(lpaths, rpaths) if is_dir[l]]
         file_pairs = [(l, r) for l, r in zip(lpaths, rpaths) if not is_dir[l]]

-        await asyncio.gather(*[self._makedirs(d, exist_ok=True) for d in rdirs])
+        await self._bulk_makedirs(rdirs, exist_ok=True)
         batch_size = batch_size or self.batch_size

         coros = []
@@ -590,6 +590,9 @@ async def _put(
             coros, batch_size=batch_size, callback=callback
         )

+    async def _bulk_makedirs(self, dirs, **kw):
+        await asyncio.gather(*[self._makedirs(_, **kw) for _ in dirs])
+
     async def _get_file(self, rpath, lpath, **kwargs):
         raise NotImplementedError

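As a rough illustration of what the new hook allows, here is a minimal sketch of the default fan-out next to a backend that overrides only the bulk method. `ToyAsyncFS`, `ToyFlatFS`, and the `created` list are hypothetical stand-ins invented for this example; they are not fsspec classes, and the real `_makedirs` implementations do more than record a path.

```python
import asyncio


class ToyAsyncFS:
    """Hypothetical stand-in for an async filesystem (not fsspec's class)."""

    def __init__(self):
        self.created = []  # records every _makedirs call, for illustration

    async def _makedirs(self, path, exist_ok=False):
        self.created.append(path)

    async def _bulk_makedirs(self, dirs, **kw):
        # Default behaviour, as in the diff: one _makedirs call per
        # directory, run concurrently.
        await asyncio.gather(*[self._makedirs(d, **kw) for d in dirs])


class ToyFlatFS(ToyAsyncFS):
    """Hypothetical backend that has no real directories."""

    async def _bulk_makedirs(self, dirs, **kw):
        # Override only the bulk hook; _makedirs itself stays untouched.
        return None


async def demo():
    base, flat = ToyAsyncFS(), ToyFlatFS()
    dirs = ["root/a", "root/b", "root/a/c"]
    await base._bulk_makedirs(dirs, exist_ok=True)
    await flat._bulk_makedirs(dirs, exist_ok=True)
    return base.created, flat.created


base_created, flat_created = asyncio.run(demo())
```

In this sketch the caller (the equivalent of `_put`) does not need to know which variant it is talking to; it always awaits `_bulk_makedirs`.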
3 changes: 3 additions & 0 deletions fsspec/spec.py
@@ -306,6 +306,9 @@ def makedirs(self, path, exist_ok=False):
         """
         pass  # not necessary to implement, may not have directories

+    def bulk_makedirs(self, path, **kw):
Contributor

Why do we need a special function here instead of modifying the makedirs function?

Member Author

I had imagined it like cp/cp_file

Contributor

Why can't we just override makedirs so that it does not check whether the directory exists for object stores, since directories aren't really a concept there?

Member Author

_makedirs is already overridden by implementations, so every implementation would have to add the list-of-paths check or do some sort of super() call.

Contributor

I am still confused: is this supposed to be overridden by the implementation? Are there any implementations that override it currently?

Member Author

No, there are none, but s3fs would be the first, since we don't want to check the existence of every directory (only the root bucket) or try to make the bucket multiple times.

Member Author

If you can think of a simpler way to do it, happy to hear!

+        [self.makedirs(_, **kw) for _ in path]
+
     def rmdir(self, path):
         """Remove a directory, if empty"""
         pass  # not necessary to implement, may not have directories
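The motivation given in the review thread (touch the root bucket once instead of once per directory) could look roughly like the sketch below. Everything here is an assumption made for illustration: `ToyObjectStore`, `ToyDedupStore`, `_ensure_bucket`, and the `bucket_calls` counter are hypothetical names, and this is not how s3fs is implemented.

```python
class ToyObjectStore:
    """Hypothetical object store; the bucket is the first path component."""

    def __init__(self):
        self.bucket_calls = 0  # counts simulated remote calls
        self.buckets = set()

    def _ensure_bucket(self, bucket):
        # Stand-in for a remote "check or create bucket" request.
        self.bucket_calls += 1
        self.buckets.add(bucket)

    def makedirs(self, path, exist_ok=False):
        # Per-path behaviour: every call touches the bucket.
        self._ensure_bucket(path.split("/", 1)[0])

    def bulk_makedirs(self, paths, **kw):
        # Default from the diff: simply loop over makedirs.
        for p in paths:
            self.makedirs(p, **kw)


class ToyDedupStore(ToyObjectStore):
    def bulk_makedirs(self, paths, **kw):
        # Override: collect the distinct buckets, then touch each one once.
        for bucket in sorted({p.split("/", 1)[0] for p in paths}):
            self._ensure_bucket(bucket)


paths = ["bucket/a", "bucket/a/b", "bucket/c"]

naive = ToyObjectStore()
naive.bulk_makedirs(paths, exist_ok=True)

dedup = ToyDedupStore()
dedup.bulk_makedirs(paths, exist_ok=True)
```

With three paths under one bucket, the default loop makes three simulated bucket calls while the override makes one, and `makedirs` keeps its single-path signature in both classes.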