Allow setting the root of a filesystem #395

Closed
dhirschfeld opened this issue Sep 3, 2020 · 13 comments · Fixed by #745
Comments

@dhirschfeld
Contributor

It would be useful to be able to specify a filesystem with an arbitrary root - e.g.

fs = LocalFileSystem(root='C:/temp')

All operations would then be performed relative to this root - e.g. fs.ls('/') would then list the contents of C:/temp
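
To make the request concrete, here is a minimal sketch of the path mapping it implies. `resolve` is a hypothetical helper, not part of fsspec; it only shows how a user-facing path could be translated to a real one under the root, using forward slashes throughout.

```python
import posixpath


def resolve(root: str, path: str) -> str:
    """Map a path as seen by the user onto the real path under `root`.

    Hypothetical helper for illustration only; not an fsspec API.
    """
    # Normalise first so that ".." cannot climb above the virtual root.
    virtual = posixpath.normpath("/" + path.lstrip("/"))
    base = root.rstrip("/") or "/"
    joined = posixpath.join(base, virtual.lstrip("/"))
    return joined.rstrip("/") or "/"
```

With this mapping, `resolve("C:/temp", "/")` yields `C:/temp`, which is the directory `fs.ls('/')` would list in the example above.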

@martindurant
Member

This is an interesting idea. pyfilesystem2 has something similar.
Rather than adding this capability to LocalFileSystem or the general spec, I would write a new implementation PrefixFileSystem which operates on any other filesystem and does the path manipulation you would need here. This would be like the caching file systems, with target_protocol and target_options kwargs.
Another idea from pyfilesystem2 would be to generalise this further: to be able to "mount" any other filesystem at some given path, so that, for instance, "/s3/" lives alongside normal local files. I'm not certain how useful that is, given the differences between implementations.
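
The "mount" idea could be sketched roughly as a longest-prefix lookup over a table of mount points. Everything below is an assumption made for illustration: `resolve_mount` and the mount table are hypothetical and not part of fsspec or pyfilesystem2.

```python
from typing import Dict, Tuple


def resolve_mount(mounts: Dict[str, str], path: str) -> Tuple[str, str]:
    """Return (backend_name, path_inside_backend) for a virtual path.

    `mounts` maps a mount point such as "/s3" to a backend label.
    Hypothetical helper for illustration only.
    """
    path = "/" + path.strip("/")
    # Try the longest mount point first so "/s3/logs" wins over "/s3".
    for mount_point in sorted(mounts, key=len, reverse=True):
        prefix = mount_point.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return mounts[mount_point], path[len(prefix):] or "/"
    raise KeyError(f"no filesystem mounted for {path!r}")
```

A table like `{"/": "local", "/s3": "s3"}` would send `/s3/bucket/key` to the `s3` backend as `/bucket/key` and leave every other path on the local backend. It does nothing to paper over the behavioural differences between implementations mentioned above.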

@martindurant
Member

On the other hand, we already have ._strip_protocol() on all filesystems that would be a natural and easy place to put this logic.

@ahirner
Contributor

ahirner commented Oct 24, 2020

I do like the PrefixFileSystem idea. We have something like that. However, the interaction of ls and _strip_protocol is tricky imo. The former needs to reverse the latter. Then, some operations don't seem to use _strip_protocol so we override major operations like get and rm as well. Furthermore, stripping root might not catch all affected detail fields of ls.

Did you think a PrefixFileSystem would work solely by overriding _strip_protocol? Maybe I'm on the wrong foot here.
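
To illustrate why `ls` has to reverse the prefixing, here is a rough sketch with a toy backend. `DictBackend` and `PrefixWrapper` are hypothetical classes written for this example; they are not fsspec classes, and the toy `ls` simply returns everything below a path.

```python
from typing import Dict, List


class DictBackend:
    """Toy stand-in for a real filesystem: maps full paths to byte contents."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files

    def ls(self, path: str) -> List[dict]:
        # Flat listing of every file below `path` (a simplification).
        base = path.rstrip("/") + "/"
        return [
            {"name": name, "size": len(data), "type": "file"}
            for name, data in sorted(self.files.items())
            if name.startswith(base)
        ]

    def cat(self, path: str) -> bytes:
        return self.files[path]


class PrefixWrapper:
    """Hypothetical wrapper: adds `prefix` on the way in, strips it on the way out."""

    def __init__(self, prefix: str, backend: DictBackend):
        self.prefix = prefix.rstrip("/")
        self.backend = backend

    def _add_prefix(self, path: str) -> str:
        return f"{self.prefix}/{path.lstrip('/')}".rstrip("/")

    def _remove_prefix(self, path: str) -> str:
        if path != self.prefix and not path.startswith(self.prefix + "/"):
            raise ValueError(f"{path!r} is outside prefix {self.prefix!r}")
        return path[len(self.prefix):] or "/"

    def ls(self, path: str) -> List[dict]:
        entries = self.backend.ls(self._add_prefix(path))
        # Rewrite "name" on a copy of each entry; any other detail field
        # that carries a path would need the same treatment.
        return [{**e, "name": self._remove_prefix(e["name"])} for e in entries]

    def cat(self, path: str) -> bytes:
        return self.backend.cat(self._add_prefix(path))
```

Every method that accepts a path needs `_add_prefix`, and every method that returns paths needs `_remove_prefix`, which is why overriding a single hook is unlikely to be enough.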

@brunorpinho

brunorpinho commented Jul 2, 2021

Based on what I read on this post, I changed LocalFileSystem and it seems to be working fine based on a few tests.

import os
from fsspec.implementations.local import LocalFileSystem, stringify_path

class PrefixFileSystem(LocalFileSystem):

    def __init__(self, root_path='/', *args, **kwargs):
        # Set the root before calling the parent constructor.
        self.root_path = stringify_path(root_path)
        super().__init__(*args, **kwargs)

    def _strip_protocol(self, path):
        # Resolve every incoming path relative to root_path.
        path = stringify_path(path)
        if path.startswith("file://"):
            path = path[7:]
        # Drop leading slashes: os.path.join discards root_path
        # when the second argument is an absolute path.
        return os.path.join(self.root_path, path.lstrip("/")).rstrip("/")

@martindurant
Member

@brunorpinho , I don't think so, because the output of ls would include the original real paths of files, not their prefixed versions. Lots of other methods depend on ls (glob, find, any recursive-enabled method).

In any case, it would be a shame to provide something like this to the local files only.

@brunorpinho

@martindurant yep, I see.

@lucmos
Contributor

lucmos commented Sep 2, 2021

This would be extremely useful with s3 filesystems, so that you can change the fs root to the specified bucket without prefixing the bucket to every path

@martindurant
Member

Perhaps you could make another implementation that wraps an arbitrary backend filesystem and does all the path manipulations? That's the kind of thing going on with the caching filesystems and ReferenceFileSystem, so there are precedents. Also, universal_pathlib's implementation, which uses Path-like objects, might be useful.

@lucmos
Contributor

lucmos commented Sep 2, 2021

I see, thanks for the suggestions!
Do you have any specific pointers for universal_pathlib and Path-like objects?

@martindurant
Member

See https://github.com/Quansight/universal_pathlib , and feel free to ask further questions there.

lucmos pushed a commit to lucmos/filesystem_spec that referenced this issue Sep 9, 2021
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to its `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves fsspec#395
lucmos pushed a commit to lucmos/filesystem_spec that referenced this issue Sep 9, 2021
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to the `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves fsspec#395
@efiop
Member

efiop commented Sep 21, 2021

A bit unrelated, but my 2c about Path-like objects: we need to be careful with those, as things might get really slow if you create lots of them. We have this problem with PathInfo objects in dvc right now and are working on getting rid of them in favor of plain strings. It is also easy for anyone writing a new filesystem to accidentally hurt performance when working with batches of files.

@ahirner
Contributor

ahirner commented Sep 22, 2021

> things might get really slow if you create lots of them

Can you link to an issue? It's unclear whether your concern is creating PurePath objects, accidental IO, or both.

@efiop
Member

efiop commented Sep 22, 2021

@ahirner It is creating PurePath, not accidental IO. When dealing with hundreds of thousands of files, creating a PurePath for each one of them takes significant time. Just bringing this up for the record; it is not unusual or unknown. I see that @lucmos took good care of that in #745
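
For anyone who wants to check the overhead on their own machine, a rough micro-benchmark along these lines might help. The prefix, the file names, and both helper functions are made up for the example, and no timings are claimed here; run it to see the difference locally.

```python
import timeit
from pathlib import PurePosixPath

# Hypothetical inputs for the comparison.
PREFIX = "bucket/root"
NAMES = [f"dir{i % 100}/file{i}.bin" for i in range(10_000)]


def join_with_purepath(names):
    # Builds one PurePosixPath object per entry, then converts back to str.
    return [str(PurePosixPath(PREFIX) / n) for n in names]


def join_with_strings(names):
    # Plain string formatting, no intermediate path objects.
    return [f"{PREFIX}/{n}" for n in names]


if __name__ == "__main__":
    for fn in (join_with_purepath, join_with_strings):
        seconds = timeit.timeit(lambda: fn(NAMES), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

Both functions return the same strings for these inputs, so the comparison isolates the cost of constructing the path objects.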

lucmos pushed a commit to lucmos/filesystem_spec that referenced this issue Sep 22, 2021
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to the `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves fsspec#395
lucmos added a commit to lucmos/filesystem_spec that referenced this issue Sep 22, 2021
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to the `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves fsspec#395
efiop pushed a commit to lucmos/filesystem_spec that referenced this issue Jan 11, 2022
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to the `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves fsspec#395
efiop pushed a commit to lucmos/filesystem_spec that referenced this issue Jan 13, 2022
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to the `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves fsspec#395
efiop pushed a commit to lucmos/filesystem_spec that referenced this issue Jan 14, 2022
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to the `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves fsspec#395
efiop added a commit that referenced this issue Jan 14, 2022
The PrefixFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
is relative to the `prefix`. After performing the necessary paths operation it delegates
everything to the wrapped filesystem.

Resolves #395

Co-authored-by: Ruslan Kuprieiev <[email protected]>