Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dropbox implementation + add existing implementation in the registry #207

Merged
merged 10 commits into from
Jan 7, 2020
10 changes: 10 additions & 0 deletions fsspec/caching.py
Original file line number Diff line number Diff line change
Expand Up @@ -370,6 +370,16 @@ def __len__(self):
return len(self.cache)


class AllBytes(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from the http implementation, which seems unnecessary duplication. Indeed, if we can fetch ranges, then this is not required anymore at all

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment below about "copy-past".
Also, your colleague advises me to put this class in the caching file. If the http implementation is modified to used it from the catching module, it will remove the redundancy problem at least for this part of the code and it seems also more coherent. But as the pull request is not about the http implementation, I didn't wanted to modify it my-self here.
Tell me what you prefer about this please. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this is clearly a caching method and should be included in the caching module, although it is not particularly useful for most files, since it just keeps the whole thing in memory. Moving the def out of the http module now is fine - there are tests for that functionality that would fail if somehow moving the code went wrong.

"""Cache entire contents of the file"""

def __init__(self, data):
self.data = data

def _fetch(self, start, end):
return self.data[start:end]


caches = {
"none": BaseCache,
"mmap": MMapCache,
Expand Down
11 changes: 1 addition & 10 deletions fsspec/implementations/http.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from fsspec import AbstractFileSystem
from fsspec.spec import AbstractBufferedFile
from fsspec.utils import tokenize, DEFAULT_BLOCK_SIZE
from ..caching import AllBytes

# https://stackoverflow.com/a/15926317/3821154
ex = re.compile(r"""<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1""")
Expand Down Expand Up @@ -359,13 +360,3 @@ def file_size(url, session=None, size_policy="head", **kwargs):
return int(r.headers["Content-Length"])
elif "Content-Range" in r.headers:
return int(r.headers["Content-Range"].split("/")[1])


class AllBytes(object):
"""Cache entire contents of a remote URL"""

def __init__(self, data):
self.data = data

def _fetch(self, start, end):
return self.data[start:end]
7 changes: 7 additions & 0 deletions fsspec/registry.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,13 @@
known_implementations = {
"file": {"class": "fsspec.implementations.local.LocalFileSystem"},
"memory": {"class": "fsspec.implementations.memory.MemoryFileSystem"},
"dropbox": {
"class": "dropboxdrivefs.DropboxDriveFileSystem",
"err": (
'DropboxFileSystem requires "dropboxdrivefs",'
'"requests" and "dropbox" to be installed'
),
},
"http": {
"class": "fsspec.implementations.http.HTTPFileSystem",
"err": 'HTTPFileSystem requires "requests" to be installed',
Expand Down