LocalFileSystem restore _strip_protocol signature #1567
@fleming79 , this undoes much of your code, but maintains the tests for correctness, while reimplementing consistency for _strip_protocol. Do you have thoughts? |
I have no issue with removing the option 'remove_trailing_slash' if it isn't required. It was provided because there were The question I pose is - which code is more maintainable/performant/correct? |
Thanks for the quick response! Regarding maintainability, I would prefer if the method signatures would agree with Regarding performance, I'm not a big fan of micro-benchmarks anymore, but I ran one (checking 10000 generated URIs) and the implementation in this PR seems to be equally fast/slow (within margin of error) as

Benchmark for `_strip_protocol` and `_parent`:

# pip install asv
import itertools
import random
import string

from fsspec import get_filesystem_class

random.seed(0)

def _make_random_path():
    # build a random length path
    pth_len = random.randint(5, 40)
    return "".join(random.sample(string.ascii_letters + "/", k=pth_len))

def _make_uris(n):
    # create uris with and without protocols and with and without netlocs
    it_proto_netloc = itertools.cycle(
        itertools.product(
            ["file", "local", "wrong", None],
            ["netloc", ""]
        )
    )
    for _ in range(n):
        proto, netloc = next(it_proto_netloc)
        pth = _make_random_path()
        if proto and netloc:
            yield f"{proto}://{netloc}/{pth}"
        elif proto:
            yield f"{proto}:/{pth}"
        else:
            yield f"{netloc}/{pth}"

uris = list(_make_uris(10000))

class Suite:
    def setup(self):
        self.fs_cls = get_filesystem_class("file")
        self.uris = uris

    def teardown(self):
        del self.uris

    def time_split_protocol(self):
        for uri in self.uris:
            self.fs_cls._strip_protocol(uri)

    def time_parent(self):
        for uri in self.uris:
            self.fs_cls._parent(uri)

To be able to answer if this implementation is more or less correct would require more tests. I'll try to find some time to come up with more cases for the test-suite. Cheers, |
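The signature-consistency concern raised in the comment above could be checked mechanically. The following is only a minimal sketch using toy classes: `Base`, `Consistent`, `Extended`, and `same_signature` are hypothetical names invented for illustration and are not fsspec code; only the `remove_trailing_slash` option name is taken from this thread.

```python
import inspect


class Base:
    @classmethod
    def _strip_protocol(cls, path):
        return path


class Consistent(Base):
    # Same parameters as the base class method.
    @classmethod
    def _strip_protocol(cls, path):
        return path.rstrip("/")


class Extended(Base):
    # Adds an extra keyword argument, so the signature diverges from Base.
    @classmethod
    def _strip_protocol(cls, path, remove_trailing_slash=False):
        return path.rstrip("/") if remove_trailing_slash else path


def same_signature(base, sub, name):
    """Compare the bound signatures of method `name` on two classes."""
    return inspect.signature(getattr(base, name)) == inspect.signature(
        getattr(sub, name)
    )


print(same_signature(Base, Consistent, "_strip_protocol"))
print(same_signature(Base, Extended, "_strip_protocol"))
```

A check like this could sit in a test-suite so that a subclass override which adds parameters is flagged early, though whether that is desirable for fsspec is a maintainer decision.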
I don't think we can trust timing differences <5%; in these cases it can even matter which order you run them (whether the CPU is warm, etc). The interesting thing would be to check on Windows (but they have lots of path styles). |
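To illustrate the point about small timing differences, here is a rough sketch that repeats one measurement several times and reports the run-to-run spread. `candidate` and `relative_spread` are hypothetical helpers, not part of fsspec or asv; the idea is only that a difference between two implementations smaller than this spread is hard to interpret.

```python
import statistics
import timeit


def relative_spread(samples):
    """Return (max - min) / median for a list of timings."""
    return (max(samples) - min(samples)) / statistics.median(samples)


def candidate(path):
    # Hypothetical stand-in for the function under test.
    return path.rstrip("/")


# Repeat the same measurement several times; the spread between repeats
# gives a rough estimate of the noise floor on this machine.
samples = timeit.repeat(
    lambda: candidate("/tmp/some/path/"), number=10_000, repeat=5
)
print(f"run-to-run spread: {relative_spread(samples):.1%}")
```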
The original Here is a starting point.
|
Thank you @fleming79 for the proposed implementations. I'll try them out shortly, when I find some more time. I've been collecting more test cases for So far I noticed that between
Outdated. todo: update
on posix py310
on windows py311
|
Performance comparison on windows python==3.11, shows that the I'll update this benchmark once I test the implementation provided in the comment above. outdated benchmark
|
Hello @martindurant and @fleming79 I found some time to continue with this PR, and it would be great to get your feedback. NOTE: the failing tests seem to be Summarizing the changes:
Performance: I ran a benchmark on 10000 randomly generated URIs comparing performance of

Code for generating the URIs:

import itertools
import random
import string

from fsspec import get_filesystem_class

random.seed(0)

def _make_random_path(style):
    pth_len = random.randint(5, 40)
    if style == "posix":
        chars = string.ascii_letters + "/"
        prefix = ""
    elif style == "win":
        chars = string.ascii_letters + "\\"
        prefix = "c:\\"
    elif style == "win-posix":
        chars = string.ascii_letters + "/"
        prefix = "c:/"
    else:
        raise ValueError(f"Unknown style {style}")
    return prefix + "".join(random.sample(chars, k=pth_len))

def _make_uris(n):
    it_proto_netloc_style = itertools.cycle(
        itertools.product(
            ["file", "local", "wrong", None],
            ["netloc", ""],
            ["posix", "win", "win-posix"],
        )
    )
    for _ in range(n):
        proto, netloc, style = next(it_proto_netloc_style)
        pth = _make_random_path(style)
        if proto and netloc:
            yield f"{proto}://{netloc}/{pth}"
        elif proto:
            yield f"{proto}:/{pth}"
        else:
            yield f"{netloc}/{pth}"

uris = list(_make_uris(10000))

class Suite:
    def setup(self):
        self.fs_cls = get_filesystem_class("file")
        self.uris = uris

    def teardown(self):
        del self.uris

    def time_split_protocol(self):
        for uri in self.uris:
            self.fs_cls._strip_protocol(uri)

    def time_parent(self):
        for uri in self.uris:
            self.fs_cls._parent(uri)

[Benchmark results: Ubuntu under WSL2, _strip_protocol and _parent]
[Benchmark results: Windows 11, _strip_protocol and _parent]
Some open questions:
Have a great day! |
Note: the failing CI runs seem to be caused by a new git release? pypa/setuptools-scm#1038 |
@ap-- It looks like you've made some good changes.
I agree those tests are useful and an error should probably be raised for invalid types somehow. I guess removing the coercion would suffice, but maybe it would be better to also check the return type from the call to |
Do those fail with a different exception type now, or just return garbage pathstrings?
This is for the s3fs build, right? We could pin its requirement on setuptools_scm, wait for a new release, or move that build to hatch (as fsspec has been). |
Current master returns garbage:

>>> from fsspec.implementations.local import make_path_posix
>>> make_path_posix(object())
'/Users/poehlmann/development/filesystem_spec/<object object at 0x100664b80>'

Possible fixes are:
I think this happens when during the s3fs install fsspec gets installed. The traceback shows it failing in hatchling, which uses the hatch-vcs plugin to get the current version via setuptools_scm. setuptools_scm in turn has a compatibility issue with the newest git version which seems to be used on the Github Actions runner already via the conda test-env environment. I'll see if downgrading git will fix the issue. |
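For the invalid-type case discussed above, one possible shape of a stricter check is sketched below. This assumes `os.fspath` is used for the coercion, which is an assumption: the thread does not show which fix was chosen, and `checked_fspath` is a hypothetical helper, not an fsspec function.

```python
import os
import pathlib


def checked_fspath(path):
    """Hypothetical helper: accept str or os.PathLike[str], reject the rest."""
    # os.fspath raises TypeError for objects that are not str, bytes,
    # or os.PathLike (for example a bare object()).
    result = os.fspath(path)
    # os.fspath passes bytes through unchanged, so check the result type too.
    if not isinstance(result, str):
        raise TypeError(f"expected a str path, got {type(result).__name__}")
    return result


print(checked_fspath(pathlib.PurePosixPath("a/b")))
try:
    checked_fspath(object())
except TypeError as exc:
    print(f"rejected: {exc}")
```

With a check like this, an arbitrary object fails loudly instead of being turned into a path string built from its `repr`.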
A specific version of git could be pinned in the CI conda environment file |
That made it work |
rerunning ... I have yet to come up with a way to get SMB to reliably pass every time. |
I'm happy with this. @fleming79, any comments? |
Awesome. So we just need a decision regarding
Update: I double checked, and |
|
I'm happy with In |
I made the changes and added tests for
|
@martindurant @ap-- Folks, quick question. Does this implementation change the behavior of methods like
If so, is this the expected behavior? I'm not 100% sure (I need to check), but it does not seem consistent with cloud storage backends. |
Hi @shcheklein Looking into this, the behavior of Specifically #1477 started raising a The previous behavior

Behavior of LocalFileSystem.ls for different fsspec versions:

# cat test_1567-issuecomment-2563160414.py
import fsspec
import pytest

def test_local_ls_file_exists(tmp_path):
    file = tmp_path.joinpath("dependabot.yml")
    file.write_text("hello world!")
    urlpath = file.as_uri()
    urlpath_with_trailing_slash = f"{urlpath}/"
    fs = fsspec.filesystem("file")
    path = fs._strip_protocol(urlpath)
    assert fs.ls(urlpath) == [path]
    assert fs.ls(urlpath_with_trailing_slash) == [path]

# cat noxfile.py
import nox

@nox.session(python=["3.10"])
@nox.parametrize("fsspec", ["2024.2.0", "2024.3.0", "2024.3.1", "2024.5.0", "2024.12.0"])
def tests(session: nox.Session, fsspec) -> None:
    session.install(f"fsspec=={fsspec}", "pytest")
    session.run("pytest", "-v")
❯ nox
nox > Running session tests-3.10(fsspec='2024.2.0')
nox > Creating virtual environment (virtualenv) using python3.10 in .nox/tests-3-10-fsspec-2024-2-0
nox > python -m pip install fsspec==2024.2.0 pytest
nox > pytest -v
================================================================ test session starts =================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /home/poehlmann/Development/fsspec#1567/.nox/tests-3-10-fsspec-2024-2-0/bin/python
cachedir: .pytest_cache
rootdir: /home/poehlmann/Development/fsspec#1567
collected 1 item
test_1567-issuecomment-2563160414.py::test_local_ls_file_exists PASSED [100%]
================================================================= 1 passed in 0.04s ==================================================================
nox > Session tests-3.10(fsspec='2024.2.0') was successful.
nox > Running session tests-3.10(fsspec='2024.3.0')
nox > Creating virtual environment (virtualenv) using python3.10 in .nox/tests-3-10-fsspec-2024-3-0
nox > python -m pip install fsspec==2024.3.0 pytest
nox > pytest -v
================================================================ test session starts =================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /home/poehlmann/Development/fsspec#1567/.nox/tests-3-10-fsspec-2024-3-0/bin/python
cachedir: .pytest_cache
rootdir: /home/poehlmann/Development/fsspec#1567
collected 1 item
test_1567-issuecomment-2563160414.py::test_local_ls_file_exists PASSED [100%]
================================================================= 1 passed in 0.04s ==================================================================
nox > Session tests-3.10(fsspec='2024.3.0') was successful.
nox > Running session tests-3.10(fsspec='2024.3.1')
nox > Creating virtual environment (virtualenv) using python3.10 in .nox/tests-3-10-fsspec-2024-3-1
nox > python -m pip install fsspec==2024.3.1 pytest
nox > pytest -v
================================================================ test session starts =================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /home/poehlmann/Development/fsspec#1567/.nox/tests-3-10-fsspec-2024-3-1/bin/python
cachedir: .pytest_cache
rootdir: /home/poehlmann/Development/fsspec#1567
collected 1 item
test_1567-issuecomment-2563160414.py::test_local_ls_file_exists FAILED [100%]
====================================================================== FAILURES ======================================================================
_____________________________________________________________ test_local_ls_file_exists ______________________________________________________________
tmp_path = PosixPath('/tmp/pytest-of-poehlmann/pytest-27/test_local_ls_file_exists0')
def test_local_ls_file_exists(tmp_path):
file = tmp_path.joinpath("dependabot.yml")
file.write_text("hello world!")
urlpath = file.as_uri()
urlpath_with_trailing_slash = f"{urlpath}/"
fs = fsspec.filesystem("file")
path = fs._strip_protocol(urlpath)
assert fs.ls(urlpath) == [path]
> assert fs.ls(urlpath_with_trailing_slash) == [path]
test_1567-issuecomment-2563160414.py:19:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/tests-3-10-fsspec-2024-3-1/lib/python3.10/site-packages/fsspec/implementations/local.py:66: in ls
info = self.info(path)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <fsspec.implementations.local.LocalFileSystem object at 0x7fdb63812e60>
path = '/tmp/pytest-of-poehlmann/pytest-27/test_local_ls_file_exists0/dependabot.yml/', kwargs = {}
def info(self, path, **kwargs):
if isinstance(path, os.DirEntry):
# scandir DirEntry
out = path.stat(follow_symlinks=False)
link = path.is_symlink()
if path.is_dir(follow_symlinks=False):
t = "directory"
elif path.is_file(follow_symlinks=False):
t = "file"
else:
t = "other"
path = self._strip_protocol(path.path)
else:
# str or path-like
path = self._strip_protocol(path)
> out = os.stat(path, follow_symlinks=False)
E NotADirectoryError: [Errno 20] Not a directory: '/tmp/pytest-of-poehlmann/pytest-27/test_local_ls_file_exists0/dependabot.yml/'
.nox/tests-3-10-fsspec-2024-3-1/lib/python3.10/site-packages/fsspec/implementations/local.py:92: NotADirectoryError
============================================================== short test summary info ===============================================================
FAILED test_1567-issuecomment-2563160414.py::test_local_ls_file_exists - NotADirectoryError: [Errno 20] Not a directory: '/tmp/pytest-of-poehlmann/pytest-27/test_local_ls_file_exists0/dependabot.yml/'
================================================================= 1 failed in 0.06s ==================================================================
nox > Command pytest -v failed with exit code 1
nox > Session tests-3.10(fsspec='2024.3.1') failed.
nox > Running session tests-3.10(fsspec='2024.5.0')
nox > Creating virtual environment (virtualenv) using python3.10 in .nox/tests-3-10-fsspec-2024-5-0
nox > python -m pip install fsspec==2024.5.0 pytest
nox > pytest -v
================================================================ test session starts =================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /home/poehlmann/Development/fsspec#1567/.nox/tests-3-10-fsspec-2024-5-0/bin/python
cachedir: .pytest_cache
rootdir: /home/poehlmann/Development/fsspec#1567
collected 1 item
test_1567-issuecomment-2563160414.py::test_local_ls_file_exists PASSED [100%]
================================================================= 1 passed in 0.04s ==================================================================
nox > Session tests-3.10(fsspec='2024.5.0') was successful.
nox > Running session tests-3.10(fsspec='2024.12.0')
nox > Creating virtual environment (virtualenv) using python3.10 in .nox/tests-3-10-fsspec-2024-12-0
nox > python -m pip install fsspec==2024.12.0 pytest
nox > pytest -v
================================================================ test session starts =================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /home/poehlmann/Development/fsspec#1567/.nox/tests-3-10-fsspec-2024-12-0/bin/python
cachedir: .pytest_cache
rootdir: /home/poehlmann/Development/fsspec#1567
collected 1 item
test_1567-issuecomment-2563160414.py::test_local_ls_file_exists PASSED [100%]
================================================================= 1 passed in 0.04s ==================================================================
nox > Session tests-3.10(fsspec='2024.12.0') was successful.
nox > Ran multiple sessions:
nox > * tests-3.10(fsspec='2024.2.0'): success
nox > * tests-3.10(fsspec='2024.3.0'): success
nox > * tests-3.10(fsspec='2024.3.1'): failed
nox > * tests-3.10(fsspec='2024.5.0'): success
nox > * tests-3.10(fsspec='2024.12.0'): success
It would probably be good to add a test case and clearly define what should happen.

❯ touch abc
❯ ls abc
abc
❯ ls abc/
ls: cannot access 'abc/': Not a directory

To be on the safe side this should be checked on windows too, to see if the behavior is potentially different there. Happy holidays, |
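The shell session above can be reproduced from Python with only the standard library. This is a small sketch, not a proposed test for fsspec: `stat_outcome` is a hypothetical helper, and the result for the trailing-slash case is only asserted here for POSIX systems, since Windows may behave differently.

```python
import os
import tempfile


def stat_outcome(path):
    """Return 'ok' if os.stat succeeds, else the raised exception's class name."""
    try:
        os.stat(path)
    except OSError as exc:
        return type(exc).__name__
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "abc")
    with open(file_path, "w") as f:
        f.write("hello world!")
    # Same file, once without and once with a trailing slash.
    plain = stat_outcome(file_path)
    with_slash = stat_outcome(file_path + "/")

print(plain, with_slash)
```

On POSIX systems the second call fails with `NotADirectoryError`, matching the `ls abc/` output above.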
@ap-- amazing research, thanks!
I guess, yes, my biggest question was if we can consider this behavior as a bug or is it intentional for some reason. It makes sense to me to be aligned with @martindurant what is your take and who is making these decisions? |
In practice, myself. I would rather not do that alone, and if we can get people together to define the expected behaviours across filesystem implementations, it would help greatly. |
Yes, we would certainly like consistency across implementations. There will always be some unavoidable inconsistencies, however, caused by inherent differences (e.g., on blob storage, a directory/prefix and file can have the same name, but not locally). |
I should be able to free up some time in the next weeks to start working on abstract tests and docs for the urlpath parsing interface of

from typing import Any, Literal

class AbstractFileSystem:
    protocol: str | tuple[str, ...]
    root_marker: Literal["/", ""]
    sep: Literal["/"]

    @classmethod
    def _strip_protocol(cls, path: str) -> str:
        ...

    def unstrip_protocol(self, name: str) -> str:
        ...

    @staticmethod
    def _get_kwargs_from_urls(path: str) -> dict[str, Any]:
        ...

    @classmethod
    def _parent(cls, path: str) -> str:
        ...
|
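As a rough illustration of the kind of properties abstract tests for this interface might assert, here is a toy implementation with a few round-trip checks. `ToyFileSystem` and its `toy://` protocol are invented for this sketch and do not reflect how any real fsspec filesystem parses paths.

```python
class ToyFileSystem:
    # Hypothetical filesystem used only to illustrate the interface.
    protocol = ("toy",)
    root_marker = "/"
    sep = "/"

    @classmethod
    def _strip_protocol(cls, path: str) -> str:
        # Drop a leading "toy://" if present, then any trailing separators.
        for proto in cls.protocol:
            prefix = f"{proto}://"
            if path.startswith(prefix):
                path = path[len(prefix):]
                break
        return path.rstrip(cls.sep) or cls.root_marker

    def unstrip_protocol(self, name: str) -> str:
        return f"{self.protocol[0]}://{name}"

    @classmethod
    def _parent(cls, path: str) -> str:
        # Everything before the last separator, or the root marker.
        head, _, _ = cls._strip_protocol(path).rpartition(cls.sep)
        return head or cls.root_marker


fs = ToyFileSystem()
stripped = fs._strip_protocol("toy:///a/b/")
print(stripped, fs.unstrip_protocol(stripped), fs._parent(stripped))
```

Properties such as idempotence of `_strip_protocol` and the round trip through `unstrip_protocol` seem like natural candidates for shared tests, but which properties should actually hold across implementations is exactly what the thread leaves open.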
Hello,

I noticed that the `LocalFileSystem._strip_protocol` signature changed in #1477, when running the universal_pathlib test-suite against the current fsspec version.

To me it seems that the intention of #1477 was initially to prevent `"C:/"` or `"C:\\"` to be reduced to `"C:"` by `_strip_protocol`, but in addition it introduced function signature changes in `fsspec.implementations.local` and minor changes in `fsspec.mapping`.

I created this PR as a basis for discussion if the signature change could be avoided. In this PR I reverted the changes to `LocalFileSystem` and `make_path_posix` but kept the changes to the tests (first commit) and then provide an alternative implementation that avoids the function signature change.

I would also be happy to try to add more tests around the `LocalFileSystem._parent`, `LocalFileSystem._strip_protocol` and `make_path_posix` behavior for edge-cases if there is interest. But windows is not my main OS for daily work, so I am very likely not aware of most of them.

Cheers,
Andreas
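The drive-root edge case described in this PR description can be illustrated without fsspec. The sketch below is not the implementation from this PR or from #1477: `rstrip_slash_keep_root` is a hypothetical helper, and it assumes Windows-style input where a backslash is a separator (on POSIX a backslash can be a legitimate filename character).

```python
import re

# Matches a bare drive such as "C:" (no trailing separator).
_BARE_DRIVE = re.compile(r"[A-Za-z]:")


def rstrip_slash_keep_root(path: str) -> str:
    """Strip trailing slashes, but never reduce "/" or "C:/" to "" or "C:"."""
    # Assumption: backslashes are separators (Windows-style input).
    posix = path.replace("\\", "/")
    stripped = posix.rstrip("/")
    had_trailing_slash = len(stripped) < len(posix)
    if not stripped:
        # Input was empty or consisted only of slashes.
        return "/" if posix else ""
    if had_trailing_slash and _BARE_DRIVE.fullmatch(stripped):
        # "C:/" names the drive root; "C:" alone would mean something else.
        return stripped + "/"
    return stripped


for example in ["C:/", "C:\\", "C:/data/", "/tmp/x/", "/"]:
    print(repr(example), "->", repr(rstrip_slash_keep_root(example)))
```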