-
Notifications
You must be signed in to change notification settings - Fork 374
_ls_from_cache: weird behavior regarding comparisons with trailing slash #554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It would be a good idea to encode this into a test, so that we can not only fix it, but make sure we don't break it again. I would side with: the directory names in dircache should not include the trailing "/". |
I can send a patch for the first normalization and a test though
What do you think about the second normalization? ( |
@martindurant, @isidentical One of the issues I've encountered with Azure is that, because its a flat filesystem, each of these can be different:
The distinctions are causing issues with Hive partitioning, as described here. I just merged this fix into master, but realized the fix doesn't (for the moment) align with this guidance, because a folder would, by default, have a trailing slash. |
gcsfs and s3fs keep getting hit by this kind of thing too. |
@martindurant is there a document for this (or issue for tracking writing this document)? |
#562 is the discussion where I hope we can come to an agreement. I imagined the outcome to be a set of tests hosted in this repo, but run against all three object stores. |
In this example, the path is stripped only for the contain check. When dircache contains the normalized version (
a/b
) but nota/b/
it will throw aKeyError
though it probably shouldn't since this is what that check appears to be stand for. Another case is right below that code that compares the normalized version with the obtained one from the cache, which might not be normalized. So I was wondering whether this is intentional or should we use.rstrip()
in both of the cases (self.dircache[path.rstrip('/')]
,f["name"].rstrip("/") == path.rstrip("/")
)https://github.com/intake/filesystem_spec/blob/6ee47aa6c10868c383be5cb13837f2a26a099f5d/fsspec/spec.py#L342-L343
https://github.com/intake/filesystem_spec/blob/6ee47aa6c10868c383be5cb13837f2a26a099f5d/fsspec/spec.py#L349
The text was updated successfully, but these errors were encountered: