Dealing with directories #37

Open
DenisaCG opened this issue Dec 6, 2024 · 3 comments
Labels
bug Something isn't working

@DenisaCG
Member

DenisaCG commented Dec 6, 2024

Problem description

The backend contents manager uses the obstore package for three things. It lists the contents of a path inside a drive. It retrieves the contents of an object. It also performs every other content-manipulation operation (create, save, rename, copy, delete, download).

Unfortunately, obstore has no concept of a directory. As a result, when listing the contents of a path, it is impossible to distinguish an empty directory from an empty file.

Note that S3 has no true directories. Instead, it mimics a directory by creating a zero-length object whose key ends in / (see the red box marked "Important" here).

obstore strips these trailing slashes from object keys, so it cannot interact with a directory properly. For example, creating a directory actually creates a broken file. The directory only appears once other objects are placed inside it, and the initial broken file remains.

As a result, it is problematic to:

  • distinguish empty folders from empty files
  • rename, delete, or copy directories created outside of the DriveBrowser (the zero-length object itself, not the objects inside it)
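The sketch below shows why the trailing slash matters. It is plain Python with no S3 access, and it classifies a path from a flat set of object keys.

- The function name and the in-memory key set are hypothetical.
- It assumes only the S3 convention described above: a console-created folder is a zero-length object whose key ends in /.

```python
def classify(keys: set[str], path: str) -> str:
    """Classify `path` as "directory", "file", or "missing".

    Works on a flat set of S3 object keys, assuming S3 conventions:
    a key ending in "/" is a folder marker (as created by the AWS
    console), and any key under "path/" implies a directory.
    """
    path = path.strip("/")
    if path == "":
        return "directory"  # the bucket root
    prefix = path + "/"
    # Matches both the marker object "path/" and any nested object.
    if any(key.startswith(prefix) for key in keys):
        return "directory"
    if path in keys:
        return "file"
    return "missing"


keys = {"empty/", "notes.txt", "data/a.csv"}
print(classify(keys, "empty"))  # -> directory (the marker keeps its slash)

# Simulate a client that strips trailing slashes from keys:
stripped = {key.rstrip("/") for key in keys}
print(classify(stripped, "empty"))  # -> file (now looks like an empty file)
```

Once the slash is gone, the key set alone can no longer tell an empty folder from an empty file.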

Current state of the extension (v0.0.1)

The current logic treats a name as a directory if it does not contain a . character, since file names with extensions do (.txt, .ipynb, etc.). If the name does contain a ., we check the string after it. If that string is not one of the file extensions registered in JupyterLab, the item is considered a directory.

This logic is faulty because directory names can contain the . character anywhere. It also does not solve the problem of interacting with existing directories, whose keys contain the trailing slash.
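The v0.0.1 heuristic, and the ways it misfires, can be sketched as follows. This is a hypothetical reconstruction from the description above, not the extension's code. The following are all assumptions:

- the function name;
- treating the text after the last . as the extension;
- the small set standing in for JupyterLab's registered file types.

```python
def looks_like_directory(name: str, registered_exts: set[str]) -> bool:
    """Hypothetical reconstruction of the v0.0.1 name-based heuristic."""
    if "." not in name:
        return True  # no extension at all -> treated as a directory
    # Assumption: the text after the last "." is the extension.
    ext = "." + name.rsplit(".", 1)[1]
    return ext not in registered_exts


# Illustrative stand-in for JupyterLab's registered file types.
REGISTERED = {".txt", ".ipynb", ".csv", ".py"}

print(looks_like_directory("notebooks", REGISTERED))    # True: correct for a folder
print(looks_like_directory("notes.txt", REGISTERED))    # False: correct for a file
print(looks_like_directory("Makefile", REGISTERED))     # True: wrong for a file named Makefile
print(looks_like_directory("results.csv", REGISTERED))  # False: wrong for a folder named results.csv
```

Both kinds of mistake come from reading the type off the name. The trailing-slash marker, when it survives, would avoid them.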

Possible solution

Use another package for the backend content manager to perform content manipulation operations, but keep the obstore package for its paginated listing abilities.

@DenisaCG DenisaCG added the bug Something isn't working label Dec 6, 2024
@DenisaCG
Member Author

Update for v0.1.0

We now use the s3fs package for all content-manipulation operations. We keep obstore for listing the contents of a path and retrieving the contents of a file, because it supports pagination.

The s3fs package does have the concept of a directory. However, it still fails at operations such as create, rename, copy, and delete, because they all strip the trailing slash when formatting the path. That slash is essential for handling directories the way users expect from a file browser.

Fix for manipulating directories

We use the isdir function to detect a directory object regardless of its name, but the other operations need a different solution. To that end, we attach a suffix to directory objects, namely /.jupyter_drives_fix_dir. This creates the directory artificially by placing a hidden file inside the folder, and the file browser does not show that file.
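The bookkeeping behind the suffix workaround can be sketched in plain Python as below. Only the suffix /.jupyter_drives_fix_dir comes from this issue. The rest is an assumption for illustration, not the extension's actual code:

- the helper names `marker_key` and `visible_entries`;
- hiding the placeholder when listing.

```python
MARKER = ".jupyter_drives_fix_dir"  # suffix named in the issue


def marker_key(dir_path: str) -> str:
    """Key of the hidden object that stands in for a directory (hypothetical helper)."""
    return f"{dir_path.strip('/')}/{MARKER}"


def visible_entries(keys: set[str], dir_path: str) -> list[str]:
    """Immediate children of `dir_path`, with marker objects hidden.

    Sub-directories are returned with a trailing "/".
    """
    base = dir_path.strip("/")
    prefix = base + "/" if base else ""
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the directory's own "dir/" marker, if any
        child, sep, _ = rest.partition("/")
        if child == MARKER:
            continue  # hide the placeholder from the file browser
        children.add(child + "/" if sep else child)
    return sorted(children)


keys = {marker_key("empty"), "docs/readme.txt", marker_key("docs"), "top.txt"}
print(visible_entries(keys, ""))       # ['docs/', 'empty/', 'top.txt']
print(visible_entries(keys, "empty"))  # []
```

The placeholder makes an empty folder exist as a real key. Filtering it out keeps the folder looking empty to the user.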

When we encounter a folder created with the AWS console (which lacks this suffix), we first delete that object using aiobotocore. We then use s3fs to recreate it with the required suffix and proceed with the operation.
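Below is a minimal in-memory model of this migration step, assuming a dict stands in for the bucket. It shows no aiobotocore or s3fs calls, since the issue does not give their exact usage. The function name is hypothetical.

```python
MARKER = ".jupyter_drives_fix_dir"  # suffix named in the issue


def normalize_console_folder(bucket: dict[str, bytes], dir_path: str) -> None:
    """Replace an AWS-console folder marker ("dir/") with the suffixed one.

    `bucket` is an in-memory stand-in for S3 (key -> bytes).
    """
    base = dir_path.strip("/")
    console_key = base + "/"
    # Step 1: drop the zero-length "dir/" object (done with aiobotocore in the issue).
    bucket.pop(console_key, None)
    # Step 2: create the hidden placeholder (done with s3fs in the issue).
    bucket.setdefault(f"{base}/{MARKER}", b"")


bucket = {"empty/": b"", "docs/": b"", "docs/a.txt": b"hi"}
normalize_console_folder(bucket, "empty")
normalize_console_folder(bucket, "docs/")
print(sorted(bucket))
# ['docs/.jupyter_drives_fix_dir', 'docs/a.txt', 'empty/.jupyter_drives_fix_dir']
```

Both steps tolerate a second run, so repeating the migration on an already converted folder leaves it unchanged.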

@gogakoreli
Contributor

If an empty folder is created using the AWS console and I try to open it, I still get an error. The API responds with 502 Bad Gateway, and the error below is logged.

If I restart JupyterLab, I can open this empty folder again from the JupyterLab S3 browser. Yet in the AWS Console, the folder is still empty: there is no .jupyter_drives_fix_dir inside it. See the error:

[E 2025-02-05 23:14:54.524 ServerApp] Uncaught exception in write_error
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.11/site-packages/jupyter_drives/manager.py", line 266, in get_contents
        obj = await obs.get_async(self._content_managers[drive_name]["store"], path)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: NotFound {
        path: "empty",
        source: Client {
            status: 404,
            body: Some(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>empty</Key><RequestId>12SNXB7RTCFWBQG3</RequestId><HostId>rMGOt9JBTK8TPRiajxk0Pgc4XH8CX601fOicZTeWv1R/NKqDeLSIMA6HaxYmCFceLw6xTgc0+w0=</HostId></Error>",
            ),
        },
    }
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.11/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
                 ^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/jupyter_drives/handlers.py", line 86, in get
        result = await self._manager.get_contents(drive, path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/conda/lib/python3.11/site-packages/jupyter_drives/manager.py", line 293, in get_contents
        raise tornado.web.HTTPError(
    tornado.web.HTTPError: HTTP 400: The following error occured when retrieving the contents: NotFound {
        path: "empty",
        source: Client {
            status: 404,
            body: Some(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>empty</Key><RequestId>12SNXB7RTCFWBQG3</RequestId><HostId>rMGOt9JBTK8TPRiajxk0Pgc4XH8CX601fOicZTeWv1R/NKqDeLSIMA6HaxYmCFceLw6xTgc0+w0=</HostId></Error>",
            ),
        },
    }
    
    During handling of the above exception, another exception occurred:

@gogakoreli
Contributor

Apparently this error only happens if the empty folder is at the root level of the S3 bucket. If the empty folder is inside another folder, the issue does not occur.
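One possible way to avoid the root-level failure is sketched below under several assumptions.

- Fetching "empty" as an object raises FileNotFoundError, which is how obstore's NotFound error surfaces in the traceback above.
- On that error, the code lists the prefix "empty/" and treats a non-empty result as a directory.
- `get_object` and `list_prefix` are hypothetical stand-ins for the store calls. They are injected so the sketch runs without S3.

This is not the extension's `get_contents`.

```python
import asyncio
from typing import Awaitable, Callable

GetObject = Callable[[str], Awaitable[bytes]]
ListPrefix = Callable[[str], Awaitable[list[str]]]


async def get_contents(path: str, get_object: GetObject, list_prefix: ListPrefix) -> dict:
    """Fetch `path` as a file, falling back to a directory listing on NotFound."""
    try:
        return {"type": "file", "content": await get_object(path)}
    except FileNotFoundError as exc:
        # A console-created folder is stored as "path/", so "path" itself is missing.
        prefix = path.strip("/") + "/"
        keys = await list_prefix(prefix)
        if not keys:
            raise exc  # genuinely missing
        # Hide the folder's own marker object from the listing.
        return {"type": "directory", "content": [k for k in keys if k != prefix]}


def make_fake_store(objects: dict[str, bytes]) -> tuple[GetObject, ListPrefix]:
    """In-memory stand-ins for the object-store calls (hypothetical)."""

    async def get_object(path: str) -> bytes:
        if path not in objects:
            raise FileNotFoundError(path)
        return objects[path]

    async def list_prefix(prefix: str) -> list[str]:
        return sorted(k for k in objects if k.startswith(prefix))

    return get_object, list_prefix


get_object, list_prefix = make_fake_store({"empty/": b"", "notes.txt": b"hi"})
print(asyncio.run(get_contents("empty", get_object, list_prefix)))
# {'type': 'directory', 'content': []}
```

A real fix would also need to explain why nested empty folders already work. This sketch only shows the fallback idea.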
