Simplified storage creation with from_url helper method. #1435

@jbusecke

Problem

I find myself repeatedly writing URL-parsing code to set up icechunk storage configurations:

Example (taken from here):

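from urllib.parse import urlparse

import icechunk as ic

url = "s3://nasa-eodc-public/icechunk/MUR-JPL-L4-GLOB-v4.1-virtual-v2-p2"
url_parsed = urlparse(url)

storage = ic.s3_storage(
    bucket=url_parsed.netloc,  # "nasa-eodc-public"
    prefix=url_parsed.path.lstrip('/'),  # drop the leading slash from the path
    from_env=True,
)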

Not too terrible, but I would love it if this could be simplified, and particularly if the sanitizing (i.e. the stripping of leading slashes above) could happen internally in icechunk.

Proposed Solution

An ideal solution to me would look something like this:

url = "s3://nasa-eodc-public/icechunk/MUR-JPL-L4-GLOB-v4.1-virtual-v2-p2"
storage = icechunk.Storage.from_url(url, **auth_kwargs)

I guess this would principally do two things:

  1. Auto-detect the storage type from the scheme at the beginning of the URL
  2. Parse the bucket and prefix from the URL

I personally care more about 2. but it would be great to have both.
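To make the two steps above concrete, here is a minimal sketch of the parsing part using only the standard library. The names `ParsedStorageUrl` and `parse_storage_url` are hypothetical and not part of icechunk; the sketch only splits the URL and leaves the choice of storage constructor to the caller.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ParsedStorageUrl(NamedTuple):
    """Hypothetical container for the pieces a from_url helper would need."""

    scheme: str  # e.g. "s3"; could be used to pick the storage type (step 1)
    bucket: str  # the URL's network location (step 2)
    prefix: str  # the URL's path without leading slashes (step 2)


def parse_storage_url(url: str) -> ParsedStorageUrl:
    """Split a URL such as 's3://bucket/some/prefix' into its parts."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"expected a URL like 's3://bucket/prefix', got {url!r}")
    # Same sanitizing as the manual example: strip leading slashes from the path.
    return ParsedStorageUrl(parsed.scheme, parsed.netloc, parsed.path.lstrip("/"))


parts = parse_storage_url(
    "s3://nasa-eodc-public/icechunk/MUR-JPL-L4-GLOB-v4.1-virtual-v2-p2"
)
```

With something like this, the manual example could become `ic.s3_storage(bucket=parts.bucket, prefix=parts.prefix, from_env=True)`, and `parts.scheme` could be used to dispatch to other storage types.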

Is this generally within scope? If yes, I'd be happy to help out with this.
