[WIP] Apply obstore as storage backend #3033
base: master
@@ -0,0 +1,36 @@
+"""
+Classes that overrides the AsyncFsspecStore that specify the filesystem specific parameters
+"""
+
+from obstore.fsspec import AsyncFsspecStore
+
+DEFAULT_BLOCK_SIZE = 5 * 2**20
+
+
+class ObstoreS3FileSystem(AsyncFsspecStore):
+    """
+    Add following property used in S3FileSystem
+    """
+
+    root_marker = ""
+    blocksize = DEFAULT_BLOCK_SIZE
+    protocol = ("s3", "s3a")
+    _extra_tokenize_attributes = ("default_block_size",)
+
+
+class ObstoreGCSFileSystem(AsyncFsspecStore):
+    """
+    Add following property used in GCSFileSystem
+    """
+
+    scopes = {"read_only", "read_write", "full_control"}
+    blocksize = DEFAULT_BLOCK_SIZE
+    protocol = "gcs", "gs"
Review comment (Code Review Run #52212e): Ensure protocol is a tuple
+
+
+class ObstoreAzureBlobFileSystem(AsyncFsspecStore):
+    """
+    Add following property used in AzureBlobFileSystem
+    """
+
+    protocol = "abfs"
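The three classes above declare protocol in different shapes: tuples for S3 and GCS, and a plain string for Azure. As a minimal sketch of why code that reads this attribute usually normalizes it first, the hypothetical helper protocols_of below (not part of flytekit or obstore) wraps a bare string so that it is treated as one protocol name rather than a sequence of characters. The stand-in classes only copy the protocol values from the diff.

```python
from typing import Tuple, Union


def protocols_of(cls: type) -> Tuple[str, ...]:
    """Return a class's protocol attribute as a tuple of names (hypothetical helper)."""
    protocol: Union[str, Tuple[str, ...]] = getattr(cls, "protocol")
    # A bare string must be wrapped; iterating over it would yield single characters.
    return (protocol,) if isinstance(protocol, str) else tuple(protocol)


# Stand-ins that copy only the protocol values from the diff.
class S3Like:
    protocol = ("s3", "s3a")


class GCSLike:
    protocol = "gcs", "gs"


class AzureLike:
    protocol = "abfs"  # a plain string, not a tuple


for cls in (S3Like, GCSLike, AzureLike):
    print(cls.__name__, protocols_of(cls))
```

Normalizing at the point of use keeps the class definitions free to use either shape.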
@@ -2,7 +2,7 @@
 import typing
 
 from flytekit import FlyteContext, lazy_module
-from flytekit.core.data_persistence import get_fsspec_storage_options
+from flytekit.core.data_persistence import get_fsspec_storage_options, split_path
 from flytekit.models import literals
 from flytekit.models.literals import StructuredDatasetMetadata
 from flytekit.models.types import StructuredDatasetType
@@ -91,10 +91,11 @@ def decode(
         current_task_metadata: StructuredDatasetMetadata,
     ) -> pl.DataFrame:
         uri = flyte_value.uri
-
+        bucket, _ = split_path(uri)
         kwargs = get_fsspec_storage_options(
             protocol=fsspec_utils.get_protocol(uri),
             data_config=ctx.file_access.data_config,
+            bucket=bucket,
         )
         if current_task_metadata.structured_dataset_type and current_task_metadata.structured_dataset_type.columns:
             columns = [c.name for c in current_task_metadata.structured_dataset_type.columns]
@@ -153,10 +154,11 @@ def decode(
         current_task_metadata: StructuredDatasetMetadata,
     ) -> pl.LazyFrame:
         uri = flyte_value.uri
-
+        bucket, _ = split_path(uri)
Review comment (Code Review Run #39e27b): Consider validating bucket before usage
         kwargs = get_fsspec_storage_options(
             protocol=fsspec_utils.get_protocol(uri),
             data_config=ctx.file_access.data_config,
+            bucket=bucket,
Review comment on lines +157 to +161 (Code Review Run #39e27b): Consider validating bucket name before use
         )
         # use read_parquet instead of scan_parquet for now because scan_parquet currently doesn't work with fsspec:
         # https://github.com/pola-rs/polars/issues/16737
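The review comments on this hunk ask for the bucket to be validated before it is passed to get_fsspec_storage_options. A minimal sketch of what such a check could look like follows. Note that parse_bucket is a hypothetical stand-in built on urllib.parse; it is not flytekit's split_path, whose exact behavior is not shown in this diff.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_bucket(uri: str) -> Tuple[str, str]:
    """Split a URI such as s3://bucket/key into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    bucket = parsed.netloc
    # Fail early instead of passing an empty bucket on to the storage options.
    if not parsed.scheme or not bucket:
        raise ValueError(f"cannot determine bucket from URI: {uri!r}")
    return bucket, parsed.path.lstrip("/")


bucket, key = parse_bucket("s3://my-bucket/data/part-0.parquet")
print(bucket, key)
```

Whether an empty bucket should raise or fall back to default options is a design choice for the PR author; the sketch raises so the failure is visible at the call site.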
@@ -39,6 +39,7 @@ dependencies = [
     "marshmallow-jsonschema>=0.12.0",
     "mashumaro>=3.15",
     "msgpack>=1.1.0",
+    "obstore==0.3.0b10",
Review comment (Code Review Run #52212e): Consider version range for obstore
     "protobuf!=4.25.0",
     "pygments",
     "python-json-logger>=2.0.0",
Review comment (Code Review Run #ab65d8): The protocol tuple definition is missing parentheses, which could lead to incorrect protocol handling. Consider adding parentheses: protocol = ("gcs", "gs")
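As a quick check of the claim in this comment: in Python the comma, not the parentheses, builds a tuple, so both spellings of protocol should evaluate to the same value. The suggested change is therefore mainly about consistency with the S3 class. A minimal sketch (variable names are illustrative only):

```python
with_parens = ("gcs", "gs")
without_parens = "gcs", "gs"

# Both assignments produce the same two-element tuple.
print(type(without_parens) is tuple)
print(with_parens == without_parens)

# Parentheses alone do not make a tuple; the trailing comma does.
just_a_string = ("gcs")
one_element_tuple = ("gcs",)
print(type(just_a_string).__name__, type(one_element_tuple).__name__)
```

The case worth guarding against is the single-name one, where dropping the trailing comma silently yields a string.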