[Core feature] Add metadata to FlyteFile #6257

Open
2 tasks done
davidmirror-ops opened this issue Feb 18, 2025 · 0 comments · May be fixed by flyteorg/flytekit#3160
Labels: backlogged, enhancement, flytekit

Comments

@davidmirror-ops
Contributor

davidmirror-ops commented Feb 18, 2025

Motivation: Why do you think this is important?

Flyte's type system falls back to pickle serialization when a data type has no registered TypeTransformer. Pickle is known to be insecure because deserializing a crafted payload can execute arbitrary code (remote code execution at the deserialization phase).
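To illustrate why unpickling untrusted data is risky: an object can define `__reduce__`, which tells `pickle` to call a named callable with given arguments while reconstructing the object. The harmless sketch below uses the builtin `eval` on a trivial expression; an attacker could name any importable function instead. The class name `Payload` is hypothetical.

```python
import pickle


class Payload:
    """Hypothetical object whose unpickling runs an arbitrary callable."""

    def __reduce__(self):
        # On load, pickle will call eval("6 * 7") instead of
        # rebuilding a Payload instance.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # the callable executes here, at load time
print(result)  # 42
```

Nothing in the pickled bytes says "this is a Payload"; the loader simply executes whatever callable the bytes reference.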

If FlyteFile supported a metadata field, we could store a hash in it as an additional control: checking the hash before deserializing would help prevent pickle-based attacks and detect other forms of data-at-rest corruption.
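A minimal sketch of that control, assuming the producer records a SHA-256 hex digest under a `"hash"` key (the metadata shape proposed in this issue) and the consumer recomputes it before unpickling. `sha256_of` and `load_verified` are hypothetical helpers, not flytekit APIs. A bare hash only detects tampering if the hash itself comes from a trusted source; someone who can rewrite both the file and its metadata can simply recompute it, so a keyed HMAC may be preferable in practice.

```python
import hashlib
import hmac
import pickle
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified(path: Path, metadata: dict):
    """Hypothetical helper: unpickle only if the file matches metadata["hash"]."""
    expected = metadata.get("hash")
    if expected is None:
        raise ValueError("no hash recorded; refusing to unpickle")
    actual = sha256_of(path)
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(actual, expected):
        raise ValueError(f"hash mismatch for {path}; refusing to unpickle")
    with path.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    model_path.write_bytes(pickle.dumps({"weights": [0.1, 0.2]}))
    meta = {"hash": sha256_of(model_path)}  # recorded by the producer
    loaded = load_verified(model_path, meta)
    print(loaded)  # {'weights': [0.1, 0.2]}

    # Simulate corruption at rest: the check fails before pickle.load runs.
    model_path.write_bytes(model_path.read_bytes() + b"\x00")
    try:
        load_verified(model_path, meta)
        rejected = False
    except ValueError:
        rejected = True
    print("tampered file rejected:", rejected)  # True
```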

This would further help position Flyte as the right system for building a robust and secure ML supply chain.

Goal: What should the final outcome look like, ideally?

If this were available, we could do something like:

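import hashlib

from flytekit import task
from flytekit.types.file import FlyteFile


def calculate_file_hash(file_path: str) -> str:
    """Calculate the SHA256 hash of a file."""
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(8192), b""):
            sha256_hash.update(chunk)
    return sha256_hash.hexdigest()


@task
def process_file(file_path: str) -> FlyteFile:
    # Calculate the hash of the file
    file_hash = calculate_file_hash(file_path)

    # Create a FlyteFile with the hash as metadata
    # (`metadata` is the parameter proposed in this issue)
    flyte_file = FlyteFile(path=file_path, metadata={"hash": file_hash})

    return flyte_file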

Describe alternatives you've considered

  • Create and register a custom type, such as ExtendedFlyteFile
  • Encode models into a custom data class with a method that calculates and validates the hash
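The second alternative could look roughly like the plain-Python sketch below: a dataclass that carries a path together with its hash and can recompute and check it. `HashedFile`, `from_path`, and `validate` are hypothetical names. Passing such an object between tasks would still rely on Flyte's dataclass support, which this sketch does not exercise.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


def _sha256(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class HashedFile:
    """Hypothetical stand-in for a file reference that carries its own hash."""

    path: str
    sha256: str

    @classmethod
    def from_path(cls, path: str) -> "HashedFile":
        # Producer side: record the hash when the reference is created.
        return cls(path=path, sha256=_sha256(path))

    def validate(self) -> bool:
        # Consumer side: recompute and compare before trusting the contents.
        return _sha256(self.path) == self.sha256


with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "model.bin"
    p.write_bytes(b"model bytes")
    ref = HashedFile.from_path(str(p))
    ok_before = ref.validate()
    p.write_bytes(b"tampered bytes")
    ok_after = ref.validate()

print(ok_before, ok_after)  # True False
```

Compared with a metadata field on FlyteFile itself, this keeps the hash outside Flyte's built-in file type, so every task that produces or consumes such files has to opt into the custom type.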

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@davidmirror-ops added the enhancement and untriaged labels on Feb 18, 2025
@eapolinario added the flytekit and backlogged labels and removed the untriaged label on Feb 20, 2025
@thomasjpfan linked a pull request that will close this issue on Feb 27, 2025
Projects
Status: Backlog