Supporting persistent data volumes accessible by all executors #186

Open
uniqueg opened this issue Aug 18, 2022 · 2 comments

@uniqueg
Contributor

uniqueg commented Aug 18, 2022

Having a TES implementation access a persistent data volume is something that the Greek ELIXIR node requested (see here for more details). A potential use case is a TES implementation deployed in an environment where it repeatedly runs specific sets of tasks that use the same reference data over and over again.

The current specification of tesTask.volumes does not meet this requirement, as it states that volumes "are initialized as empty directories".

A similar request was/is also discussed in Cromwell: broadinstitute/cromwell#2190

I don't really have in mind what this could look like, but I thought I would open this issue so that we could discuss.

Thanks to @zagganas and @hex43ver

@uniqueg
Contributor Author

uniqueg commented Aug 18, 2022

Some random thoughts for the discussion:

  • Should there be a way for a client to ask a TES deployment to make a particular object persist? If so, where exactly, for how long, how to communicate that TES did so etc.?
  • Should there be a mechanism to populate a persistent volume in bulk or should that be outside of the specs?
  • How would a client know what persistent data a TES deployment has? Could we do this via DRS?
  • Is this whole feature perhaps outside the scope of TES, and should we just find a TES-compliant workaround that can be realized in a given TES implementation?

@noooonee

Thank you, @uniqueg. This issue describes our request precisely. We have human genome files (~20 GB) and some static internal binary data that need to be pre-populated once before data processing, and we want to minimize file copying.

This doesn't necessarily require a change to the TES API if implementations can provide such a capability. But if the TES API defined a standard representation, that would help other implementations a lot.

For the syntax, my personal thought is, maybe the docker volume expression is good enough?

    name-of-a-custom-volume:/path-inside-container
    path-from-runtime-node-host:/path-inside-container

Many more ideas might come up, such as Docker's bind propagation concepts, but I understand that TES must limit its scope to a maintainable level.
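
To make the Docker-style suggestion above concrete, here is a minimal Python sketch of how a TES implementation might parse such an expression. Everything in it is an assumption for discussion, not part of the TES specification or any existing implementation: the `VolumeSpec` type, the `parse_volume_spec` function, and the rule that a source starting with `/` is treated as a host path while anything else is treated as a named volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeSpec:
    """Hypothetical parsed form of a 'source:/path-inside-container' expression."""
    source: str
    container_path: str
    kind: str  # "named" or "host" (assumed classification, not from the TES spec)


def parse_volume_spec(spec: str) -> VolumeSpec:
    """Parse a Docker-style 'source:/container/path' volume expression."""
    # Split on the first colon only; the left side is the source,
    # the right side is the mount point inside the container.
    source, sep, container_path = spec.partition(":")
    if not sep or not source or not container_path:
        raise ValueError(f"expected 'source:/container/path', got {spec!r}")
    if not container_path.startswith("/"):
        raise ValueError(f"container path must be absolute: {container_path!r}")
    # Assumption: an absolute source is a path on the runtime node host,
    # anything else is the name of a pre-populated persistent volume.
    kind = "host" if source.startswith("/") else "named"
    return VolumeSpec(source=source, container_path=container_path, kind=kind)


if __name__ == "__main__":
    # Illustrative values only; the names and paths are made up.
    print(parse_volume_spec("reference-genomes:/ref"))
    print(parse_volume_spec("/mnt/shared/genomes:/ref"))
```

A real design would still need to answer the open questions from earlier in the thread, such as how a client discovers which named volumes a deployment offers and whether mounts should be read-only.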
