Supporting persistent data volumes accessible by all executors #186

uniqueg · 2022-08-18T12:09:25Z

The Greek ELIXIR node has requested that a TES implementation be able to access a persistent data volume (see here for more details). A potential use case is a TES implementation deployed in an environment where it repeatedly runs specific sets of tasks that use the same reference data over and over again.

The current specification of tesTask.volumes does not meet this requirement, as it states that volumes "are initialized as empty directories".
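
To make the limitation concrete, here is a minimal sketch of a task payload under the current specification. The task name, image, command, paths, and bucket URL are all hypothetical; only the field names (`executors`, `volumes`, `inputs`, `url`, `path`) come from the TES schema.

```python
import json

# Hypothetical TES task: every value below is illustrative only.
task = {
    "name": "align-reads",
    "executors": [
        {
            "image": "example/aligner:latest",
            "command": ["align", "--ref", "/inputs/genome.fa", "--out", "/scratch/out.bam"],
        }
    ],
    # Shared between the executors of this task, but created empty
    # when the task starts, so nothing survives from earlier tasks.
    "volumes": ["/scratch"],
    # Reference data therefore has to be listed as an input and
    # staged again for every task that needs it.
    "inputs": [
        {"url": "s3://example-bucket/genome.fa", "path": "/inputs/genome.fa"}
    ],
}

payload = json.dumps(task, indent=2)
print(payload)
```

In this sketch the reference genome would be copied on every run, which is the cost a persistent volume is meant to avoid.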

A similar request was/is also discussed in Cromwell: broadinstitute/cromwell#2190

I don't really have in mind what this could look like, but I thought I would open this issue so that we could discuss.

Thanks to @zagganas and @hex43ver


uniqueg · 2022-08-18T12:15:22Z

Some random thoughts for the discussion:

Should there be a way for a client to ask a TES deployment to make a particular object persist? If so, where exactly, for how long, and how would TES communicate that it did so?
Should there be a mechanism to populate a persistent volume in bulk or should that be outside of the specs?
How would a client know what persistent data a TES deployment has? Could we do this via DRS?
Is this whole feature perhaps outside the scope of TES, so that we should just find a TES-compliant workaround that can be realized in a given TES implementation?

noooonee · 2022-08-19T04:34:33Z

Thank you @uniqueg. This issue describes our request precisely: we have human genome files (~20 GB) and some static internal binary data that need to be pre-populated once before data processing, and we want to minimize file copying.

This doesn't necessarily require a change to the TES API if implementations can provide such a capability. But if the TES API defined a standard representation, it would help other implementations a lot.

For the syntax, my personal thought is that the Docker volume expression may be good enough:

    name-of-a-custom-volume:/path-inside-container
    path-from-runtime-node-host:/path-inside-container
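
As a rough illustration, the two forms above could be told apart along these lines. This is only a sketch: `VolumeSpec` and `parse_volume_spec` are hypothetical names, nothing here is part of TES, and it assumes (as Docker does) that a source starting with `/` is a host path while anything else is a named volume. Mount options such as a trailing `:ro` are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeSpec:
    source: str         # named volume or absolute host path
    target: str         # absolute path inside the container
    is_host_path: bool  # True if source is a host path


def parse_volume_spec(spec: str) -> VolumeSpec:
    """Parse a 'source:/target' expression (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected 'source:/target', got {spec!r}")
    source, target = parts
    if not target.startswith("/"):
        raise ValueError(f"container path must be absolute: {target!r}")
    # Assumption: a leading '/' marks a host path, otherwise a named volume.
    return VolumeSpec(source, target, source.startswith("/"))


print(parse_volume_spec("refdata:/ref"))
print(parse_volume_spec("/mnt/genomes:/ref"))
```

Rejecting anything that is not exactly two parts keeps the sketch small; a real design would have to decide whether read-only flags or propagation options belong in the expression at all.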

Many more ideas might come up, such as Docker's bind propagation concepts, but I understand that TES must limit its scope to a maintainable level.

uniqueg mentioned this issue Aug 18, 2022

Mount PVC with reference data to executor elixir-cloud-aai/TESK#130

Open

patmagee mentioned this issue Sep 22, 2022

How do we define the technical capabilities of a given TES API #188

Open

uniqueg mentioned this issue Jun 18, 2024

Broadcasting an API implementation's optional capabilities ga4gh/TASC#45

Open

vsmalladi added this to the Next milestone Oct 31, 2024
