Giving a TES implementation access to a persistent data volume is something that the Greek ELIXIR node requested (see here for more details). A potential use case is a TES implementation deployed in an environment where it repeatedly runs specific sets of tasks that use the same reference data over and over again.
Should there be a way for a client to ask a TES deployment to make a particular object persist? If so, where exactly and for how long, and how would TES communicate that it did so?
Should there be a mechanism to populate a persistent volume in bulk or should that be outside of the specs?
How would a client know what persistent data a TES deployment has? Could we do this via DRS?
Is this whole feature perhaps outside the scope of TES, so that we should just find a TES-compliant workaround that can be realized in a given TES implementation?
Thank you @uniqueg. This issue describes our request precisely: we have human genome files (~20 GB) and some static internal binary data that need to be pre-populated once before data processing, and we want to minimize file copying.
This doesn't necessarily need a change to the TES API if implementations can provide such a capability. But if the TES API defined a standard representation, it would help other implementations a lot.
For the syntax, my personal thought is: maybe the Docker volume expression is good enough?
Many more ideas might come up, such as Docker's volume bind propagation concepts; I understand that TES must limit the scope to a maintainable level.
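To make the Docker-style suggestion above concrete, here is a minimal sketch of how an implementation might parse a `[source:]target[:options]` volume expression. The `VolumeSpec` class and `parse_volume` function are hypothetical names invented for illustration; nothing like this exists in the TES specification, and Windows-style paths containing `:` are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VolumeSpec:
    """Hypothetical parsed form of a Docker-style volume expression."""
    source: Optional[str]            # named volume or host path; None if omitted
    target: str                      # absolute mount path inside the container
    options: Tuple[str, ...] = ()    # e.g. ("ro",) or ("rw", "rshared")

    @property
    def read_only(self) -> bool:
        return "ro" in self.options


def parse_volume(expr: str) -> VolumeSpec:
    """Parse '[source:]target[:opt1,opt2]' into a VolumeSpec."""
    parts = expr.split(":")
    if len(parts) == 1:
        source, target, opts = None, parts[0], ""
    elif len(parts) == 2:
        source, target, opts = parts[0], parts[1], ""
    elif len(parts) == 3:
        source, target, opts = parts
    else:
        raise ValueError(f"too many ':' separators in {expr!r}")
    if not target.startswith("/"):
        raise ValueError(f"target must be an absolute path: {target!r}")
    options = tuple(o for o in opts.split(",") if o)
    return VolumeSpec(source or None, target, options)


# Assumed example: a pre-populated reference volume mounted read-only.
spec = parse_volume("refdata:/mnt/ref:ro")
```

A plain path such as `/scratch` parses to a spec with no source, which matches how `tesTask.volumes` entries look today; only the optional source and options parts would be new.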
The current specification of `tesTask.volumes` does not meet the requirement for persistent data, as it states that volumes "are initialized as empty directories". A similar request was/is also discussed in Cromwell: broadinstitute/cromwell#2190
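For illustration, a task under the current specification might look like the sketch below. Because each entry in `volumes` starts out as an empty directory, the reference data has to be declared as an input and staged again for every task. The task name, image, paths, and the `build_task` helper are made up for this example; only the field names (`inputs`, `executors`, `volumes`) come from the TES task schema.

```python
import json


def build_task(reference_url: str) -> dict:
    """Build a TES task body that re-stages reference data on every run."""
    return {
        "name": "align-sample",  # hypothetical task name
        "inputs": [
            {
                # The reference is copied in for each task, since nothing
                # in the task can point at an already-populated volume.
                "url": reference_url,
                "path": "/ref/genome.fa",
                "type": "FILE",
            }
        ],
        "executors": [
            {
                "image": "ubuntu:22.04",
                "command": ["ls", "-l", "/ref", "/scratch"],
            }
        ],
        # Per the specification, this starts as an empty directory.
        "volumes": ["/scratch"],
    }


# Assumed example URL; any storage scheme the deployment supports would do.
task = build_task("s3://example-bucket/genome.fa")
body = json.dumps(task, indent=2)
```

With a ~20 GB reference, that input is transferred once per task, which is the cost a persistent volume would avoid.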
I don't really have in mind what this could look like, but I thought I would open this issue so that we could discuss.
Thanks to @zagganas and @hex43ver