Sometimes a pipeline reads very large inputs in its first stage, which makes it slow to run and costly in disk space. It would be nice if trying out new code on single tasks or stages stayed fast. Pipedag already supports running just single tasks or stages. When running a single task, a user can already work in the temporary schema and avoid ever performing a schema swap. However, sometimes it would be nice to also commit a stage "per-user" and then run tasks whose inputs are a mixture of per-user and team-shared inputs.
This issue is about implementing a mixed per-user/team-shared mode. In this mode, inputs to the subgraph being run would be fetched from the per-user version when they exist there and from the team-shared version otherwise. Temporary schemas and committed stage schemas should always reside in the per-user instance. So it is mostly dematerialization that would have to be adapted.
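The lookup order described above can be sketched as follows. This is only an illustration of the idea, not pipedag code: `InstanceStore`, `resolve_input`, and `commit_output` are hypothetical names, and plain dictionaries stand in for the database schemas of each instance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

TableKey = Tuple[str, str]  # (stage name, table name)


@dataclass
class InstanceStore:
    """Hypothetical stand-in for the committed tables of one pipeline instance."""

    name: str
    tables: Dict[TableKey, Any] = field(default_factory=dict)


def resolve_input(
    key: TableKey, per_user: InstanceStore, team_shared: InstanceStore
) -> Tuple[str, Any]:
    """Return (source instance name, table), preferring the per-user instance."""
    if key in per_user.tables:
        return per_user.name, per_user.tables[key]
    if key in team_shared.tables:
        # Fall back to the team-shared version only when no per-user copy exists.
        return team_shared.name, team_shared.tables[key]
    raise KeyError(f"{key} exists in neither {per_user.name} nor {team_shared.name}")


def commit_output(key: TableKey, table: Any, per_user: InstanceStore) -> None:
    """Outputs are always written per-user; the team-shared instance is never touched."""
    per_user.tables[key] = table
```

With this shape, only the read path (dematerialization) needs to know about the second instance; the write path stays unchanged apart from always targeting the per-user instance.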
Options:
An advanced version of this idea could even do cache-invalidation checks against the team-shared instance, provided some protection mechanism prevents overwriting data in the team-shared instance.
This issue could interact with "Retry of producing a stage output" (#167): one could update information table by table in the per-user temp schema over multiple runs.
It is even conceivable to allow mixed execution on two arbitrary pipeline instance configurations. The dominant use will probably still be per-user / team-shared instances of the same instance_id.
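The first option (cache-invalidation checks against the team-shared instance, with write protection) might look roughly like the sketch below. Everything here is an assumption made for illustration: `GuardedStore`, `ReadOnlyInstanceError`, `run_or_reuse`, and the string `cache_key` are hypothetical and do not describe how pipedag computes or stores cache keys.

```python
from typing import Any, Callable, Dict, Optional, Tuple

TableKey = Tuple[str, str]  # (stage name, table name)
Entry = Tuple[str, Any]     # (cache key, table contents)


class ReadOnlyInstanceError(RuntimeError):
    """Raised when code tries to write into a protected instance."""


class GuardedStore:
    """Hypothetical instance store that can be marked read-only."""

    def __init__(self, name: str, read_only: bool = False) -> None:
        self.name = name
        self.read_only = read_only
        self._entries: Dict[TableKey, Entry] = {}

    def seed(self, key: TableKey, cache_key: str, table: Any) -> None:
        """Populate directly; stands in for a regular run of that instance."""
        self._entries[key] = (cache_key, table)

    def lookup(self, key: TableKey, cache_key: str) -> Optional[Entry]:
        """Return the entry only if its recorded cache key still matches."""
        entry = self._entries.get(key)
        if entry is not None and entry[0] == cache_key:
            return entry
        return None

    def write(self, key: TableKey, cache_key: str, table: Any) -> None:
        if self.read_only:
            raise ReadOnlyInstanceError(f"instance {self.name!r} is read-only")
        self._entries[key] = (cache_key, table)


def run_or_reuse(
    key: TableKey,
    cache_key: str,
    compute: Callable[[], Any],
    per_user: GuardedStore,
    team_shared: GuardedStore,
) -> Tuple[str, Any]:
    """Reuse a cache-valid table from either instance, else compute per-user."""
    for store in (per_user, team_shared):
        entry = store.lookup(key, cache_key)
        if entry is not None:
            return store.name, entry[1]
    table = compute()
    # A cache miss never overwrites team-shared data; results land per-user.
    per_user.write(key, cache_key, table)
    return per_user.name, table
```

Marking the team-shared store read-only makes an accidental write fail loudly instead of silently replacing shared data.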