This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Tackle the teardown procedure for huge ephemeral datasets #72

Open
alexbarcelo opened this issue May 14, 2020 · 0 comments
Labels
enhancement New feature or request hpc Related to HPC (performance, deployment...)

Comments

@alexbarcelo
Member

When dataClay is shut down, it stores all of its data into the database.

This is the correct behaviour for an object store, where data is persistent. However, all our HPC use cases revolve around some kind of volatile dataset that is huge. The unnecessary teardown procedure means that:

  • Executions may be half an hour longer than necessary (so batch explorations are much slower)
  • Home quota fills up (so sequential executions have to wait for a human to manually clean the files)

dataClay should retain this feature (it is an object store, after all), but we should improve its behaviour for "ephemeral executions", as those are all our current HPC use cases.

Proposal:

  • A flag (or similar mechanism) for the orchestration to indicate a "dirty shutdown". This may be the default for enqueue_compss-triggered scenarios.
  • Alternatively, an "ephemeral HPC" flag which forces dataClay DataServices to avoid serializing to disk altogether.
@alexbarcelo alexbarcelo added enhancement New feature or request hpc Related to HPC (performance, deployment...) labels May 14, 2020