This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Tackle the teardown procedure for huge ephemeral datasets #72

Open
alexbarcelo opened this issue May 14, 2020 · 0 comments
Labels
enhancement New feature or request hpc Related to HPC (performance, deployment...)

Comments

@alexbarcelo
Member

When dataClay is shut down, it stores all of its data into the database.

This is the correct behaviour for an object store, where data is persistent. However, all our HPC use cases revolve around some kind of volatile dataset that is huge. The unnecessary teardown procedure means that:

  • Executions may be half an hour longer than necessary (so batch explorations are much slower)
  • Home quota fills up (so sequential executions have to wait for a human to manually clean the files)

dataClay should retain this feature (it is an object store, after all), but we should improve its behaviour for "ephemeral executions", as those are all our current HPC use cases.

Proposal:

  • A flag (or similar mechanism) for the orchestration to indicate a "dirty shutdown". This may be the default for enqueue_compss-triggered scenarios.
  • Alternatively, an "ephemeral HPC" flag which forces dataClay DataServices to avoid serializing to disk altogether.
@alexbarcelo alexbarcelo added enhancement New feature or request hpc Related to HPC (performance, deployment...) labels May 14, 2020