Store ert<->everest realization mapping #9767
base: main
Conversation
CodSpeed Performance Report: merging #9767 will not alter performance.
Force-pushed 4e70ffd to be7341e
Force-pushed d4d8a92 to 2202375
Force-pushed 2202375 to 3990831
Some comments but basically solid.
realization_mapping: dict[int, EverestRealizationInfo] = {
    idx: EverestRealizationInfo(
        geo_realization=self._everest_config.model.realizations[real],
Can we internally already move from geo to model? I think we should encourage that in the long run.
src/ert/storage/everest_ensemble.py
Outdated
class EverestRealizationInfo(TypedDict):
    geo_realization: int
    perturbation: int | None  # None means it stems from unperturbed controls
    # Q: Maybe we also need result ID, or no? Ref if we have multiple evaluations
Yes, we probably will; let's not worry about it now.
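For context, a minimal self-contained sketch of what such a mapping could look like; the TypedDict fields follow the diff, but the concrete realization and perturbation numbers are invented:

```python
# Minimal sketch, assuming the field names from the diff; the numbers below
# are made up for illustration only.
from typing import TypedDict


class EverestRealizationInfo(TypedDict):
    geo_realization: int  # the model ("geo") realization this evaluation belongs to
    perturbation: int | None  # None means it stems from unperturbed controls


# Hypothetical example: four ert realizations covering two model realizations,
# each evaluated once with unperturbed controls and once with perturbation 0.
realization_mapping: dict[int, EverestRealizationInfo] = {
    0: EverestRealizationInfo(geo_realization=0, perturbation=None),
    1: EverestRealizationInfo(geo_realization=0, perturbation=0),
    2: EverestRealizationInfo(geo_realization=1, perturbation=None),
    3: EverestRealizationInfo(geo_realization=1, perturbation=0),
}
```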
for e in self.ensembles:
    ens_parameters = {
        group: e.ert_ensemble.load_parameters(group)
        .to_dataarray()
        .data.reshape((e.ert_ensemble.ensemble_size, -1))
        for group in parameter_values
    }
Do you think we may need some caching here, or is there already caching in the storage?
This just points to parameters saved as xarray files in ERT storage.
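As a standalone illustration of the reshape pattern quoted above (the dimension and variable names here are made up, not ERT's actual parameter schema):

```python
# Hypothetical illustration of flattening per-realization parameters into a
# (n_realizations, n_params) array; all names are invented for this sketch.
import numpy as np
import xarray as xr

n_realizations, n_params = 4, 3
ds = xr.Dataset(
    {
        "values": (
            ("realizations", "names"),
            np.arange(n_realizations * n_params, dtype=float).reshape(
                n_realizations, n_params
            ),
        )
    },
    coords={"realizations": range(n_realizations), "names": ["a", "b", "c"]},
)

# to_dataarray() stacks the data variables into a single DataArray; reshaping
# by the number of realizations then yields one flat parameter vector per row.
flat = ds.to_dataarray().data.reshape((n_realizations, -1))
assert flat.shape == (n_realizations, n_params)
```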
            else None
        ),
    )
    for idx, real in enumerate(evaluator_context.realizations)
This maps the indices of the control vectors to realizations/perturbations. Is that the intention? Because if the intention is to map simulation IDs (= ert realizations?), then this will fail if some control vectors are not evaluated due to being cached or being marked as inactive. You would need to do the same as I did in my recent PR to get the simulation IDs for the simulations that actually did run.
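To make the distinction concrete, a toy sketch (none of the names below come from ert or ropt):

```python
# Toy illustration of the index-vs-simulation-ID pitfall; all names are
# invented. If some control vectors are skipped (cache hit or inactive),
# enumerate() over all of them no longer matches the IDs of the simulations
# that actually ran.
controls = ["c0", "c1", "c2", "c3"]
evaluated = [True, False, True, True]  # c1 is a cache hit, so it never runs

# Naive: control-vector index used directly as the ert realization number
naive_mapping = dict(enumerate(controls))
# -> {0: "c0", 1: "c1", 2: "c2", 3: "c3"}

# What actually runs gets consecutive simulation IDs 0, 1, 2, ...
actual_runs = [c for c, ran in zip(controls, evaluated) if ran]
actual_mapping = dict(enumerate(actual_runs))
# -> {0: "c0", 1: "c2", 2: "c3"}  (c2 and c3 end up under different keys)
```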
)["values"] | ||
|
||
cached_data = (
    objectives.to_numpy() * -1,
Be careful after rebasing, the -1 is probably not needed anymore.
Removed the -1.
if constraints is not None:
    assert cached_constraints is not None
    constraints[control_idx, ...] = cached_constraints
for control_idx, (
I think caching is now always assumed to be on? We do have a flag for it in the configuration; maybe we should kill it.
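For reference, the cache-fill pattern quoted above roughly corresponds to this numpy sketch (shapes and the cache layout are illustrative only, not the PR's actual data structures):

```python
# Illustrative numpy sketch of filling pre-allocated result arrays from a
# cache keyed by control index; shapes and names are made up.
import numpy as np

n_controls, n_objectives, n_constraints = 3, 2, 1
objectives = np.full((n_controls, n_objectives), np.nan)
constraints = np.full((n_controls, n_constraints), np.nan)

# Hypothetical cache: control index -> (cached objectives, cached constraints)
cached_results = {1: (np.array([0.5, 1.5]), np.array([0.1]))}

for control_idx, (cached_objectives, cached_constraints) in cached_results.items():
    objectives[control_idx, ...] = cached_objectives
    if constraints is not None:
        assert cached_constraints is not None
        constraints[control_idx, ...] = cached_constraints
```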
Some questions and comments
)

# Keep for re-runs, will work in-memory
# 2DO: If an experiment for this exact config already exists,
Issue for this?
    return cached_results
}

cached_results2: dict[int, Any] = {}
Variables with numbers in them get me a bit confused 😅 Could this be rewritten/simplified?
Oops, this was a temporary dev artifact; removed.
) -> None:
    self._index.ert2ev_realization_mapping = realization_mapping
    self._ert_ensemble._storage._write_transaction(
        self.ert_ensemble._path / "everest_index.json",
Is this needed? Could we integrate this more closely into the regular Ensemble? No reason it could not have metadata.
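As a rough sketch of what persisting the mapping could boil down to, independent of where it lives (the file name follows the diff; the atomic-write helper below is hypothetical and not ert's `_write_transaction`):

```python
# Hypothetical stand-alone sketch: persist the mapping as JSON next to the
# ensemble directory, using the usual tmp-file + rename pattern that a write
# transaction typically provides. Not the actual ert API.
import json
import os
import tempfile
from pathlib import Path


def write_realization_mapping(ensemble_path: Path, mapping: dict[int, dict]) -> None:
    target = ensemble_path / "everest_index.json"
    with tempfile.NamedTemporaryFile(
        "w", dir=ensemble_path, delete=False, suffix=".tmp"
    ) as f:
        # JSON keys must be strings, so the int ert realization ids are converted
        json.dump({str(k): v for k, v in mapping.items()}, f)
        tmp_name = f.name
    os.replace(tmp_name, target)  # atomic on the same filesystem
```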
@@ -589,6 +590,15 @@ def _to_parquet_transaction(
    os.chmod(f.name, 0o660)
    os.rename(f.name, filename)

def create_everest_experiment(
Would it be possible to integrate this into a regular experiment?
It can be put there, yes... but whether we should merge the logic into the ert ensemble or keep the everest logic in one place should be discussed, I think.
This layer still writes its metadata files into the "normal" ert experiment with respect to files etc.; it is just one place where we can put all the everest-related storage logic. In practice the effect should be the same as if the code were put directly into the ert experiment.
There are some edge cases where we might need to add more everest-specific storage logic, and this will likely grow as we resolve issues:
- #9937
- #9938
- Wrt caching there is a case where we might want one realization to be a copy/symlink of another (from a previous ensemble)
- Also, the realization numbers per ensemble (in ert) only cover the "active" evaluations, so if we want to evaluate realizations 0-15 but the previous evaluation is a cache hit for realizations 0-5, we will (currently) get a new ensemble with realizations 0-9, which is really [6-15]
I would say the more everest-specific logic we get in there, the stronger the argument against putting everest-specific methods etc. into the ERT LocalEnsemble, but again, up for discussion.
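For the discussion, a sketch of the "thin wrapper" shape being described: an Everest-side object that delegates all actual storage to a regular ert experiment and only layers the Everest-specific pieces on top. The class and method names here are illustrative, not the PR's implementation, and the ert-side API is stubbed out.

```python
# Illustrative only: the wrapper delegates to a regular ert experiment object
# and keeps the Everest-specific pieces (e.g. the realization mapping) in one
# place. The ert experiment is typed as Any because its real API is not
# assumed here.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class EverestExperiment:
    ert_experiment: Any  # the regular ert experiment; all data lives there
    everest_metadata: dict[str, Any] = field(default_factory=dict)

    def create_ensemble_with_mapping(
        self, ensemble_size: int, realization_mapping: dict[int, dict]
    ) -> Any:
        # Create a normal ert ensemble, then record the ert<->everest
        # realization mapping alongside it as Everest metadata.
        ensemble = self.ert_experiment.create_ensemble(ensemble_size=ensemble_size)
        self.everest_metadata["realization_mapping"] = realization_mapping
        return ensemble
```

Whether something like this lives next to LocalEnsemble or inside it is exactly the trade-off discussed above.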
Force-pushed 3990831 to f892664
Force-pushed f892664 to 39226fe
Force-pushed 39226fe to af05d6a
Note: Should not be merged until a consistent realization mapping is received from ROPT, replacing the logic in the second commit of this PR.
Issue
Resolves #9751
Resolves #9674
Approach
Short description of the approach
(Screenshot of new behavior in GUI if applicable)
git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"'
When applicable