Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store ert<->everest realization mapping #9767

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

yngve-sk
Copy link
Contributor

@yngve-sk yngve-sk commented Jan 16, 2025

Note: Should not be merged until a consistent realization mapping is received from ROPT, replacing the logic in the second commit of this PR.

Issue
Resolves #9751
Resolves #9674

Approach
Short description of the approach

(Screenshot of new behavior in GUI if applicable)

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@yngve-sk yngve-sk self-assigned this Jan 16, 2025
Copy link

codspeed-hq bot commented Jan 16, 2025

CodSpeed Performance Report

Merging #9767 will not alter performance

Comparing yngve-sk:25.01.16.add-everest-realization-info-to-ert (af05d6a) with main (cc15585)

Summary

✅ 25 untouched benchmarks

@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch 8 times, most recently from 4e70ffd to be7341e Compare January 27, 2025 08:20
Copy link
Contributor

@verveerpj verveerpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments but basically solid.


realization_mapping: dict[int, EverestRealizationInfo] = {
idx: EverestRealizationInfo(
geo_realization=self._everest_config.model.realizations[real],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we internally already move from geo to model, I think we should encourage that in the long run.

class EverestRealizationInfo(TypedDict):
geo_realization: int
perturbation: int | None # None means it stems from unperturbed controls
# Q: Maybe we also need result ID, or no? Ref if we have multiple evaluations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will probably, let's not worry about it now.

Comment on lines +54 to +60
for e in self.ensembles:
ens_parameters = {
group: e.ert_ensemble.load_parameters(group)
.to_dataarray()
.data.reshape((e.ert_ensemble.ensemble_size, -1))
for group in parameter_values
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you think may need some caching here, or is there already in the storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just points to parameters saved in xr files in ERT storage

else None
),
)
for idx, real in enumerate(evaluator_context.realizations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maps the indices of the control vectors to realizations/perturbations. Is that the intention? Because if the intention is to map simulation ID (= ert realization?) then this will fail if some control vectors are not evaluated due to being cached or being marked as inactive. You would need to do the same as I did in my recent PR to get the simulation ID's for the simulations that actually did run.

)["values"]

cached_data = (
objectives.to_numpy() * -1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful after rebasing, the -1 is probably not needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the -1

if constraints is not None:
assert cached_constraints is not None
constraints[control_idx, ...] = cached_constraints
for control_idx, (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think caching is now always assumed to be on? We do have a flag for in in the configuration, maybe we should kill it.

Copy link
Collaborator

@oyvindeide oyvindeide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and comments

)

# Keep for re-runs, will work in-memory
# 2DO: If an experiment for this exact config already exists,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return cached_results
}

cached_results2: dict[int, Any] = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variables with numbers in them gets me a bit confused 😅 Could this be rewritten/simplified?

Copy link
Contributor Author

@yngve-sk yngve-sk Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, this was a temporary dev artifact, removed

) -> None:
self._index.ert2ev_realization_mapping = realization_mapping
self._ert_ensemble._storage._write_transaction(
self.ert_ensemble._path / "everest_index.json",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Could we integrate this more closely into the regular Ensemble? No reason that could not have metadata

@@ -589,6 +590,15 @@ def _to_parquet_transaction(
os.chmod(f.name, 0o660)
os.rename(f.name, filename)

def create_everest_experiment(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to integrate this into a regular experiment?

Copy link
Contributor Author

@yngve-sk yngve-sk Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be put there yes... but whether we should have the logic merged into the ert ensemble vs keep the everest logic in one place, should be discussed I think.

And this layer still writes its metadata files into the "normal" ert experiment wrt files etc, it is just one place where we can put all the everest-related storage logic. In practice it should be the same effect as if the code was put directly into the ert experiment.

There are some edge cases and possible cases where we might need to add in more everest-specific storage logic, will likely grow as we resolve issues:
#9937
#9938 ,

  • Wrt caching there is a case, where we might want one realization to be a copy/symlink of another (from a previous ensemble)
  • Also, the realization numbers per ensemble (in ert) only go per "active" evaluation, so if we want to evaluate realizations 0-15, but the previous example is a cache hit for realizations 0-5, we'll (currently) have a new ensemble with realizations 0-9, which is really [6-15]

I would say the more everest-specific logic we get in there, the more argument against putting everest-specific methods etc into the ERT LocalEnsemble, but again up for discussion.

@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch from 3990831 to f892664 Compare February 11, 2025 14:02
@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch from f892664 to 39226fe Compare February 12, 2025 11:50
@yngve-sk yngve-sk force-pushed the 25.01.16.add-everest-realization-info-to-ert branch from 39226fe to af05d6a Compare February 12, 2025 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for Review
Development

Successfully merging this pull request may close these issues.

Add Everest realization metadata to ERT storage Use ERT storage for Everest simulator cache
3 participants