-
Notifications
You must be signed in to change notification settings - Fork 24
feat(grib): Improve template manager #383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
b45b878 to
22eb7c9
Compare
|
Hi @gmertes , this looks really good, thanks for working on this! I think our use pattern will mostly revolve on the "samples" provider. In the documentation you mention that we can use keys such as grid, levtype, param in the matching rules. However these are all mars keys and I am not sure we can use them to distinguish between all our templates. I was wondering: would also other arbitrary GRIB keys (e.g. numberOfPoints, typeOfLevel, etc.) be supported in the matching filters? CC @dnerini |
|
It is possible but we would have to tackle that first at the dataset creation side. The rules are matched against the contents of the |
Okay, I'll look into that, thanks. Unfortunately we tried to use the mars language for our matching filters but there are no keys in our |
| @cached_property | ||
| def _data(self): | ||
| return ekd.from_source("file", self.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we gain by making this a cached property as opposed to just assigning an attribute inside the __init__ (we would also know earlier if e.g. something goes wrong reading the data)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory you could have unused template providers in your config (if all the providers at the higher priority already provided all the templates). This way the file will not be opened if the template provider is not used. I think that the from_source will already open the file when it's first called, so delaying that until we are sure it's needed is a good design I think?
Re "knowing if something goes wrong" I think the only thing that is checked at this call is whether the file exists, if something is wrong with the grib you will still only see it when the data is accessed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are saying is correct, my question is whether it's worth it. I would personally prefer a less sophisticated design that facilitates debugging and early exit of the program if something is wrong. "Fail fast" is also a good design.
Re "knowing if something goes wrong" I think the only thing that is checked at this call is whether the file exists, if something is wrong with the grib you will still only see it when the data is accessed.
Ideally one should be able to parse the headers of the data upfront without actually loading it. If that was possible we could just do that already in the __init__ ... but yeah apparently it's (shockingly) not so that's not an option.
HCookie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done, very nicely written
| The manager for the template provider. | ||
| index_path : str | ||
| The path to the index file. | ||
| index_path : str | list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| index_path : str | list | |
| index : str | list |
| import earthkit.data as ekd | ||
|
|
||
| from anemoi.inference.decorators import main_argument | ||
| from anemoi.inference.inputs.ekd import find_variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably this should be moved into a utils
| if self.variables and variable not in self.variables: | ||
| return None | ||
|
|
||
| match self.mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay first use of a match
|
|
||
| if fallback: | ||
| self.fallback = fallback | ||
| else: | ||
| self.fallback = kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will ignore kwargs if the fallback is set, I would suggest a .update(kwargs)?
| if len(variables) > 0: | ||
| to_log[provider] = set(variable.param for variable in variables) | ||
| for variable in variables: | ||
| variable._template_manager_logged = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like this setting of a private attribute on this object, Could you not use a set of param to unique identify and filter?
Description
Improve the grib template manager in a backwards compatible way.
samplesprovider, add the option for the sample index to be specified directly in the inference configfileprovider, add an option for the file to contain many parameters, and do a lookup based on the param name.input, this gives more flexibility to enable/disable combinations of template providers and set the priority..inputprovider fallback mode: force an output variable to use one of the input variablesAs a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.
📚 Documentation preview 📚: https://anemoi-inference--383.org.readthedocs.build/en/383/