Searching by standard_name doesn't refine the resulting dataset by standard_name #348

Open
anton-seaice opened this issue Feb 20, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@anton-seaice
Collaborator

Is your feature request related to a problem? Please describe.

When searching a datastore by standard_name, it returns one catalog dataset:

e.g.
esm_ds.search(variable_standard_name="sea_surface_height_above_geoid")

However, when .to_dask() is run, the resulting xarray dataset includes every variable in the source files:

[Image: screenshot of the resulting xarray dataset]

Describe the feature you'd like

I would like the resulting dataset to include only the variable that has the requested variable_standard_name.

Describe alternatives you've considered

Ignore it; using the CF standard name on the resulting dataset works fine, e.g.:

ds.cf["sea_surface_height_above_geoid"]
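If a trimmed dataset is wanted in the meantime, the same filtering could be done after .to_dask() by checking each variable's standard_name attribute. This is a minimal sketch, assuming ds is the xarray dataset returned by .to_dask(). The helper name select_by_standard_name is made up for illustration and is not part of intake-esm or cf_xarray:

```python
def select_by_standard_name(ds, standard_name):
    """Restrict ds to the data variables carrying the given CF standard_name.

    Hypothetical helper: it relies only on ds.data_vars, per-variable .attrs,
    and indexing with a list of names, all of which xarray.Dataset provides.
    """
    matches = [
        name
        for name, var in ds.data_vars.items()
        if var.attrs.get("standard_name") == standard_name
    ]
    if not matches:
        raise KeyError(f"no data variable has standard_name={standard_name!r}")
    # Indexing a Dataset with a list of names returns a Dataset, not a DataArray.
    return ds[matches]
```

For example, select_by_standard_name(ds, "sea_surface_height_above_geoid") would return a dataset holding only the matching variable(s), whereas ds.cf[...] returns the variable itself.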

Additional context

This is mostly an issue of neatness.

As the dataset is still a dask object, and the source files need opening anyway, the performance benefit of only including the requested variables in the returned dataset is probably small.

I think this is an upstream issue with intake-esm? Ping @dougiesquire and @charles-turner-1

@dougiesquire
Collaborator

Yeah, we've talked about this before @anton-seaice. This is a "feature" of Intake-esm datastores, which keep track of a single column for the dataset variables, defined by the variable_column_name. Only searches on this column (which is "variable" for most of our datastores) will refine the dataset returned by to_dask().

This possibly wouldn't be too hard to change in Intake-ESM.
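To make the behaviour concrete, here is a toy model in plain Python. It is not Intake-ESM's actual code. It assumes each catalog row stores its per-variable columns as parallel lists, and the file name, variable names and function names are all hypothetical:

```python
# Toy model only: NOT intake-esm's implementation, just the behaviour described above.
VARIABLE_COLUMN = "variable"  # plays the role of variable_column_name

# One catalog row per file; assumption: per-variable columns are parallel lists.
ROWS = [
    {
        "path": "ocean_month.nc",  # hypothetical file
        "variable": ["sea_level", "temp"],
        "variable_standard_name": [
            "sea_surface_height_above_geoid",
            "sea_water_potential_temperature",
        ],
    },
]


def search(rows, **query):
    """Keep rows where every queried column contains the requested value."""
    return [r for r in rows if all(v in r[c] for c, v in query.items())]


def variables_to_load(rows, **query):
    """Variables kept on open: refined only if the variable column was queried."""
    requested = query.get(VARIABLE_COLUMN)
    kept = []
    for row in search(rows, **query):
        for name in row[VARIABLE_COLUMN]:
            if requested is None or name == requested:
                kept.append(name)
    return kept


def variables_for_standard_name(rows, standard_name):
    """Requested behaviour: map the standard_name to variable names, then refine."""
    return [
        name
        for row in search(rows, variable_standard_name=standard_name)
        for name, std in zip(row[VARIABLE_COLUMN], row["variable_standard_name"])
        if std == standard_name
    ]
```

In this model, searching on variable_standard_name finds the right file but variables_to_load still returns both variables, while variables_for_standard_name returns only sea_level. A change upstream would presumably need some equivalent of that second mapping step.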

@charles-turner-1
Collaborator

I'm also wondering if we might be able to use a DerivedVariableRegistry to do this on the fly, though I'm not sure whether we'd run into difficulties knowing all the possible variants of variable_standard_name to open the dataset with.

But yeah, it could be a nice feature to add to intake-esm for sure.

@anton-seaice
Collaborator Author

Oh, and the same or similar behaviour for variable_long_name would be good for consistency.

Projects
Status: Backlog