Searching by standard_name doesn't refine the resulting dataset by standard_name #348

anton-seaice · 2025-02-20T00:46:38Z

Is your feature request related to a problem? Please describe.

When searching a datastore by standard_name, it returns one catalog dataset:

e.g.
esm_ds.search(variable_standard_name="sea_surface_height_above_geoid")

However, when .to_dask() is run, the resulting xarray dataset includes all variables in the source files.

Describe the feature you'd like

I would like the resulting dataset to only include the variable which has the requested variable_standard_name

Describe alternatives you've considered

Ignore it; using the CF standard name on the resulting dataset works fine, e.g.:

ds.cf["sea_surface_height_above_geoid"]
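If a Dataset is wanted rather than a single DataArray, another stopgap is to filter after to_dask() on each variable's standard_name attribute. A minimal sketch, assuming ds is the xarray Dataset returned by to_dask() and that the source files set standard_name in each variable's attrs. select_by_standard_name is a hypothetical helper name, not part of intake-esm or cf_xarray:

```python
def select_by_standard_name(ds, standard_name):
    """Return ds restricted to data variables whose standard_name attribute matches.

    Hypothetical helper (not part of intake-esm or cf_xarray). `ds` is assumed
    to be an xarray.Dataset, e.g. the result of to_dask().
    """
    names = [
        name
        for name, var in ds.data_vars.items()
        if var.attrs.get("standard_name") == standard_name
    ]
    if not names:
        raise KeyError(f"no data variable has standard_name={standard_name!r}")
    # Indexing a Dataset with a list of names returns a Dataset, not a DataArray.
    return ds[names]
```

Usage would look something like select_by_standard_name(esm_ds.search(variable_standard_name="sea_surface_height_above_geoid").to_dask(), "sea_surface_height_above_geoid"). This only trims the returned object; the source files are still opened as before.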

Additional context

This is mostly an issue of neatness.

As the dataset is still a dask object, and the source files need opening anyway, the performance benefit of only including the requested variables in the returned dataset is probably small.

I think this is an upstream issue with intake-esm? Ping @dougiesquire and @charles-turner-1.


dougiesquire · 2025-02-20T00:55:14Z

Yeah, we've talked about this before @anton-seaice. This is a "feature" of Intake-esm datastores, which keep track of a single column for the dataset variables, defined by the variable_column_name. Only searches on this column (which is "variable" for most of our datastores) will refine the dataset returned by to_dask().
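To make that concrete, here is a toy model of the behaviour. It is not intake-esm code: ToyDatastore, its methods, and the example row (file name, variable names) are all made up for illustration. The only thing it is meant to show is that a search on the variable column narrows what gets loaded, while a search on any other column only narrows which rows match:

```python
from dataclasses import dataclass
from typing import Optional

# Toy model only: this is NOT intake-esm code. ToyDatastore, its rows and its
# method names are hypothetical, chosen to mirror the behaviour described above.


@dataclass
class ToyDatastore:
    rows: list                              # one dict per catalog row (one row per file here)
    variable_column_name: str = "variable"
    requested: Optional[frozenset] = None   # variables to keep on load; None means "keep all"

    def search(self, **query):
        # Keep rows where every queried column contains the requested value.
        rows = [r for r in self.rows if all(v in r[k] for k, v in query.items())]
        requested = self.requested
        # Only a query on the variable column narrows what gets loaded.
        if self.variable_column_name in query:
            requested = frozenset([query[self.variable_column_name]])
        return ToyDatastore(rows, self.variable_column_name, requested)

    def loaded_variables(self):
        # Stand-in for to_dask(): the variables that end up in the dataset.
        found = {v for r in self.rows for v in r[self.variable_column_name]}
        return sorted(found if self.requested is None else found & self.requested)


# A single made-up catalog row describing one file with two variables.
rows = [
    {
        "variable": ["sea_level", "sst"],
        "variable_standard_name": [
            "sea_surface_height_above_geoid",
            "sea_surface_temperature",
        ],
    }
]
store = ToyDatastore(rows)

by_standard_name = store.search(variable_standard_name="sea_surface_height_above_geoid")
by_variable = store.search(variable="sea_level")

print(by_standard_name.loaded_variables())  # both variables are still loaded
print(by_variable.loaded_variables())       # only the requested variable is loaded
```

In this sketch, changing the upstream behaviour would amount to also mapping matches in other columns back to entries in the variable column before loading.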

This possibly wouldn't be too hard to change in intake-esm.

charles-turner-1 · 2025-02-20T01:56:32Z

I'm also wondering if we might be able to use a DerivedVariableRegistry to do this on the fly, though I'm not sure whether we'd run into difficulties knowing all the possible variants of variable_standard_name needed to open the dataset.

But yeah, it could be a nice feature to add to intake-esm for sure.

anton-seaice · 2025-02-20T01:57:55Z

Oh, and the same or similar behaviour for variable_long_name would be good for consistency.

anton-seaice added the enhancement New feature or request label Feb 20, 2025

github-project-automation bot added this to Model Evaluation & Diagnostics Feb 20, 2025

github-project-automation bot moved this to Backlog in Model Evaluation & Diagnostics Feb 20, 2025
