Rename filter not consistent with rename-action in open_dataset #554

@mpvginde

Describe the bug
Let's say I want to build a dataset with both 3h- and 6h-accumulated precipitation:

# 3h accumulations
        - dates:
            start: 2020-01-01 00:00:00
            end: 2021-01-01 00:00:00
            frequency: 3h
          accumulations:
            <<: *mars_request
            time: [0]
            accumulation_period: [0, 3]
            param:
            - tp    # total precipitation
# rename to allow 2 total precipitation fields    
      - rename:
          param: "{param}_3h"
# 6h accumulations
        - dates:
            start: 2020-01-01 00:00:00
            end: 2021-01-01 00:00:00
            frequency: 3h
          accumulations:
            <<: *mars_request
            time: [0]
            accumulation_period: [0, 6]
            param:
            - tp    # total precipitation
# rename to allow 2 total precipitation fields
      - rename:
          param: "{param}_6h"

The rename filter in the recipe updates the param key in the metadata, which I guess is then used as the variable name in the dataset.

As far as I know there is currently no way of renaming a single variable (during dataset creation) without also changing the param metadata.
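
To make the behaviour described above concrete, here is a minimal sketch in plain Python. It is a toy model, not anemoi code: the dict-based metadata and the functions `rename_filter` and `variable_name` are hypothetical, and it assumes (as I guess above) that the variable name is taken directly from the param key.

```python
# Toy model only: plain dicts stand in for field metadata; this is not anemoi code.


def rename_filter(metadata: dict, templates: dict) -> dict:
    """Return a copy of `metadata` with each templated key rewritten."""
    renamed = dict(metadata)
    for key, template in templates.items():
        # "{param}_3h".format(param="tp") -> "tp_3h"
        renamed[key] = template.format(**metadata)
    return renamed


def variable_name(metadata: dict) -> str:
    """Assumption: the dataset variable name is derived from the `param` key."""
    return metadata["param"]


tp_3h = rename_filter({"param": "tp"}, {"param": "{param}_3h"})
tp_6h = rename_filter({"param": "tp"}, {"param": "{param}_6h"})

print(variable_name(tp_3h), variable_name(tp_6h))  # tp_3h tp_6h
```

In this model the name and the param metadata always change together, which is the coupling I would like to be able to avoid.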

Changing the naming scheme of all variables at once, on the other hand, is possible with remapping:

output:
  remapping:
    param_level: "{param}_{levelist}"

I use the above snippet to get rid of the _2 or _10 suffix in the names of surface-level fields such as 2t_2 or 10v_10.
Here, only the variable name changes; the param metadata is left untouched.

Now I want to combine this dataset with another dataset using the cutout functionality, renaming tp_6h to tp with the rename action in open_dataset.

Now the combined dataset has variable name tp with param metadata tp_6h.
This (currently) trips up the scalers during training.
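
The mismatch can be reproduced with a small toy model in plain Python. This is only a sketch of the behaviour I observe, not the actual open_dataset implementation: `rename_variables` and the dict layout are hypothetical, and only surface fields are included, since for those the name and the param metadata would normally agree.

```python
# Toy model only: a {name: metadata} dict stands in for the dataset variables.


def rename_variables(variables: dict, mapping: dict) -> dict:
    """Rename variables by name, leaving the per-variable metadata untouched."""
    return {mapping.get(name, name): dict(meta) for name, meta in variables.items()}


# State after building: name and param metadata agree.
built = {"tp_6h": {"param": "tp_6h"}, "2t": {"param": "2t"}}

# State after a name-only rename when opening the dataset.
opened = rename_variables(built, {"tp_6h": "tp"})

# Variables whose name no longer matches their param metadata.
mismatched = {
    name: meta["param"] for name, meta in opened.items() if name != meta["param"]
}
print(mismatched)  # {'tp': 'tp_6h'}
```

Anything downstream that looks variables up by param metadata rather than by name would see tp_6h here instead of tp.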

Version number
Building the CERRA dataset:
anemoi-datasets branch abstracting-accumulation (8fb0a16)

Opening datasets with cutout:
anemoi-datasets: current main (c26a9d8)
anemoi-transform: current main (ecmwf/anemoi-transform@548e2fa)

Additional context
I see two possible solutions:

  1. Add functionality to specify the name of a variable during building without changing the param metadata
  2. Let the rename action in open_dataset also rename the param metadata. This might lead to other problems with pressure-level fields, though, as these typically have name: t_600, param: t
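
The concern with solution 2 can be sketched with the same kind of toy model in plain Python. Again, this is not anemoi code: `rename_with_param` is a hypothetical, deliberately naive implementation, and the target name temp_600 is made up for illustration.

```python
# Toy model only: a {name: metadata} dict stands in for the dataset variables.


def rename_with_param(variables: dict, mapping: dict) -> dict:
    """Rename variables and naively overwrite `param` with the new name."""
    out = {}
    for name, meta in variables.items():
        new_name = mapping.get(name, name)
        new_meta = dict(meta)
        if name in mapping:
            # Naive choice: the param metadata simply follows the new name.
            new_meta["param"] = new_name
        out[new_name] = new_meta
    return out


variables = {
    "tp_6h": {"param": "tp_6h"},
    "t_600": {"param": "t", "levelist": 600},
}
result = rename_with_param(variables, {"tp_6h": "tp", "t_600": "temp_600"})

print(result["tp"]["param"])        # tp
print(result["temp_600"]["param"])  # temp_600
```

The surface field ends up consistent, but the pressure-level field now carries the level in its param metadata (temp_600 instead of a bare parameter name), so a real implementation would need some rule for separating the parameter from the level.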
