Unrecognized facet name requires adding the facet manually to selection.py #3

Open
AtefBN opened this issue Apr 19, 2023 · 2 comments


@AtefBN
Collaborator

AtefBN commented Apr 19, 2023

[2023-04-19 17:28:02]  DEBUG     root
Locals:
{
    'self': Selection(
        driving_model='MOHC-HadGEM2-ES',
        ensemble='r1i1p1',
        experiment='rcp26',
        project='CORDEX'
    ),
    'name': 'rcm_version',
    'value': ['v2']
}


[2023-04-19 17:28:02]  ERROR     root

Traceback (most recent call last):
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/tui.py", line 154, in logging
    yield
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/cli/search.py", line 69, in search
    query = parse_query(
            ^^^^^^^^^^^^
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/cli/utils.py", line 175, in parse_query
    selection = parse_facets(facets)
                ^^^^^^^^^^^^^^^^^^^^
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/cli/utils.py", line 155, in parse_facets
    selection[name] = values
    ~~~~~~~~~^^^^^^
  File "/gpfscmip/gpfsdata/esgf/miniconda/envs/esgpull/lib/python3.11/site-packages/esgpull/models/selection.py", line 105, in __setitem__
    raise KeyError(name)
KeyError: 'rcm_version'
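The traceback ends in `Selection.__setitem__`, which rejects any key that is missing from a fixed list. The sketch below is a simplified illustration of that pattern and reproduces the same `KeyError`. The `ALLOWED_FACETS` contents and the `SimpleSelection` class are hypothetical stand-ins, not esgpull's actual code.

```python
# Hypothetical, simplified stand-in for a selection with a hard-coded key list.
# The real list lives in esgpull/models/selection.py; these entries are illustrative.
ALLOWED_FACETS = frozenset(
    {"project", "experiment", "ensemble", "driving_model", "variable_id"}
)


class SimpleSelection:
    """Store facet values, rejecting keys that are not in ALLOWED_FACETS."""

    def __init__(self) -> None:
        self._facets: dict[str, list[str]] = {}

    def __setitem__(self, name: str, value: list[str]) -> None:
        if name not in ALLOWED_FACETS:
            # Same failure mode as the traceback: the key itself is rejected.
            raise KeyError(name)
        self._facets[name] = value

    def __getitem__(self, name: str) -> list[str]:
        return self._facets[name]


selection = SimpleSelection()
selection["project"] = ["CORDEX"]
try:
    selection["rcm_version"] = ["v2"]
except KeyError as exc:
    print(f"rejected facet: {exc}")  # prints "rejected facet: 'rcm_version'"
```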

@JoranAngevaare

I think I encountered a similar issue while following the search page of the documentation.

Following the documentation, I wanted to query by version. The --facets flag lists version among the available facets, and --hints lists the available versions:

(py310) [angevaar@pc160101 joran]$ esgpull search project:CMIP6 variable_id:tas institution_id:IPSL frequency:mon --facets | tail -3
  "variant_label",
  "version"
]
(py310) [angevaar@pc160101 joran]$ esgpull search project:CMIP6 variable_id:tas institution_id:IPSL frequency:mon --hints version | tail
      "20210826": 7,
      "20211229": 3,
      "20220105": 1,
      "20220426": 1,
      "20220720": 2,
      "20220721": 6,
      "20220722": 606
    }
  }
]

Yet building the query raises a KeyError:

(py310) [angevaar@pc160101 joran]$ esgpull search project:CMIP6 variable_id:tas institution_id:IPSL frequency:mon version:20220722
KeyError: 'version'
See /data/ssd/joran/esgpull/log/esgpull-search-2023-05-04_07-17-47.log for error log.
Aborted!

Full traceback

(py310) [angevaar@pc160101 joran]$ less /data/ssd/joran/esgpull/log/esgpull-search-2023-05-04_07-17-47.log
[2023-05-04 09:17:47]  DEBUG     root
Locals:
{'self': Selection(frequency='mon', institution_id='IPSL', project='CMIP6', variable_id='tas'), 'name': 'version', 'value': ['20220722']}


[2023-05-04 09:17:47]  ERROR     root

Traceback (most recent call last):
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/tui.py", line 154, in logging
    yield
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/cli/search.py", line 69, in search
    query = parse_query(
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/cli/utils.py", line 175, in parse_query
    selection = parse_facets(facets)
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/cli/utils.py", line 155, in parse_facets
    selection[name] = values
  File "/usr/people/angevaar/miniconda3/envs/py310/lib/python3.10/site-packages/esgpull/models/selection.py", line 116, in __setitem__
    raise KeyError(name)
KeyError: 'version'

Build info

## Version
(py310) [angevaar@pc160101 joran]$ pip list | grep esg
esgpull              0.4.0
## Build method
conda install esgpull=0.4.0 --channel ipsl --channel conda-forge
## OS
(py310) [angevaar@pc160101 joran]$ cat /etc/os-release | head -2
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"

@svenrdz
Collaborator

svenrdz commented Jun 21, 2023

Sorry, I just realized I forgot to enable notifications on this repo after it was moved to the ESGF organization.

Currently, the list of facet keys that can be used in a Query is hard-coded in this file:
https://github.com/ESGF/esgf-download/blob/main/esgpull/models/selection.py#L158-L196

The --hints flag shows whatever the search API returns, and is therefore not tied to the list of "valid" facet keys.
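To see which keys suggested by the search API would still be rejected, one can diff the facet names reported by a search against the hard-coded list. A minimal sketch, assuming the facet names have already been collected into a Python set; the `HARD_CODED_KEYS` contents are hypothetical placeholders, not the actual list in `selection.py`, and `unsupported_facets` is a hypothetical helper.

```python
# Hypothetical subset of the hard-coded keys; see selection.py for the real list.
HARD_CODED_KEYS = {"project", "variable_id", "institution_id", "frequency", "variant_label"}

# Facet names as reported by `esgpull search ... --facets` (illustrative, partly
# taken from the output earlier in this thread).
api_facets = {"project", "variable_id", "institution_id", "frequency", "variant_label", "version"}


def unsupported_facets(reported: set[str], allowed: set[str]) -> list[str]:
    """Return facet names the search API offers but the query model rejects."""
    return sorted(reported - allowed)


print(unsupported_facets(api_facets, HARD_CODED_KEYS))  # prints ['version']
```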

For facet values, there is no such hard constraint: I relaxed it after realizing it prevented using some of esgpull's search features (i.e. the wildcard syntax) inside a saved Query.
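To illustrate why a strict check on values conflicts with wildcards: a pattern such as `tas*` is not itself a member of any list of known values, yet it matches several of them. The sketch below uses Python's standard `fnmatch` glob matching as a stand-in; whether esgpull's wildcard syntax follows exactly these rules is an assumption, and the `KNOWN_VALUES` list and both helpers are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical list of known values for the variable_id facet.
KNOWN_VALUES = ["tas", "tasmax", "tasmin", "pr"]


def strict_check(value: str) -> bool:
    """Exact-membership validation, which rejects any wildcard pattern."""
    return value in KNOWN_VALUES


def wildcard_matches(pattern: str) -> list[str]:
    """Known values matched by a glob-style pattern (case-sensitive)."""
    return [v for v in KNOWN_VALUES if fnmatchcase(v, pattern)]


print(strict_check("tas*"))      # prints False
print(wildcard_matches("tas*"))  # prints ['tas', 'tasmax', 'tasmin']
```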

As this seems to be a recurring issue, we should definitely improve this validation. I can think of a few ways:

  • add the missing facets (such as version and rcm_version) to the hard-coded list, although this is manual work and ties the list of valid keys to a specific esgpull version
  • remove the constraint on keys and allow any user input; the main drawback I see is that esgpull could no longer tell the user whether a key is invalid, or whether its associated values filter the list of datasets down to nothing
  • use the exhaustive list of facets fetched during esgpull self install, although this fetching step was removed in the latest version (0.5.4) since it was not used anywhere
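The third option could look roughly like the sketch below: the set of valid keys is passed in at runtime (for instance from a facet list fetched from the index and cached) instead of being hard-coded, and an unknown key produces an error that suggests close matches, which also addresses the feedback concern of the second option. `DynamicSelection` and the cached key list are hypothetical; `difflib.get_close_matches` is from the Python standard library.

```python
from difflib import get_close_matches


class DynamicSelection:
    """Selection whose valid keys come from a runtime-supplied collection."""

    def __init__(self, valid_keys: set[str]) -> None:
        self.valid_keys = frozenset(valid_keys)
        self.facets: dict[str, list[str]] = {}

    def __setitem__(self, name: str, value: list[str]) -> None:
        if name not in self.valid_keys:
            # Give actionable feedback instead of a bare KeyError.
            hints = get_close_matches(name, sorted(self.valid_keys), n=3)
            msg = f"unknown facet {name!r}"
            if hints:
                msg += f"; did you mean: {', '.join(hints)}?"
            raise KeyError(msg)
        self.facets[name] = value


# Hypothetical key list, e.g. fetched from the search API and cached locally.
cached_keys = {"project", "experiment_id", "variable_id", "version", "rcm_version"}
sel = DynamicSelection(cached_keys)
sel["version"] = ["20220722"]  # accepted: the key is in the runtime list
```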

In the meantime, there is a workaround specific to version, since versions are related to publication dates: in theory, the publication date of a non-replica dataset should be the same as its version.
You can use the --from and --to parameters, which filter on a dataset's publication date and take a date in the format YYYY-MM-dd.
This currently works only with the search command, but including those parameters in saved queries with the add command is on my to-do list.
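For the version workaround, a version such as `20220722` (as listed by `--hints version` earlier in this thread) can be turned into the date format that `--from` and `--to` expect. A minimal sketch, assuming versions are plain `YYYYMMDD` strings optionally prefixed with `v`, and assuming both bounds are inclusive (unverified; widen `--to` by a day if datasets are missing). `version_to_date_args` is a hypothetical helper, and replicas may still not match, as noted above.

```python
from datetime import datetime


def version_to_date_args(version: str) -> list[str]:
    """Map a dataset version like 'v20220722' to --from/--to arguments."""
    digits = version.removeprefix("v")
    # strptime validates the date, raising ValueError on malformed input.
    day = datetime.strptime(digits, "%Y%m%d").date()
    iso = day.isoformat()  # YYYY-MM-DD
    # Assumes --from and --to are inclusive, so the same day is used for both.
    return ["--from", iso, "--to", iso]


args = ["esgpull", "search", "project:CMIP6", "variable_id:tas",
        "institution_id:IPSL", "frequency:mon", *version_to_date_args("20220722")]
print(" ".join(args))
# prints: esgpull search project:CMIP6 variable_id:tas institution_id:IPSL frequency:mon --from 2022-07-22 --to 2022-07-22
```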
