Commit 708c816

b8raoult and floriankrb authored
feat: new data sources (#258)
Co-authored-by: Florian Pinault <[email protected]>
1 parent 50c4b67 commit 708c816

82 files changed (+1581, -170 lines)

.gitignore

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -135,3 +135,8 @@ _version.py
135135
*.to_upload
136136
tempCodeRunnerFile.python
137137
Untitled-*.py
138+
*.zip
139+
*.json
140+
*.db
141+
*.tgz
142+
_api/

.readthedocs.yaml

Lines changed: 8 additions & 8 deletions
@@ -4,16 +4,16 @@ build:
44
os: ubuntu-22.04
55
tools:
66
python: "3.11"
7-
jobs:
8-
pre_build:
9-
- bash docs/scripts/api_build.sh
7+
# jobs:
8+
# pre_build:
9+
# - bash docs/scripts/api_build.sh
1010

1111
sphinx:
1212
configuration: docs/conf.py
1313

1414
python:
15-
install:
16-
- method: pip
17-
path: .
18-
extra_requirements:
19-
- docs
15+
install:
16+
- method: pip
17+
path: .
18+
extra_requirements:
19+
- docs

03-constant-fields.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
########################
2+
Adding constant fields
3+
########################

docs/cli/grib-index.rst

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
.. _grib-index_command:
2+
3+
Grib-index Command
4+
==================
5+
6+
The `grib-index` command is used to create an index file for GRIB files. The index file is then used
7+
by the `grib-index` :ref:`source <grib-index_source>`.
8+
9+
The command will recursively scan the directories provided and open all the GRIB files found. It will
10+
then create an index file for each GRIB file, which will be used to read the data.
11+
12+
.. code:: bash
13+
14+
anemoi-datasets grib-index --index index.db /path1/to/grib/files /path2/to/grib/files
15+
16+
17+
See :ref:`grib_flavour` for more information about GRIB flavours.
18+
19+
20+
.. argparse::
21+
:module: anemoi.datasets.__main__
22+
:func: create_parser
23+
:prog: anemoi-datasets
24+
:path: grib-index
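
The recursive scan described in this page can be illustrated with a minimal sketch. It is not the `anemoi-datasets` implementation: the function names, the check on the leading `GRIB` bytes, and the one-row-per-file SQLite schema are all assumptions made for illustration, and the real index presumably stores more detail than this.

```python
import os
import sqlite3


def find_grib_files(*roots):
    """Yield paths under the given directories whose first four bytes are b'GRIB'."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        if f.read(4) == b"GRIB":
                            yield path
                except OSError:
                    # Unreadable files are skipped rather than aborting the scan.
                    continue


def build_index(db_path, *roots):
    """Record each GRIB file path and size in a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER)"
        )
        rows = [(p, os.path.getsize(p)) for p in find_grib_files(*roots)]
        conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?)", rows)
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```

Calling `build_index("index.db", "/path1/to/grib/files", "/path2/to/grib/files")` mirrors the shape of the command line shown above, with one database covering several directories.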
Lines changed: 5 additions & 4 deletions
@@ -1,10 +1,11 @@
11
input:
22
pipe:
3-
- source: # mars, grib, netcdf, etc.
3+
- source:
4+
# mars, grib, netcdf, etc.
45
# source attributes here
56
# ...
67
# Must load an orography variable
78

8-
- orog_to_z:
9-
orog: orog # Name of orography (input) variable
10-
z: z # Name of z (output) variable
9+
- orog_to_z:
10+
orog: orog # Name of orography (input) variable
11+
z: z # Name of z (output) variable
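
As a rough illustration of what an orography-to-geopotential conversion involves, the sketch below multiplies heights in metres by standard gravity. That the `orog_to_z` filter uses exactly this constant is an assumption, not something this diff confirms, and the function here is a hypothetical stand-in operating on plain lists.

```python
STANDARD_GRAVITY = 9.80665  # m s^-2, conventional standard value (assumed here)


def orog_to_z(orog_m):
    """Convert orography heights in metres to geopotential in m^2 s^-2."""
    return [height * STANDARD_GRAVITY for height in orog_m]
```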
Lines changed: 6 additions & 5 deletions
@@ -1,10 +1,11 @@
11
input:
22
pipe:
3-
- source: # mars, grib, netcdf, etc.
3+
- source:
4+
# mars, grib, netcdf, etc.
45
# source attributes here
56
# ...
67

7-
- regrid:
8-
method: nearest
9-
in_grid: o32
10-
out_grid: o48
8+
- regrid:
9+
method: nearest
10+
in_grid: o32
11+
out_grid: o48
Lines changed: 4 additions & 3 deletions
@@ -1,8 +1,9 @@
11
input:
22
pipe:
3-
- source: # mars, grib, netcdf, etc.
3+
- source:
4+
# mars, grib, netcdf, etc.
45
# source attributes here
56
# ...
67

7-
- regrid:
8-
matrix: /path/to/regrid/matrix.npz
8+
- regrid:
9+
matrix: /path/to/regrid/matrix.npz
Lines changed: 8 additions & 6 deletions
@@ -1,11 +1,13 @@
11
input:
22
pipe:
3-
- source: # mars, grib, netcdf, etc.
3+
- source:
4+
# mars, grib, netcdf, etc.
45
# source attributes here
56
# ...
67

7-
- rename:
8-
param: # Map old `param` names to new ones
9-
temperature_2m: 2t
10-
temperature_850hPa: t_850
11-
# ...
8+
- rename:
9+
param:
10+
# Map old `param` names to new ones
11+
temperature_2m: 2t
12+
temperature_850hPa: t_850
13+
# ...
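
The `param` mapping in this recipe amounts to a dictionary lookup on variable names. A minimal sketch, assuming fields are held in a dict keyed by name (a hypothetical layout, not the library's internal representation):

```python
def rename_params(fields, mapping):
    """Return a new dict with keys renamed per `mapping`; unmapped names are kept."""
    return {mapping.get(name, name): values for name, values in fields.items()}
```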
Lines changed: 9 additions & 7 deletions
@@ -1,13 +1,15 @@
11
input:
22
pipe:
3-
- source: # mars, grib, netcdf, etc.
3+
- source:
4+
# mars, grib, netcdf, etc.
45
# source attributes here
56
# ...
67
# Must load the variables to be summed
78

8-
- sum:
9-
params: # List of input variables
10-
variable1
11-
variable2
12-
variable3
13-
output: variable_total # Name of output variable
9+
- sum:
10+
params:
11+
# List of input variables
12+
- variable1
13+
- variable2
14+
- variable3
15+
output: variable_total # Name of output variable
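
To make the intent of the `sum` recipe concrete, here is a small sketch that adds the listed input variables element by element and stores the result under the output name. The dict-of-lists layout is hypothetical, and dropping the input variables from the result is an assumption; the diff does not say what the real filter does with them.

```python
def sum_fields(fields, params, output):
    """Replace the `params` entries of `fields` with their element-wise sum."""
    missing = [p for p in params if p not in fields]
    if missing:
        raise KeyError(f"missing input variables: {missing}")
    # zip(*...) walks the input variables in lockstep, one grid point at a time.
    total = [sum(values) for values in zip(*(fields[p] for p in params))]
    result = {k: v for k, v in fields.items() if k not in params}
    result[output] = total
    return result
```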
Lines changed: 5 additions & 4 deletions
@@ -1,10 +1,11 @@
11
input:
22
pipe:
3-
- source: # mars, grib, netcdf, etc.
3+
- source:
4+
# mars, grib, netcdf, etc.
45
# source attributes here
56
# ...
67
# Must load geometric vertical velocity
78

8-
- wz_to_w:
9-
wz: wz # Name of geometric vertical velocity (input) variable
10-
x: z # Name of pressure vertical velocity (output) variable
9+
- wz_to_w:
10+
wz: wz # Name of geometric vertical velocity (input) variable
11+
x: z # Name of pressure vertical velocity (output) variable

docs/datasets/building/sources.rst

Lines changed: 6 additions & 4 deletions
@@ -21,17 +21,19 @@ The following `sources` are currently available:
2121
:maxdepth: 1
2222

2323
sources/accumulations
24-
sources/eccc_fstd
24+
sources/anemoi-dataset
25+
sources/cds
26+
sources/eccc-fstd
2527
sources/forcings
2628
sources/grib
29+
sources/grib-index
2730
sources/hindcasts
2831
sources/mars
29-
sources/cds
3032
sources/netcdf
3133
sources/opendap
3234
sources/recentre
33-
sources/repeated_dates
35+
sources/repeated-dates
36+
sources/xarray-based
3437
sources/xarray-kerchunk
3538
sources/xarray-zarr
36-
sources/xarray-based
3739
sources/zenodo
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
.. _anemoi-dataset_source:
2+
3+
################
4+
anemoi-dataset
5+
################
6+
7+
.. admonition:: Experimental
8+
:class: important
9+
10+
This source is experimental and may change in the future.
11+
12+
An anemoi-dataset can be a source for a dataset:
13+
14+
.. literalinclude:: yaml/anemoi-dataset.yaml
15+
:language: yaml
16+
17+
The parameters are the same as the ones used in the ``open_dataset``
18+
function, which allows you to subset and combine datasets. See
19+
:ref:`opening-datasets` for more information.

docs/datasets/building/sources/eccc_fstd.rst renamed to docs/datasets/building/sources/eccc-fstd.rst

Lines changed: 4 additions & 2 deletions
@@ -5,12 +5,14 @@
55
To read files in the standard format used at Environment and Climate
66
Change Canada (ECCC), the following source can be used:
77

8-
.. literalinclude:: yaml/eccc_fstd.yaml
8+
.. literalinclude:: yaml/eccc-fstd.yaml
99
:language: yaml
1010

1111
The recipe will build a dataset from a standard file using the
1212
``fstd2nc`` xarray plugin.
1313

14-
The ``fstd2nc`` dependencie is not part of the default anemoi-datasets
14+
The ``fstd2nc`` dependency is not part of the default anemoi-datasets
1515
installation and has to be install following the `fstd2nc project
1616
description <https://github.com/neishm/fstd2nc>`_.
17+
18+
See :ref:`create-cf-data` for more information.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
.. _grib-index_source:
2+
3+
############
4+
grib-index
5+
############
6+
7+
The `grib-index` source is used to read GRIB files with the help of an
8+
index file created with the `grib-index` :ref:`command
9+
<grib-index_command>`.
10+
11+
See :ref:`create-grib-data` for more information.

docs/datasets/building/sources/grib.rst

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,5 @@
1+
.. _grib_source:
2+
13
######
24
grib
35
######
@@ -32,6 +34,8 @@ hour, you can use the following configuration:
3234
The patterns in between the curly brackets are replaced by the values of
3335
the `date` and formatted according to the Python strftime_ method.
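
The substitution described here can be sketched in a few lines. The placeholder syntax `{date:strftime(...)}` used below is an assumption, since the part of the file showing the actual pattern is not visible in this hunk, and `expand_path` is a hypothetical helper rather than a library function.

```python
import re
from datetime import datetime

# Matches placeholders of the assumed form {date:strftime(<format>)}.
_PLACEHOLDER = re.compile(r"\{date:strftime\(([^)]*)\)\}")


def expand_path(template, date):
    """Replace each placeholder with `date` formatted by the given strftime format."""
    return _PLACEHOLDER.sub(lambda m: date.strftime(m.group(1)), template)
```

For example, `expand_path("/data/{date:strftime(%Y%m%d)}.grib", datetime(2020, 1, 2))` would give one path per date, so a recipe can point at a different file for each day or hour.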
3436

37+
See :ref:`create-grib-data` for more information.
38+
3539
.. note::
3640

3741
You can combine all the above options when selecting GRIB messages

docs/datasets/building/sources/netcdf.rst

Lines changed: 2 additions & 0 deletions
@@ -24,3 +24,5 @@ files, using Unix’ shell `wildcards
2424
information, but in some cases it is not possible. If you encounter
2525
this or similar issues, please open an issue in the anemoi-datasets
2626
repository.
27+
28+
See :ref:`create-cf-data` for more information.

docs/datasets/building/sources/opendap.rst

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@
44

55
.. literalinclude:: yaml/opendap.yaml
66
:language: yaml
7+
8+
See :ref:`create-cf-data` for more information.
@@ -1,14 +1,14 @@
11
################
2-
repeated_dates
2+
repeated-dates
33
################
44

5-
The `repeated_dates` source is used to repeat a single source multiple
5+
The `repeated-dates` source is used to repeat a single source multiple
66
times, so that its data is present at multiple dates. A simple example
7-
of this is when you have source that contains a constant field, such as
8-
orography or bathymetry, that you want to have repeated at all the dates
9-
of the dataset.
7+
of this is when you have a source that contains a constant field, such
8+
as orography or bathymetry, that you want to have repeated at all the
9+
dates of the dataset.
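
The constant-field case can be pictured as mapping every date of the dataset to the same field. The sketch below only illustrates that idea; the function name and the dict-based layout are hypothetical, and it says nothing about how the `climatology` or `closest` modes work.

```python
from datetime import datetime, timedelta


def repeat_constant(field, start, end, step):
    """Map every date from `start` to `end` (inclusive) to the same field object."""
    repeated = {}
    current = start
    while current <= end:
        repeated[current] = field
        current += step
    return repeated
```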
1010

11-
The generale format of the `repeated_dates` source is:
11+
The general format of the `repeated-dates` source is:
1212

1313
.. literalinclude:: yaml/repeated_dates1.yaml
1414
:language: yaml
@@ -21,19 +21,19 @@ where ``source`` is any of the :ref:`operations <operations>` or
2121
constant
2222
**********
2323

24-
.. literalinclude:: yaml/repeated_dates2.yaml
24+
.. literalinclude:: yaml/repeated-dates2.yaml
2525
:language: yaml
2626

2727
*************
2828
climatology
2929
*************
3030

31-
.. literalinclude:: yaml/repeated_dates3.yaml
31+
.. literalinclude:: yaml/repeated-dates3.yaml
3232
:language: yaml
3333

3434
*********
3535
closest
3636
*********
3737

38-
.. literalinclude:: yaml/repeated_dates4.yaml
38+
.. literalinclude:: yaml/repeated-dates4.yaml
3939
:language: yaml

docs/datasets/building/sources/xarray-based.rst

Lines changed: 6 additions & 4 deletions
@@ -2,14 +2,16 @@
22
xarray-based-sources
33
######################
44

5-
More in general, you can specify any valid xarray.open_dataset_
6-
arguments as the source and anemoi-dataset will try to build a dataset
7-
from it. Examples of valid xarray.open_dataset_ arguments are: netCDF,
8-
zarr, opendap, etc.
5+
More generally, you can specify any valid xarray.open_dataset_ arguments
6+
as the source, and anemoi-dataset will try to build a dataset from it.
7+
Examples of valid xarray.open_dataset_ arguments include: netCDF, Zarr,
8+
OpenDAP, etc.
99

1010
.. literalinclude:: yaml/xarray-based.yaml
1111
:language: yaml
1212

13+
See :ref:`create-cf-data` for more information.
14+
1315
.. _cf conventions: http://cfconventions.org/
1416

1517
.. _wildcards: https://en.wikipedia.org/wiki/Glob_(programming)

docs/datasets/building/sources/xarray-kerchunk.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,8 @@ install the relevant packages before running the code below.
1212
.. literalinclude:: xarray-kerchunk.py
1313
:language: python
1414

15+
See :ref:`create-cf-data` for more information.
16+
1517
.. _era5 dataset available on aws: https://registry.opendata.aws/ecmwf-era5/
1618

1719
.. _kerchunk tutorial: https://fsspec.github.io/kerchunk/tutorial.html

docs/datasets/building/sources/xarray-zarr.rst

Lines changed: 7 additions & 5 deletions
@@ -3,16 +3,18 @@
33
#############
44

55
Here is an example recipe that builds a dataset using one of the many
6-
regridded versions of ERA5 hosted by Google in Analysis-Ready, Cloud
7-
Optimized format. See `here
6+
regridded versions of ERA5 hosted by Google in an Analysis-Ready,
7+
Cloud-Optimised format. See `here
88
<https://cloud.google.com/storage/docs/public-datasets/era5>`_ for more
99
information.
1010

1111
.. literalinclude:: yaml/xarray-zarr.yaml
1212
:language: yaml
1313

14-
Note that unlike the ``mars`` examples, there is no need to include a
15-
``grid`` specification. Also, in order to subselect the vertical levels,
14+
Note that, unlike the ``mars`` examples, there is no need to include a
15+
``grid`` specification. Additionally, to subselect the vertical levels,
1616
it is necessary to use the :ref:`join <building-join>` operation to join
1717
separate lists containing 2D variables and 3D variables. If all vertical
18-
levels are desired, then it is OK to specify a single source.
18+
levels are desired, then it is acceptable to specify a single source.
19+
20+
See :ref:`create-cf-data` for more information.

docs/datasets/building/sources/yaml/accumulations1.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@ input:
22
accumulations:
33
accumulation_period: 6
44
class: ea
5-
param: [tp, cp, sf]
5+
param: [ tp, cp, sf ]
66
levtype: sfc
Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
input:
22
accumulations:
3-
accumulation_period: [6, 12]
3+
accumulation_period: [ 6, 12 ]
44
class: od
5-
param: [tp, cp, sf]
5+
param: [ tp, cp, sf ]
66
levtype: sfc
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
input:
2+
anemoi-dataset:
3+
join:
4+
- dataset: dataset1
5+
select: [ z_500, t_500, u_500, v_500 ]
6+
frequency: 6h
7+
- dataset: dataset2
8+
select: [ msl, 2t, 10u, 10v ]
9+
frequency: 6h
10+
start: 2000
11+
end: 2001
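
Because the `anemoi-dataset` source takes the same parameters as `open_dataset`, the recipe above can be read as a set of keyword arguments. The sketch below writes it as a Python dict and extracts those arguments. The nesting is an assumption (indentation is not preserved in this view, so `start` and `end` are placed at the top level of the block), and `open_dataset_kwargs` is a hypothetical helper, not part of the library.

```python
recipe_input = {
    "anemoi-dataset": {
        "join": [
            {"dataset": "dataset1", "select": ["z_500", "t_500", "u_500", "v_500"], "frequency": "6h"},
            {"dataset": "dataset2", "select": ["msl", "2t", "10u", "10v"], "frequency": "6h"},
        ],
        # Assumed to apply to the joined result rather than to dataset2 only.
        "start": 2000,
        "end": 2001,
    }
}


def open_dataset_kwargs(recipe_input):
    """Return a copy of the arguments found under the `anemoi-dataset` key."""
    return dict(recipe_input["anemoi-dataset"])


kwargs = open_dataset_kwargs(recipe_input)

# With anemoi-datasets installed and both datasets available, the equivalent
# call would presumably be:
#   from anemoi.datasets import open_dataset
#   ds = open_dataset(**kwargs)
```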
