Commit b2f4d51

Merge pull request #2 from coecms/dbcommand
Dbcommand
2 parents aa6cd85 + 83f6767 commit b2f4d51

31 files changed

+978
-237
lines changed

README.md

-104
This file was deleted.

README.rst

+147
@@ -0,0 +1,147 @@
1+
|DOI| \_\_\_
2+
3+
ERA5
4+
====
5+
6+
The era5 python code is an interface to the CDS api to download ERA5
7+
data from the CDS data server. It uses a modified version of the CDS api
8+
which stops after a request has been submitted and executed. The target
9+
download url is saved and downloads are run in parallel by the code
10+
using the Pool multiprocessing module. As well as managing the downloads
11+
the code gets all the necessary information on available variables from
12+
local json configuration files. Before submitting a request the code
13+
will check that the file is not already available locally by querying a
14+
sqlite database. After downloading new files it is important to update
15+
the database to avoid downloading the same file twice. Files are first
16+
downloaded to a staging area, then a quick QC checks that the file is a
17+
valid netcdf file, and finally the file is converted to netcdf4 format
18+
with internal compression. The code's default behaviour is to download
19+
netcdf files. It is also possible to download grib, but we
20+
haven't tested the full workflow for this option.
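
The check against the sqlite database described above could look roughly like the sketch below. It is only an illustration, not the tool's actual code: the table name ``files``, its ``filename`` column, the example file names and the helper functions are all assumptions.

```python
import sqlite3


def already_downloaded(conn, filename):
    """Return True if `filename` is already recorded in the database.

    The `files` table and `filename` column are hypothetical names.
    """
    cur = conn.execute(
        "SELECT 1 FROM files WHERE filename = ? LIMIT 1", (filename,)
    )
    return cur.fetchone() is not None


def files_to_request(conn, candidates):
    """Keep only the candidate filenames that are not yet in the database."""
    return [name for name in candidates if not already_downloaded(conn, name)]


# An in-memory database stands in for era5.sqlite in this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (filename TEXT PRIMARY KEY)")
conn.execute("INSERT INTO files VALUES (?)", ("tp_201811.nc",))

to_request = files_to_request(conn, ["tp_201811.nc", "tp_201812.nc"])
```

Only the file that is not yet recorded is kept for a new request, which is why the database has to be updated after each batch of downloads.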
21+
22+
Getting started
23+
---------------
24+
25+
To run a download::
26+
27+
era5 download -s surface -y 2018 -m 11 -p 228.128
28+
29+
Download ERA5 variables. If the month argument is not passed, the entire year is downloaded. By default it downloads hourly data in netcdf format.
30+
31+
Options:
32+
* -q, --queue Create json file to add request to queue
33+
* -u, --urgent In conjunction with --queue, save the request file in a separate directory that can then be prioritised.
34+
* -s, --stream [surface|wave|pressure|land|cems_fire|agera5|wdfe5] ECMWF stream currently operative analysis surface, pressure levels, wave model, and derived products ERA5 land, CEMS_fire, AGERA5 and WDFE5
35+
[required]
36+
* -y, --year TEXT year to download [required]
37+
* -m, --month TEXT month/s to download, if not specified all months for year will be downloaded
38+
* -t, --timestep [mon|hr|day] timestep if not specified hr is default
39+
* -b, --back Request backwards all years and months as one file, works only for monthly data
40+
* --format [grib|netcdf|tgz|zip] Format output: netcdf default, some formats work only for certain streams
41+
* -p, --param TEXT Grib code parameter for selected variable, pass as param.table, e.g. 132.128. If none is passed then all the params listed in era5_<stream>_<tstep>.json will be used.
42+
* --help
43+
Show this message and exit.
44+
45+
To update files when a new month is released, omit param flag::
46+
47+
era5 download -s surface -y 2019 -m 05
48+
49+
50+
The 'download' sub command will actually request and download the data
51+
unless you use the 'queue' flag. If you want only to create a request::
52+
53+
era5 download -s surface -y 2018 -m 11 -p 228.128 -q
54+
55+
This will create a json file which stores the arguments passed::
56+
57+
era5_request_<timestamp>.json
58+
{"update": false, "format": "netcdf", "stream": "surface", "params": ["228.128"],
59+
"year": "2018", "months": ["11"], "timestep": "hr", "back": false}
60+
61+
To execute the request the tool is used with the 'scan' command option::
62+
63+
era5 scan -f era5_request_<timestamp>.json
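
As an illustration of what such a request file contains, the sketch below parses the JSON shown above and checks that the expected keys are present. The helper names ``load_request`` and ``describe`` are hypothetical, and treating an empty list as "all" is an assumption; this is not how the ``scan`` subcommand is necessarily implemented.

```python
import json

# The request content is copied from the README example above.
REQUEST_TEXT = """
{"update": false, "format": "netcdf", "stream": "surface", "params": ["228.128"],
 "year": "2018", "months": ["11"], "timestep": "hr", "back": false}
"""

EXPECTED_KEYS = {
    "update", "format", "stream", "params", "year", "months", "timestep", "back",
}


def load_request(text):
    """Parse a queued request and verify it has the keys shown in the README."""
    request = json.loads(text)
    missing = EXPECTED_KEYS - request.keys()
    if missing:
        raise ValueError(f"request is missing keys: {sorted(missing)}")
    return request


def describe(request):
    """Build a one-line summary; an empty list is assumed to mean 'all'."""
    months = ",".join(request["months"]) if request["months"] else "all"
    params = ",".join(request["params"]) if request["params"] else "all"
    return (f"{request['stream']} {request['timestep']} {request['year']} "
            f"months={months} params={params}")


request = load_request(REQUEST_TEXT)
summary = describe(request)
```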
64+
65+
To manage the database use 'era5 db' subcommand::
66+
67+
era5 db -s surface -p u10
68+
69+
For example, this will add all new surface hourly (tstep is hr by default) u10 files to the database. Note that in this case you pass the variable name, not the grib code, to the -p flag.
70+
'db' can also list all variables in a stream::
71+
72+
era5 db -a list -s land -t mon
73+
74+
Unless a variable is specified this will show all variables regularly updated for the stream, and how many corresponding files are on the filesystem and in the database. These numbers are compared to the expected number of files based on the current date.
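
How an "expected number of files" might be derived from the current date is sketched below. The sketch rests on two assumptions that are not stated in this README: that there is one file per variable per month, and that data is available up to the month before the current one. The function names and the first year used in the example are hypothetical.

```python
from datetime import date


def expected_monthly_files(first_year, today):
    """Count months from January of `first_year` up to the month before `today`.

    Assumes one file per variable per month (not confirmed by the README).
    """
    months = (today.year - first_year) * 12 + (today.month - 1)
    return max(months, 0)


def compare_counts(expected, on_disk, in_db):
    """Summarise how the filesystem and database counts differ from expected."""
    return {
        "expected": expected,
        "missing_on_disk": max(expected - on_disk, 0),
        "not_in_db": max(on_disk - in_db, 0),
    }


expected = expected_monthly_files(2019, date(2020, 8, 15))
summary = compare_counts(expected, on_disk=18, in_db=17)
```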
75+
76+
Finally to delete records::
77+
78+
era5 db -a delete -s land -t mon -p ro -y 2019
79+
80+
This will delete all corresponding records but will list all records to be deleted and ask for confirmation first.
81+
82+
83+
Latest updates
84+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
85+
August 2020 - Added --urgent flag
86+
Added support to distribute requests across multiple users accounts
87+
88+
click era5 command code and input files
89+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
90+
91+
- cli.py -- main code, includes click commands and functions to build
92+
the request and submit it with Pool; run python cli.py --help to check usage
93+
- era5_functions.py -- functions used by cli.py, just separated them
94+
so it is overall more readable
95+
- era5_db.py -- has all the functions relating to db operations
96+
- update_json_vars.py -- script that uses the clef db to update
97+
era5_vars.json; needs the clef module
98+
- setup.py and setup.cfg - to install module
99+
100+
To configure the tool
101+
~~~~~~~~~~~~~~~~~~~~~
102+
103+
These files are in era5/data/
104+
105+
- config.json -- to set configuration:
106+
* staging, data, logs directories,
107+
* database name and location,
108+
* bash commands to download, resume download, qc, compress and concatenate files,
109+
* number of threads,
110+
* number of resume download attempts,
111+
* slow and fast ips
112+
113+
- era5_pressure_hr.json -- pressure levels stream arguments to build
114+
request and list of params to download at hourly temporal resolution
115+
- era5_pressure_mon.json -- pressure levels stream arguments to build
116+
request and list of params to download at monthly temporal resolution
117+
- era5_wave_hr.json -- wave model surface level stream arguments to
118+
build request and list of params to download at hourly temporal
119+
resolution
120+
- era5_wave_mon.json -- wave model surface level stream arguments to
121+
build request and list of params to download at monthly temporal
122+
resolution
123+
- era5_surface_hr.json -- surface level stream arguments to build
124+
request and list of params to download at hourly temporal resolution
125+
- era5_surface_mon.json -- surface level stream arguments to build
126+
request and list of params to download at monthly temporal resolution
127+
- era5_land_hr.json -- Land model surface level stream arguments to
128+
build request and list of params to download at hourly temporal
129+
resolution
130+
- era5_vars.json -- Json file with list of grib codes that can be
131+
downloaded from CDS and respective variable and cds names
132+
- era5_derived.json -- Json file with list of derived products variables
133+
134+
Other files
135+
~~~~~~~~~~~
136+
137+
(not included in git)
138+
139+
- era5.sqlite -- sqlite database
140+
141+
Modified cdsapi code
142+
~~~~~~~~~~~~~~~~~~~~
143+
144+
- cdsapi: __init__.py __pycache__ api.py
145+
146+
.. |DOI| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3549078.svg
147+
:target: https://doi.org/10.5281/zenodo.3549078

docs/gettingstarted.rst

+75
@@ -0,0 +1,75 @@
1+
Getting Started
2+
===============
3+
4+
ERA5 is an interface to the CDSapi (Copernicus Climate Data Store API). It helps automate and submit cdsapi requests. ERA5 is python3 based and uses the click module so it can be run from the command line.
5+
The configuration of the tool and specific information on the variables is provided by json files.
6+
7+
Once installed, the tool is accessed through the command-line `era5` program. There are
8+
presently three subcommands:
9+
10+
* :code:`era5 download` to submit a request and download new files; at least one stream has to be specified. If you omit the variable then all the variables listed in the corresponding stream json file will be downloaded. This is useful for regular updates.
11+
12+
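* :code:`era5 scan` to execute a download request starting from a json constraints file previously saved using download with the --queue/-q flag.

* :code:`era5 db` to manage the sqlite database of downloaded files: update (default), list or delete records.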
13+
14+
Examples
15+
--------
16+
17+
18+
download
19+
++++++++
20+
::
21+
22+
$ era5 download -s pressure -y 2020 -m 02 -p 132.128
23+
24+
This will put in a request for the v component of wind (whose grib code is 132.128) for February 2020 on pressure levels and at 1hr timesteps.
25+
The area and grid resolution are defined in the
26+
era5/data/era5_pressure_hr.json file
27+
28+
NB you can change the timestep with the `-t` flag: `hr` is the default value, and the other options are `mon` and `day`. Depending on the stream, some of these options might not be valid.
29+
30+
Other flags are:
31+
* -h/--help
32+
* -q/--queue which saves the constraints to a json file that can be passed later to the `scan` subcommand
33+
* -f/--file works only with `scan`
34+
* --back works only with monthly data; it downloads all available data backwards to the year passed with `-y`.
35+
36+
Scan example::
37+
$ era5 scan -f era5_request_file.json
38+
39+
Update the database::
40+
41+
$ era5 db -s land -t mon -p ro
42+
43+
NB
44+
* -a/--action works only for db subcommand to choose what db action to execute: update (default), list, or delete
45+
46+
47+
Installation
48+
============
49+
The tool uses python >=3.6 and you need to install the click package.
50+
51+
To setup the tool::
52+
$ python setup.py install
53+
54+
The github repository includes a modified version of the cdsapi.
55+
For this to work you need to set up a .cdsapirc file in your home directory with the api url and your own api key. It should look like::
56+
57+
url: https://cds.climate.copernicus.eu/api/v2
58+
key: #########################
59+
60+
61+
The instructions are available from the ECMWF confluence.
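
To check that a configuration file has the two settings shown above, something like the following sketch could be used. It only illustrates the ``name: value`` layout of the file; the function names are hypothetical, the key value in the example is a placeholder, and this is not the parsing code used by cdsapi itself.

```python
def read_cdsapirc(text):
    """Parse `name: value` lines into a dict, splitting on the first colon only."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        config[name.strip()] = value.strip()
    return config


def missing_settings(config):
    """Return the required settings that are absent or empty."""
    return [name for name in ("url", "key") if not config.get(name)]


# Placeholder content; a real file holds your own api key.
example = """
url: https://cds.climate.copernicus.eu/api/v2
key: 12345:placeholder-key
"""

config = read_cdsapirc(example)
problems = missing_settings(config)
```

Splitting on the first colon only keeps the ``https://`` url and any colon inside the key intact.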
62+
NB most of the tool functionalities are preserved even if you use the original api. What the modified api does is stop the api before it downloads the files. Separating the file download from the requests helps to manage the downloads as parallel tasks and to avoid potential issues should the connection be interrupted.
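
The idea of running the saved download urls as parallel tasks can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's code: it uses ``multiprocessing.pool.ThreadPool``, which has the same interface as ``Pool``, and the ``download_all`` helper, the ``fetch`` callable and the example urls are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def download_all(tasks, fetch, nthreads=4):
    """Run fetch(url, target) for every (url, target) pair in parallel.

    Returns a list of (target, error) tuples in the same order as `tasks`;
    error is None when the download succeeded.
    """
    def run(task):
        url, target = task
        try:
            fetch(url, target)
            return target, None
        except Exception as exc:
            # One failed download does not stop the other tasks.
            return target, str(exc)

    with ThreadPool(nthreads) as pool:
        return pool.map(run, tasks)


def fake_fetch(url, target):
    """Stand-in for a real download command; fails for one url on purpose."""
    if "bad" in url:
        raise IOError("connection interrupted")


results = download_all(
    [("https://example.invalid/a", "a.nc"), ("https://example.invalid/bad", "b.nc")],
    fake_fetch,
    nthreads=2,
)
```

Because each download is an independent task, a failure is reported for that file alone and can be retried without resubmitting the request.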
63+
64+
Step2
65+
-----
66+
Configure the tool by defining the following settings in era5/data/config.json:
67+
68+
The following folders:
69+
* log for log outputs
70+
* staging where the files are initially downloaded
71+
* netcdf for the compressed files
72+
* era5_derived if you are downloading ERA5 derived products
73+
* update the directory paths and the sqlite db name
74+
* the command you want to use to download/compress the files
75+

era5/__init__.py

Whitespace-only changes.
File renamed without changes.

cdsapi/api.py era5/cdsapi/api.py

File renamed without changes.
