ATTRICI can be installed from a local checkout of the repository (see the 'Development' section) or directly with pip from GitHub, e.g. for the main branch:

```bash
pip install git+https://github.com/isi-mip/attrici.git@main
```

ATTRICI makes its main functionality available via a command line tool.
The current version can be displayed with

```bash
attrici --version
```

Show the command line options with

```bash
attrici --help
```

The available subcommands are:

```
derive-huss       Derive specific humidity from relative humidity, air pressure, and temperature
detrend           Detrend a dataset
merge-output      Merge detrended output or trace files
postprocess-tas   Derive tasmin and tasmax from tas, tasrange, and tasskew
preprocess-tas    Derive tasrange and tasskew from tas, tasmin, and tasmax
ssa               Perform singular spectrum analysis
```
For help on these sub-commands use e.g.
```bash
attrici detrend --help
```
Some test data is provided in tests/data to demonstrate a detrend run.

```bash
attrici detrend --gmt-file ./tests/data/20CRv3-ERA5_germany_ssa_gmt.nc \
    --input-file ./tests/data/20CRv3-ERA5_germany_obs.nc \
    --output-dir ./tests/data/output \
    --variable tas \
    --stop-date "2023-12-31" \
    --report-variables y cfact logp
```

Some cells can be omitted from the calculation with a mask passed via the --mask-file option, e.g. a land-sea mask.
The example below uses a mask to select only three grid cells.

```bash
attrici detrend --gmt-file ./tests/data/20CRv3-ERA5_germany_ssa_gmt.nc \
    --input-file ./tests/data/20CRv3-ERA5_germany_obs.nc \
    --mask-file ./tests/data/20CRv3-ERA5_germany_mask.nc \
    --output-dir ./tests/data/output \
    --variable tas \
    --stop-date "2023-12-31" \
    --report-variables y cfact logp
```

To change the logging level,
the LOGURU_LEVEL environment variable can be set.
For example, to show only messages with level WARNING and higher:

```bash
LOGURU_LEVEL=WARNING attrici detrend \
    --gmt-file ./tests/data/20CRv3-ERA5_germany_ssa_gmt.nc \
    --input-file ./tests/data/20CRv3-ERA5_germany_obs.nc \
    --output-dir ./tests/data/output \
    --variable tas \
    --stop-date "2023-12-31" \
    --report-variables y cfact logp
```

To print the current config, with the used defaults and command line options, to standard output as TOML, use the --print-config flag.
```bash
attrici detrend --gmt-file ./tests/data/20CRv3-ERA5_germany_ssa_gmt.nc \
    --input-file ./tests/data/20CRv3-ERA5_germany_obs.nc \
    --output-dir ./tests/data/output \
    --variable tas \
    --stop-date "2023-12-31" \
    --report-variables y cfact logp \
    --print-config
```

The resulting TOML output is shown below.

```toml
gmt_file = "tests/data/20CRv3-ERA5_germany_ssa_gmt.nc"
input_file = "tests/data/20CRv3-ERA5_germany_obs.nc"
variable = "tas"
output_dir = "tests/data/output"
gmt_variable = "tas"
modes = 4
bootstrap_sample_count = 0
overwrite = false
write_trace = false
fit_only = false
progressbar = false
report_variables = ["y", "cfact", "logp"]
seed = 0
solver = "pymc5"
stop_date = 2023-12-31
task_count = 1
task_id = 0
timeout = 3600
```
This can be used to re-run a specific config:

```bash
attrici detrend --config runconfig.toml
```

As a computationally expensive operation, the detrend sub-command is designed to be run in parallel. As the modelling for one cell in ATTRICI is independent of the others, the cells can be handled in parallel by different processes without any communication between them. The output for each cell is stored in separate files, so that processes for different cells do not interfere with each other. Hence, this output can be merged later on (see the merge-output sub-command). To make use of this parallelization, specify the arguments --task-id ID and --task-count COUNT and start several instances with ID going from 0 to COUNT-1. COUNT does not have to equal the number of cells - these will be distributed to the instances accordingly (ideally equally when the number of cells is a multiple of COUNT).
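As an illustrative sketch (not ATTRICI's actual implementation), one way COUNT instances can split N cells without overlap is for instance ID to take every COUNT-th cell, which keeps the load near-equal for any N:

```python
# Hypothetical round-robin assignment of cell indices to task IDs,
# mirroring the --task-id/--task-count semantics described above.
def cells_for_task(task_id: int, task_count: int, n_cells: int) -> list[int]:
    # Task ID starts at its own index and strides by the task count.
    return list(range(task_id, n_cells, task_count))

# 10 cells over 4 tasks: partition sizes 3, 3, 2, 2 and no overlaps.
parts = [cells_for_task(i, 4, 10) for i in range(4)]
print(parts)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each cell appears in exactly one partition, so the per-task output files never conflict and can be merged afterwards.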
For the SLURM scheduler, which is widely used on HPC platforms, you can use an sbatch run script such as the
following (here COUNT=4):

```bash
#!/usr/bin/env bash
#SBATCH --account=MYACCOUNT
#SBATCH --array=0-3
#SBATCH --cpus-per-task=2
#SBATCH --export=ALL,OMP_PROC_BIND=TRUE
#SBATCH --job-name="attrici"
#SBATCH --ntasks=1
#SBATCH --partition=standard
#SBATCH --qos=short
#SBATCH --time=01:00:00
# load necessary modules/packages here if you don't queue with them loaded
# e.g.: module purge; module load ...
# or: spack load ...
# load virtual environment if you don't queue with it activated:
# e.g.: source venv/bin/activate
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun attrici \
detrend \
--gmt-file ./tests/data/20CRv3-ERA5_germany_ssa_gmt.nc \
--input-file ./tests/data/20CRv3-ERA5_germany_obs.nc \
--output-dir ./tests/data/output \
--variable tas \
--stop-date 2021-12-31 \
--report-variables y cfact logp \
--overwrite \
--task-id "$SLURM_ARRAY_TASK_ID" \
--task-count "$SLURM_ARRAY_TASK_COUNT"
```

If you prefer SLURM tasks rather than job arrays, an example scheduling script would look like:

```bash
#!/usr/bin/env bash
#SBATCH --account=MYACCOUNT
#SBATCH --cpus-per-task=2
#SBATCH --export=ALL,OMP_PROC_BIND=TRUE
#SBATCH --job-name="attrici"
#SBATCH --ntasks=4
#SBATCH --partition=standard
#SBATCH --qos=short
#SBATCH --time=01:00:00
# load necessary modules/packages here if you don't queue with them loaded
# e.g.: module purge; module load ...
# or: spack load ...
# load virtual environment if you don't queue with it activated:
# e.g.: source venv/bin/activate
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun bash <<'EOF'
exec attrici \
detrend \
--gmt-file ./tests/data/20CRv3-ERA5_germany_ssa_gmt.nc \
--input-file ./tests/data/20CRv3-ERA5_germany_obs.nc \
--output-dir ./tests/data/output \
--variable tas \
--stop-date 2021-12-31 \
--report-variables y cfact logp \
--overwrite \
--task-id "$SLURM_PROCID" \
--task-count "$SLURM_NTASKS"
EOF
```

SLURM tasks are counted as one large job which needs all resources at once. SLURM arrays, on the other hand, define smaller jobs that run independently. As no communication between tasks is needed in the case of ATTRICI, the array approach is likely more suitable.
Both scripts assume that you schedule them from a setup suitable to run ATTRICI, i.e. with a virtual environment activated in which ATTRICI can be run locally. Otherwise, adjust the scripts to set up that environment as indicated by the respective comments in the script.