ecmwf-ifs · samhatfield · Jul 8, 2024 · Jul 8, 2024 · Jul 9, 2024 · Jul 9, 2024
diff --git a/.gitignore b/.gitignore
@@ -8,4 +8,5 @@ build/*
 install/*
 env.sh
 *.DS_Store
-
+docs/site
+docs/content_processed
diff --git a/docs/content/api.md b/docs/content/api.md
@@ -0,0 +1,41 @@
+---
+title: ecTrans API
+---
+
+@warning
+Page under construction.
+@endwarning
+
+## General notes
+
+@note
+ecTrans is a legacy code with an accumulated 30 years of history. Over this time certain
+features enabled through optional arguments will have fallen out of use. We are currently reviewing
+all options to identify those that can be safely deleted, but this takes time. In the meantime, we
+have tagged below all options we deem to be "potentially deprecatable".
+@endnote
+
+### Variable names
+
+ecTrans _in principle_ follows the coding standard and conventions outlined in the [IFS
+Documentation - Part VI: Technical and Computational Procedures](https://www.ecmwf.int/en/elibrary/
+81372-ifs-documentation-cy48r1-part-vi-technical-and-computational-procedures) section 1.5.
+Following these standards, all variable names must begin with a one- or two-character prefix
+denoting their scope (module level, dummy argument, local variables, loop index, or parameter) and
+type. These are outlined in Table 1.2. Dummy arguments have the following prefixes:
+
+- `K` - integer
+- `P` - real (single or double precision)
+- `LD` - logical
+- `CD` - character
+- `YD` - derived type
+
+### `KIND` parameters
+
+As with the IFS, integer and real variables in ecTrans always have an explicit `KIND` specification.
+These are defined in the [`PARKIND1` module](https://github.com/ecmwf-ifs/fiat/blob/main/src/parkind
+/parkind1.F90) which is part of the Fiat library (a dependency of ecTrans). To understand the
+subroutines described here, only two must be considered:
+
+- `INTEGER, PARAMETER :: JPIM = SELECTED_INT_KIND(9)` (i.e. 4-byte integer)
+- `INTEGER, PARAMETER :: JPRD = SELECTED_REAL_KIND(13,300)` (i.e. 8-byte float)
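+
+The byte sizes quoted above can be sanity-checked with a short calculation. The following Python
+sketch is illustrative only and is not part of ecTrans. It assumes two's-complement integers and
+that Python's `float` is an IEEE 754 double (true on common platforms); the helper name
+`smallest_int_bytes` is hypothetical.
+
+```python
+import sys
+
+
+def smallest_int_bytes(decimal_range):
+    """Smallest power-of-two byte width whose signed range covers 10**decimal_range."""
+    nbytes = 1
+    while 2 ** (8 * nbytes - 1) - 1 < 10 ** decimal_range:
+        nbytes *= 2
+    return nbytes
+
+
+# SELECTED_INT_KIND(9) requests integers able to hold values of magnitude up to 10**9
+int_bytes = smallest_int_bytes(9)
+
+# SELECTED_REAL_KIND(13, 300) requests >= 13 decimal digits and a decimal exponent range >= 300
+double_ok = sys.float_info.dig >= 13 and sys.float_info.max_10_exp >= 300
+
+print(int_bytes, double_ok)
+```
+
+Under these assumptions the integer request is first satisfied by a 4-byte integer and the real
+request is satisfied by an 8-byte float, in line with the descriptions above.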
diff --git a/docs/content/benchmarking.md b/docs/content/benchmarking.md
@@ -0,0 +1,153 @@
+---
+title: Benchmarking ecTrans
+---
+
+@warning
+Page under construction.
+@endwarning
+
+A ["benchmark driver" program](https://sites.ecmwf.int/docs/ectrans/sourcefile/ectrans-benchmark.
+f90.html) is bundled with ecTrans. This program repeats a pair of inverse and direct spectral
+transforms a specified number of times and collects timing statistics
+to provide an assessment of the overall performance of ecTrans. It is designed to mimic the use of
+ecTrans from within the IFS atmospheric model, in which inverse and direct spectral transforms are
+carried out on every model timestep. The benchmark program also includes a simple error checking
+algorithm for verifying that the transforms are performing with correct numerics. This latter
+feature is in fact used for the ecTrans CTest suite.
+
+Here we describe how to write a benchmark suite for ecTrans.
+
+## Installing ecTrans
+
+First follow the [instructions for installing ecTrans](installation.html) on your system. Verify
+that the benchmark programs (one each for single and double precision) exist in your build's bin
+directory. You should see
+
+```bash
+ectrans-benchmark-cpu-sp  ectrans-benchmark-cpu-dp
+```
+
+Here we assume you have only enabled the `CPU` feature of ecTrans (which is on by default). If you
+also enabled the `GPU` feature, you'll also see GPU versions of these two programs. We'll just focus
+on CPUs here.
+
+In this guide we will only benchmark the single-precision build of ecTrans.
+
+## Using the benchmark program
+
+The benchmark program has many arguments for running ecTrans in different configurations. You can
+see the full set by running one of the benchmark programs with the `--help` option:
+
+```text
+NAME    ectrans-benchmark-cpu-sp
+
+DESCRIPTION
+        This program tests ecTrans by transforming fields back and forth between spectral
+        space and grid-point space (single-precision version)
+
+USAGE
+        ectrans-benchmark-cpu-sp [options]
+
+OPTIONS
+    -h, --help          Print this message
+    -v                  Run with verbose output
+    -t, --truncation T  Run with this triangular spectral truncation (default = 79)
+    -g, --grid GRID     Run with this grid. Possible values: O<N>, F<N>
+                        If not specified, O<N> is used with N=truncation+1 (cubic relation)
+    -n, --niter NITER   Run for this many inverse/direct transform iterations (default = 10)
+    -f, --nfld NFLD     Number of scalar fields (default = 1)
+    -l, --nlev NLEV     Number of vertical levels (default = 1)
+    --vordiv            Also transform vorticity-divergence to wind
+    --scders            Compute scalar derivatives (default off)
+    --uvders            Compute uv East-West derivatives (default off). Only when also --vordiv is given
+    --flt               Run with fast Legendre transforms (default off)
+    --nproma NPROMA     Run with NPROMA (default no blocking: NPROMA=ngptot)
+    --norms             Calculate and print spectral norms of transformed fields
+                        The computation of spectral norms will skew overall timings
+    --meminfo           Show diagnostic information from FIAT's ec_meminfo subroutine on memory usage, thread-binding etc.
+    --nprtrv            Size of V set in spectral decomposition
+    --nprtrw            Size of W set in spectral decomposition
+    -c, --check VALUE   The multiplier of the machine epsilon used as a tolerance for correctness checking
+
+DEBUGGING
+    --dump-values       Output gridpoint fields in unformatted binary file
+```
+
+Some of these options (e.g. `--nprtrv`) require a detailed understanding of how fields are
+distributed across MPI tasks, so we won't describe them in detail here. The most important arguments
+are the following:
+
+- `-t, --truncation T`: this sets the overall resolution of the benchmark. The truncation T refers
+  to the highest zonal and total wavenumber that can be kept in spectral space. By default, a
+  suitable grid point resolution (i.e. a suitable number of latitudes on the octahedral grid) will
+  be chosen to match this truncation. This single number then determines the overall problem size
+  of the spectral transform. The higher this number, the larger the problem size. As of August
+  2024, the "HRES" (high-resolution, deterministic) forecast of ECMWF uses a spectral truncation of
+  1279, combined with an octahedral grid of 2560 latitudes, which gives a grid point resolution of
+  approximately 8 km.
+- `-n, --niter NITER`: this determines how many inverse/direct transform iterations to perform.
+  The more iterations you perform, the more reliable the timing statistics you gather. Note that
+  two additional iterations are always performed at the start. This is because (at least for the
+  GPU version of ecTrans) the first two iterations include some initialisation costs which
+  shouldn't be included in any timing statistics.
+- `-l, --nlev NLEV`: this determines the number of vertical levels for three-dimensional fields
+  such as U and V wind (or vorticity and divergence). ecTrans can operate on a batch of vertical
+  levels with a single call and this determines the size of this batch (though by default, fields
+  are distributed across MPI tasks along the vertical dimension at some stages in the spectral
+  transform).
+- `--vordiv --scders --uvders`: these options enable some auxiliary code paths when calling the
+  inverse transform. `--vordiv` calculates grid point vorticity and divergence, `--scders`
+  calculates derivatives of scalar fields in grid point space, and `--uvders` calculates East-West
+  derivatives of the U and V wind in grid point space (it only takes effect when `--vordiv` is also
+  given). For testing code changes, it's good to include these options so as many code paths as
+  possible are verified.
+- `--norms`: this option enables error norms, which are printed aggregated over all fields at the
+  end of the benchmark. The errors are computed in spectral space with respect to the initial
+  values of the fields. This is a useful check that the transforms are numerically correct, but
+  note that computing the norms will skew the overall timings.
+- `--meminfo`: this option enables so-called "meminfo" diagnostics, which are printed at the end of
+  the benchmark. These diagnostics include memory usage on a per-task basis. Thread binding
+  information is also printed which can be extremely useful when debugging performance issues when
+  running ecTrans multithreaded.
+
+Putting all of these together, we get the following invocation of the ecTrans benchmark program:
+
+```bash
+./ectrans-benchmark-cpu-sp --truncation 159 --niter 100 --nlev 137 --vordiv --scders --uvders --norms --meminfo
+```
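+
+When scripting a suite of runs, it can help to assemble this command programmatically. The
+following is a minimal Python sketch, not part of ecTrans: the function name
+`build_benchmark_command` is hypothetical, and the executable path assumes you are running from
+your build's bin directory. The flags themselves are the ones listed in the `--help` output above.
+
+```python
+import shlex
+
+
+def build_benchmark_command(truncation, niter, nlev, executable="./ectrans-benchmark-cpu-sp"):
+    """Assemble the argument list for one benchmark run (flags taken from the --help text)."""
+    return [
+        executable,
+        "--truncation", str(truncation),
+        "--niter", str(niter),
+        "--nlev", str(nlev),
+        "--vordiv", "--scders", "--uvders",
+        "--norms", "--meminfo",
+    ]
+
+
+command = build_benchmark_command(truncation=159, niter=100, nlev=137)
+print(shlex.join(command))
+```
+
+The resulting list can be passed directly to `subprocess.run(command, check=True)`, which avoids
+shell quoting issues when the truncation or iteration count comes from a loop variable.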
+
+## Choosing a resolution
+
+As mentioned above, the single most important parameter for determining the problem size, and
+therefore computational cost, of a benchmark of ecTrans is the spectral truncation, controlled with
+the `-t, --truncation` argument. The computational complexity of an inverse or direct transform is
+essentially dominated by the matrix-matrix multiplications underpinning the Legendre transform. The
+floating-point operation count (FLOP) of this operation scales with the cube of the spectral
+truncation. That means that the computational work (measured in FLOPs) when the truncation is, e.g.,
+319 is about eight times higher than when it is 159.
+
+The following suite is an example of a weak scaling benchmark suite for ECMWF's HPC2020 system which
+is constructed by ensuring the work per task is approximately constant, following the cubic
+complexity argument above:
+
+- Truncation = 159 on 1 task (1 node)
+- Truncation = 319 on 8 tasks (1 node)
+- Truncation = 639 on 64 tasks (8 nodes)
+- Truncation = 1279 on 512 tasks (64 nodes)
+- Truncation = 2559 on 4096 tasks (512 nodes)
+- Truncation = 3999 on 15625 tasks (1953 nodes)
+
+Here each task is allocated 16 of a node's 128 cores for multithreading (with no hyperthreading),
+so each node hosts up to 8 tasks.
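+
+The task counts in this suite follow directly from the cubic complexity argument. The Python sketch
+below reproduces them; it is an illustration rather than part of ecTrans. The function name
+`tasks_for_constant_work` is hypothetical, and measuring the problem size by `T + 1` (rather than
+`T`) is an assumption made here because it yields exact powers of the scale factor.
+
+```python
+def tasks_for_constant_work(truncation, base_truncation=159, base_tasks=1):
+    """Tasks needed to keep work per task constant, assuming work scales as (T + 1)**3."""
+    scale = (truncation + 1) / (base_truncation + 1)
+    return round(base_tasks * scale ** 3)
+
+
+for truncation in (159, 319, 639, 1279, 2559, 3999):
+    print(f"Truncation = {truncation}: {tasks_for_constant_work(truncation)} tasks")
+```
+
+Changing `base_truncation` and `base_tasks` gives a starting point for an equivalent suite on a
+system with a different node size.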
+
+
+
+<!-- When inspecting the program, you will notice that it is significantly more complex than, say, the
+example program described in our [usage guide](usage.html). This additional complexity comes not
+just from the instrumentation code for the timings, but notably also from the infrastructure to
+permit transforms of distributed fields. As explained in the [Design](design.html) chapter,
+ecTrans can operate on fields distributed across MPI tasks, and the dimension across which fields
+are split is different for spectral space and grid point space. As such, the benchmark program
+includes infrastructure for specifying which elements of the relevant decomposed dimension belong
+to which MPI task. -->
+
+
diff --git a/docs/content/design.md b/docs/content/design.md
@@ -0,0 +1,34 @@
+---
+title: Design
+---
+
+@warning
+Page under construction.
+@endwarning
+
+## The spectral transform
+
+ecTrans transforms a batch of meteorological fields from a grid point space representation
+\( X_k(\lambda_i, \phi_j) \), where \( \lambda_i \) is the \( i^{\text{th}} \) longitude,
+\( \phi_j \) is the \( j^{\text{th}} \) latitude, and \( k \) is the index which ranges over the
+batch of fields, to a spectral space representation \( X_{m,n,k} \), where \( m \) is the zonal
+wavenumber, and \( n \) is the total wavenumber. This constitutes a direct spectral transform.
+ecTrans can also carry out the inverse spectral transform.
+
+The inverse spectral transform (spectral space to grid point space) is accomplished in two
+computational steps. Firstly, an inverse Legendre transform is performed in the latitudinal
+direction,
+
+\[
+X_{m,k}(\phi_j) = \sum_{n=|m|}^{N} X_{m,n,k} P_{m,n}(\sin(\phi_j)).
+\]
+
+Then, an inverse Fourier transform is performed in the longitudinal direction,
+
+\[
+X_k(\lambda_i, \phi_j) = \sum_{m=-N}^{N} X_{m,k}(\phi_j) e^{im\lambda_i}.
+\]
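+
+To make the Fourier step concrete, the following Python sketch evaluates the sum over zonal
+wavenumbers \( m \) directly for a single latitude and a single field. It is a naive illustration
+only and not how ecTrans computes the transform (ecTrans uses a fast Fourier transform library).
+The function name `inverse_fourier_direct` is hypothetical, and equally spaced longitudes
+\( \lambda_i = 2\pi i / n_{\text{lon}} \) are assumed.
+
+```python
+import cmath
+import math
+
+
+def inverse_fourier_direct(coeffs, n_lon):
+    """Evaluate the sum over m of X_m * exp(i*m*lambda_i) on n_lon equally spaced longitudes.
+
+    coeffs maps each zonal wavenumber m (from -N to N) to a complex coefficient X_m.
+    """
+    longitudes = [2.0 * math.pi * i / n_lon for i in range(n_lon)]
+    return [
+        sum(x_m * cmath.exp(1j * m * lam) for m, x_m in coeffs.items())
+        for lam in longitudes
+    ]
+
+
+# Coefficients with X_{-m} = conj(X_m) give a real-valued field; here X(lambda) = 2*cos(lambda)
+field = inverse_fourier_direct({-1: 1.0 + 0.0j, 0: 0.0j, 1: 1.0 + 0.0j}, n_lon=8)
+print([round(value.real, 6) for value in field])
+```
+
+The direct sum costs on the order of \( N \) operations per longitude, which is why a fast Fourier
+transform is preferred in practice.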
+
+## Parallelizing a spectral transform
+
+## Basic usage of ecTrans
diff --git a/docs/content/gpu.md b/docs/content/gpu.md
@@ -0,0 +1,7 @@
+---
+title: GPU offloading
+---
+
+@warning
+Page under construction.
+@endwarning
diff --git a/docs/content/img/spherical_harmonic.png b/docs/content/img/spherical_harmonic.png
diff --git a/docs/content/index.md b/docs/content/index.md
@@ -0,0 +1,44 @@
+---
+title: ecTrans User Guide
+ordered_subpage: design.md
+ordered_subpage: installation.md
+ordered_subpage: usage.md
+ordered_subpage: benchmarking.md
+ordered_subpage: api.md
+ordered_subpage: transi.md
+ordered_subpage: license.md
+copy_subdir: img
+---
+
+ecTrans is a high-performance numerical library for transforming meteorological fields between
+global grid-point space representation and a spectral representation based on spherical harmonics.
+It is a fundamental part of the
+[European Centre for Medium-Range Weather Forecasts'](https://www.ecmwf.int/)
+[Integrated Forecasting System (IFS)](https://www.ecmwf.int/en/forecasts/documentation-and-support/
+changes-ecmwf-model), a global numerical weather prediction suite. Indeed, ecTrans
+was previously part of the IFS source code itself. It therefore benefits from over 30 years of
+development and optimisation. In 2022, ecTrans was split out from the IFS source code and released
+as its own project, the first open-source component of the IFS.
+
+ecTrans is engineered to work efficiently running on many hundreds, or even thousands, of compute
+nodes. This is achieved through a significant optimisation of the constituent compute kernels,
+making use of the FFTW library for the Fourier transform in the longitudinal direction and BLAS
+GEMMs for the Legendre transforms in the latitudinal direction. However, given that transformed
+fields are distributed across compute tasks, great care has been taken to ensure that parallelism
+can be fully exploited at all stages in the algorithm. This is achieved through data exchange steps
+interleaved between the Fourier and Legendre transforms, which are implemented using the Message
+Passing Interface (MPI).
+
+The result is an algorithm which stresses a high-performance computing system both on the
+node level _and_ the network level, serving as an excellent overall benchmark and a target for
+optimisation of IFS execution speed.
+
+This user guide contains the following sections:
+
+- [Design](design.html)
+- [Installation](installation.html)
+- [Usage](usage.html)
+- [Benchmarking](benchmarking.html)
+- [API](api.html)
+- [transi](transi.html)
+- [License](license.html)