
Getting Started

Travis Sluka edited this page May 25, 2018 · 5 revisions

Compiling

System Requirements

The following are required to compile and run Hybrid-GODAS:

  • rocoto workflow manager
  • cmake
  • climate data operators (cdo)
  • NetCDF operators (nco)
  • NetCDF4 (with Fortran library)
  • openmpi or intelmpi
  • fortran compiler

Additionally, the following may be required for some of the preparation scripts (observation and forcing downloading and prep):

  • Python 3 with modules for NetCDF4, pygrib
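Before building, it can save time to confirm that these tools are actually on your PATH. The snippet below is a minimal sketch; the command names checked (ncks for the NetCDF operators, rocotorun for rocoto, mpif90 for the MPI Fortran wrapper) are assumptions, so adjust the list for your system:

```shell
# Sketch: report any required build tools missing from PATH.
# The tool names below are assumptions; edit the list for your system.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return $missing
}

check_tools cmake cdo ncks rocotorun mpif90 || echo "Install the missing tools before compiling."
```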

Compiling on Gaea

The following instructions will enable you to download and compile the code needed, assuming you are running on the Gaea supercomputer.

The following commands will download the hybrid-GODAS repository as well as any other child repositories that it requires:

git clone https://github.com/UMD-AOSC/hybrid-godas.git
cd hybrid-godas
git submodule update --init --recursive

Set up the environment by creating or linking a ./config/env file. For example, on Gaea, use the ready-made configuration file by running:

ln -s env.gaea config/env

Within the root directory, compile the MOM6 model and the data assimilation code. The various components (fms, mom, gsw, datetime, obsop, util, 3dvar, letkf) can be compiled separately, but for convenience use the following:

make model
make da

Download or link the static files required by MOM6. On Gaea these are already available and can be linked with:

ln -s /lustre/f1/pdata/gfdl_O/datasets src/MOM6/.datasets

Several other preprocessed historical datasets (surface forcing, observations, ...) needed by the system can be linked from my personal directory:

mkdir DATA
ln -s /lustre/f1/unswept/ncep/Travis.Sluka/hybrid-godas-DATA/* DATA/

Running a short cycling experiment

The following will get you running a short cycling experiment with data assimilation using the default settings. Note: the default settings use initial conditions from 2004-01-01 generated by a 1-year, 20-member ensemble run. If you do not want to use these initial conditions and would rather start from the Levitus T/S climatology, simply delete the DATA/exp1/cycle directory.

Initialize a new experiment directory by running from the root directory:

./run/init_cycle.sh DATA/exp1
cd DATA/exp1

This is the directory where all the experiment-specific configuration, logs, and results are stored. You should see several files and directories of importance, which have been initialized with default configuration values:

  • config/ - All the configuration files needed for running the model (config/mom) and doing data assimilation (config/da), as well as the master configuration file that controls everything (config/hybridgodas.config)
  • cycle/ - All the files required for starting the next cycle are placed here, including the rocoto configuration files, the restart files from the previous cycle, and the analysis increment from the previous data assimilation step.
  • hybridgodas.rocoto - A convenience wrapper for running any of the rocoto commands.
  • hybridgodas.run - The main run script used to cycle rocoto for your experiment. It generates an xml file for rocoto based on your config/hybridgodas.config configuration file and submits it to rocoto, which in turn submits jobs to the system's job management system.
  • hybridgodas.status - A convenience script to view the status of the rocoto cycles.
  • version - The version of the source code repositories as they were when you initialized your experiment.

Edit the config/hybridgodas.config file, paying attention to the following:

  • SCHED_ACCT should be changed to the computing account you have access to on Gaea
  • CYCLE_END should be set to something shorter, say 2003010500, if you want to run just a single 5-day cycle. These dates are in YYYYMMDDHH format.
  • Several directory locations were filled in automatically; make sure they are correct (ROOT_DIR, EXP_DIR, WORK_DIR, FORC_MEAN_FILE, etc.)
  • ENS_SIZE is the number of ensemble members. Set this to something small, perhaps 5.
  • DA_MODE is the data assimilation mode the system will run in. Set this to "hyb" for a full hybrid-DA run.
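Putting those edits together, the relevant lines of config/hybridgodas.config might look roughly like this (the variable names come from the list above; the values and the account name are illustrative examples, not defaults):

```shell
# Example excerpt of config/hybridgodas.config (values are illustrative)
SCHED_ACCT=my_gaea_account   # hypothetical; use your own computing account
CYCLE_END=2003010500         # YYYYMMDDHH: stop after a single 5-day cycle
ENS_SIZE=5                   # small ensemble for a quick test
DA_MODE=hyb                  # full hybrid data assimilation
```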

Start the experiment. From the main experiment directory run:

./hybridgodas.run --cycle 2

This will cause several things to happen: 1) an xml file is generated and placed at cycle/rocoto/hybridgodas.rocoto.xml, with contents that depend on exactly how your experiment was configured; 2) rocotorun is called to submit jobs to the system's job manager; 3) this is repeated every 2 minutes, or whatever argument you pass after --cycle, until all the cycles up to CYCLE_END have finished.
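Conceptually, the loop behind --cycle N amounts to something like the following pseudocode (a simplified sketch only; the real script also regenerates the xml each pass, and the rocoto database path here is an assumption):

```
# pseudocode sketch of "./hybridgodas.run --cycle 2"
while cycles remain before CYCLE_END; do
    rocotorun -w cycle/rocoto/hybridgodas.rocoto.xml -d <rocoto database>
    sleep 120    # N minutes, from the --cycle argument
done
```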

I find calling ./hybridgodas.run --cycle 2 easy for running experiments, but it does require leaving your terminal window open (or running it in something like tmux or screen). Alternatively, you could set up a cron job to run hybridgodas.run every couple of minutes.
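If you go the cron route, a crontab entry along these lines would do it (the experiment path and the 5-minute interval are examples only, not values from this page):

```shell
# Hypothetical crontab entry: poll rocoto every 5 minutes
*/5 * * * * cd /path/to/DATA/exp1 && ./hybridgodas.run >> cron.log 2>&1
```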

In a separate window you can run ./hybridgodas.status to see the latest report of the experiment's status.

Output

Assuming everything above went well, you'll see several new directories in your experiment directory.

  • log/ - All logs from the job steps are placed here
  • output/ - All of the final output from the experiment is placed here. By default the files are compressed before being saved, which dramatically shrinks their size; this can be controlled in the hybridgodas.config file.

The exact contents of output/ depend on what type of data assimilation is run, but for a hybrid-DA run you'll see:

  • ana/{mean,sprd} - The mean and spread of the analysis at the end of the data assimilation cycle
  • ana/mean_letkf - The intermediate analysis mean after the LETKF step. The workflow is set up so that the LETKF runs first, and the 3DVar then uses this analysis mean as its background before producing the final analysis mean. This is kept for diagnostic purposes only and is not what is actually used to restart the models.
  • bkg/{mean,sprd} - The background ensemble mean and spread
  • bkg_diag/ - The full diagnostics from the control forecast (initialized from the analysis mean). The previously mentioned ana and bkg files currently save only the state variables required by the data assimilation (U, V, T, S), while the diagnostic fields here contain everything output by the model. Currently these are the pentad-average files.
  • omf/ - The observation minus forecast (O-F) files used by the LETKF.

Take note that all filenames containing dates use (for the most part) the analysis time. So the 5-day pentad-average file covering Jan 1 - Jan 5 will be listed as 2003010600, since the very end of the period, 2003 Jan 6 00Z, is the analysis time.
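As a concrete check of this convention, GNU date (assumed available on the system's Linux front ends) reproduces that stamp:

```shell
# The pentad averaging Jan 1-5, 2003 is stamped with the end of the period, 00Z:
date -d "2003-01-01 + 5 days" +%Y%m%d00    # prints 2003010600
```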
