December 6, 2023

Welcome to CUPiD Kickoff Meeting Notes!

Slide Deck

Who's in the room?

  • All the CGD sections, CESM, ESDS, ESMF, GeoCAT


  • Unified CESM diagnostics!

Goal #1: Bare-bones deployment

  • Minimal diagnostics generation for CESM3 development
  • Diagnostics output from all working groups
  • Feature-complete API: run inside CIME and outside CESM
  • Feature requests, GitHub issues

Goal #2: CMIP PI run

  • Port NCL code (GeoCAT)
  • Time series generation
  • Data compression
  • Subset of CMORization
  • Climatology generation
  • Extensibility: Include outside packages

Goal #3: CMIP DECK experiments

  • CESM3 paper for PI control: publication-quality figures that feed into the paper introducing CESM3 (PI control + historical)
  • Diagnostics for model forcing

Goal #4: High resolution / ensemble support

  • Initialized prediction, large ensembles

Project Organization

  • Zulip: #CESM-diagnostics for project discussion
  • Google group: announcements and meeting invites
  • Bi-weekly meetings starting in January 2024
  • Google Drive
  • GitHub repository for code management
    • Examples using NBscuid
    • Projects containing the milestones/goals
    • Issues defining tasks for each project, especially for bare bones deployments

Demo: NBscuid / ADF merge [Lev]

  • NBscuid: the infrastructure that is already running; an engine that runs a collection of notebooks or scripts, with case details etc. as inputs
  • Demo is running ADF and MOM6 diagnostics
  • config.yml file: specifies where to find the notebooks, which notebooks to run, and the Jupyter Book configuration
  • Template notebooks live in a specified directory: the engine makes a copy of each notebook, inserts parameters (e.g., case name), and runs it
  • computed_notebooks directory: engine-created notebooks for ADF and surface (MOM6)
  • Can also configure Python scripts to run
  • Sharing the notebooks uses Jupyter Book via HTML files, which can then be copied to CGD machines for online browsing (similar to ADF)
  • Possible to generate images only for HTML (no code)? Yes, could generate PNG images only and then load them into a separate Jupyter Book with some markdown text for explanation. Could also hide notebook cells in the Jupyter Book presentation.
  • What does ADF do? Raw HTML
  • Would it be problematic to include all the code used to generate the diagnostics?
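
The copy-and-parameterize step described in the demo can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not NBscuid's actual implementation: the config keys (template_dir, output_dir, notebooks, parameters) and the function names are hypothetical, the config is shown as a Python dict rather than read from config.yml, and the sketch stops before executing the notebooks or building the Jupyter Book.

```python
import json
from pathlib import Path


def make_parameter_cell(params):
    """Build a code cell that assigns each parameter as a Python variable."""
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }


def prepare_notebooks(config):
    """Copy each template notebook named in config and inject case parameters.

    The config layout is an assumption made for this sketch.
    Returns the list of notebook paths written to the output directory.
    """
    template_dir = Path(config["template_dir"])
    output_dir = Path(config["output_dir"])
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for name in config["notebooks"]:
        # A notebook file is JSON, so the template can be loaded directly.
        notebook = json.loads((template_dir / f"{name}.ipynb").read_text())
        # Put the injected cell first so every later cell can use the values.
        notebook["cells"].insert(0, make_parameter_cell(config["parameters"]))
        target = output_dir / f"{name}.ipynb"
        target.write_text(json.dumps(notebook, indent=1))
        written.append(target)
    return written
```

Called with a config such as {"template_dir": "templates", "output_dir": "computed_notebooks", "notebooks": ["surface"], "parameters": {"case_name": "example_case"}}, this would write computed_notebooks/surface.ipynb with a first cell that sets case_name. The templates stay untouched, so the same template can be reused for any number of cases.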

Next steps

  • Component-specific goals for Milestone #1
  • Familiarize yourself with the code/framework
  • Proposed next meeting: Jan 10 - deep dive into code
  • What is the starting point for your component? For example, using the CESM tutorial diagnostics as the launching point. And/or existing NCL-based diagnostics packages.
  • Each component develops tools in the GitHub repo to process output
  • 102 years of coupled model output is available for testing (see slides)


  • Are we missing anything?
  • Does the vision make sense?
  • Does the proposed path to get there make sense?


  • Orhan: unstructured grids component of CESM (CAM), would bring Project Raijin into scope in addition to GeoCAT comp and viz. Project Raijin is a VAST/CISL project dedicated to unstructured grids; Python package UXarray; Orhan, Brian M. are co-PIs
  • Dave L.: Frequently want to compare more than 1 model case (ideally N cases); Jesse: ADF has that functionality
  • Dave B.: What does ADF do already? Timeseries generation is a key piece (compression, simplified output).
  • Dave B.: Had previously felt that the goal of diagnostics packages was not to create publication-quality plots but to diagnose the model output; Dave L.: publication quality can mean a good, easy-to-read plot, which would be helpful for model development and not just publications
  • Do we point to CUPiD repo to support open data/code?
  • Anna: What is the scope here? Diagnosing a run like a quick overview with good plots, or something broader? Related to defining the starting point.
  • Jesse: For ADF, they decided on classes of plots (e.g., zonal means). Then also some specialized plots (e.g., QBO), but you can turn on/off.
  • Justin: To get started, each section can talk about what is important to them.
  • Jesse: Technical specs of this package? e.g., resolution of figures. Generation of plots is slow from ADF experience (can we make this faster?)
  • Katelyn: Saved intermediate processed data can help with going back to make publication quality figures
  • Dave B.: Saving netCDF files along the way - is this part of the process? Could also be in another format - some sort of intermediate file.
  • Brian D.: Separated out variables or spatial subsetting could help with I/O.
  • Kate: ADF already computes climatologies; the example showed a potentially repetitive process, with separate components creating intermediate files independently; there are several different ways to create climatologies - what's the best path forward? Thinking about CISM diagnostics
  • Brian D. is "working" on a single timeseries generation process that everyone can use; let Brian know if you want to be involved; goal is for an offline tool with conversion based on metadata from netCDF files; option to read history OR timeseries files?
  • Dave B.: single variable timeseries is different from generating a specific plot via climatology generation and/or integrated timeseries
  • Dave L.: ADF already has some of this functionality, needs to be abstracted out so that the other components can use it - seems like a high priority for this project
  • Jesse: ADF is modular, could use the upcoming timeseries generation tool
  • Lev: could see the data processing as a first step / being called first, and then passing it on; Dave L.: would need to know which variables a priori
  • Katelyn: can abstract under the hood
  • Isla: from co-chairs meeting - sampling uncertainty with short runs, uniform way across components?; Justin: could do it in ADF
  • Katelyn: start the conversations in a public forum (GitHub issues? Zulip channel?), abstract it later
  • Jesse: ADF could probably leverage GeoCAT functionality; Katelyn: could open an issue in a repo, but also set up follow-up meetings; Dave B.: timeline and resources? e.g., starting point for sea ice is 25 NCL scripts
  • Justin: ADF does run CVDP in the interim (NCL)
  • Dave L.: Can we improve on the scripts during the transition from NCL to Python?; Brian M.: anecdotal - 50K lines of NCL to 5K lines of Python
  • Orhan: this is related to essential goals of GeoCAT, challenge in prioritizing functions that are needed by the community; GeoCAT has limited resources and encourages open development framework
  • Jesse: engineering concerns - conda via NPL is ideal; compute resource allocation? especially when using Dask in multiple notebooks at once; Mike: can pass cluster objects around; Lev: switched to spinning up clusters individually, some questions about memory usage
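
To make the distinction between history files, single-variable timeseries, and climatologies raised in the discussion concrete, here is a toy sketch in plain Python. It is not the planned timeseries tool or ADF code: the record layout, the function names, and the variable names TS and PRECT are illustrative assumptions, and scalar values stand in for gridded fields that would normally come from netCDF files.

```python
from collections import defaultdict


def history_to_timeseries(history_records):
    """Regroup per-time-step records into one time-ordered series per variable.

    Each record is assumed to look like
    {"time": (year, month), "vars": {name: value}}.
    """
    series = defaultdict(list)
    for record in sorted(history_records, key=lambda r: r["time"]):
        for name, value in record["vars"].items():
            series[name].append((record["time"], value))
    return dict(series)


def monthly_climatology(timeseries):
    """Average a [((year, month), value), ...] series by calendar month."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_, month), value in timeseries:
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


# Two model years of January and February output, one record per time step.
history = [
    {"time": (1, 1), "vars": {"TS": 280.0, "PRECT": 2.0}},
    {"time": (1, 2), "vars": {"TS": 281.0, "PRECT": 3.0}},
    {"time": (2, 1), "vars": {"TS": 282.0, "PRECT": 4.0}},
    {"time": (2, 2), "vars": {"TS": 283.0, "PRECT": 5.0}},
]

timeseries = history_to_timeseries(history)
ts_climatology = monthly_climatology(timeseries["TS"])
```

Here timeseries["TS"] holds all four TS values in time order, and ts_climatology maps month 1 to 281.0 and month 2 to 282.0. Separating the two steps reflects the point made in the meeting: a shared timeseries step could run once up front, while each component decides which climatologies or integrated quantities to derive from it.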
