Write grid variables only once. #613

Open
dabail10 opened this issue Jun 22, 2021 · 10 comments · May be fixed by #1005

@dabail10
Contributor

One thing that has annoyed me for years is the missing values in land (from land block elimination) in the grid variables. Several users have requested that we fill in the land blocks with valid values, mostly for analysis after the fact. One way to do this is the approach used in CESM-POP: there is a single file with the string "once" in its name, in which the time-constant variables are written only once. This would be written in ice_grid.F90 when the global arrays are read in. This is still CF-compliant in the sense that this file would always accompany the history files. It would also save some space by not writing all the grid information to every history file.

@anton-seaice
Contributor

I "think" we currently only have the missing values during the early initialization (i.e. during init_grid2) so the easiest thing would be to create and write the one-off / fixed frequency grid output file during that subroutine.

If we want to keep (in memory) the coordinates for eliminated land blocks to include in the history output, then the change gets a bit more complicated!

@apcraig
Contributor

apcraig commented Jan 23, 2025

I am looking at this now. The current implementation

  • reads global ULAT and KMT
  • creates a decomposition with land block elimination
  • reads a handful of global grid fields (ULAT, ULON, KMT, HTE, HTN) from the grid file
  • computes all the CICE grid variables in parallel on the decomposed grid

So,

  • it will be difficult to generate a CICE grid file that includes values in the eliminated land blocks. We would have to compute all of those variables on a global grid and then scatter them, which would be a significant refactor of the current implementation.
  • we could write out only ULAT, ULON, and KMT to the "grid output file", but we already have those in the grid input file, so that probably does not add much value

Some ideas,

  • we could still add a separate grid output file with all the CICE grid variables. They would still have land block elimination (so missing data), but it would allow users to have a single grid file, grid variables in all the history files, neither, or both.
  • users could generate a one-time grid file by turning on the "grid output file" and also setting distribution_wght='blockall' in the namelist. That namelist option prevents land block elimination. This could be done for one run in production (or pre-production) to generate a complete grid file, while still allowing land block elimination during the production runs for performance.

So, by adding the ability to write a separate grid output file, keeping the existing field-by-field control of grid variables in the history files, and retaining the ability to run with or without land block elimination, users should be able to do whatever they want and have the data they need. Thoughts?

@anton-seaice
Contributor

I believe the main issue we have with the current setup is that concatenating output files is slow (when using xarray.concat or related methods), because xarray checks for consistency when the same variable exists in multiple files being concatenated. In that sense, just writing the grid to a different file solves the problem.
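
For reference, here is a minimal sketch (the file pattern and options are illustrative, not our actual workflow) of the xarray settings that skip those per-variable consistency checks when static grid fields are repeated in every history file:

  # Sketch only: open a set of CICE history files while avoiding the
  # per-variable comparison of repeated static (grid) fields.
  # The file pattern is hypothetical.
  import xarray as xr

  ds = xr.open_mfdataset(
      "iceh.????-??.nc",
      combine="by_coords",
      data_vars="minimal",   # only concatenate variables with a time dimension
      coords="minimal",      # do not concatenate/compare coordinates from every file
      compat="override",     # take non-time-varying variables from the first file
  )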

I will also do some experimenting to confirm how much the data in the eliminated blocks matters. I suspect it is more important for the coordinates (all the lat/lons) and less important for the rest of the grid (lengths/angles/areas).

@dabail10
Contributor Author

The way we have been getting around this is to use the netCDF grid files instead of the grid variables from the CICE history files. However, if you are writing daily history output, you end up with all of the grid information in all of the files. POP introduced a pop.once.nc file where, basically, the global variables read by the grid-reading module are dumped once, before the scattering happens.

@apcraig
Contributor

apcraig commented Jan 23, 2025

I think the idea is to add a few features. First, the ability to turn off grid variables in the history files already exists; Philippe added that a while ago. Second, I want to add support to write out all the CICE grid variables to a file once. That could be with or without land block elimination, depending on the namelist settings for the run; it's up to the user. Again, a user could run their case for a few timesteps and write out the grid file using NO land block elimination. That would be their global grid file, and it would have all the CICE grid fields. Then the user can start production with land block elimination on and grid-file writing off. They could write grid fields (or not) to the history files. Those history files would not have any values where land blocks are eliminated, but the "one-time" grid file would already have been written.

I'm less convinced that writing out the CICE input fields to an output file is useful. First, that file already exists: it's the input file. Second, the number of fields would be extremely limited, and the field names should probably differ from the CICE grid output fields to make clear that they are NOT the fields used in CICE but just the fields that are read in. Again, I'm not sure that feature is useful. @dabail10, is that really a feature that you want?

@dabail10
Contributor Author

So, the rationale is that the history files go out to some repository like the CMIP archive, and the static grid files don't necessarily come along. That is why POP created the once.nc file: so it follows the history files. Also, this was from a time when the static files were binary only.

@apcraig
Contributor

apcraig commented Jan 23, 2025

OK, I see why a static output file similar to the input file might be useful. But is that still useful in CICE if we have the other features proposed above, especially the ability to write a one-time global CICE grid file with all the CICE grid variables? @dabail10, are you asking for a feature in CICE that works like the POP feature? I think I'm proposing something much more useful (we could do both).

@dabail10
Contributor Author

Not really. I was trying to provide the backstory. A one-off grid file of some kind is what I am looking for; ideally it wouldn't have land block elimination. Another piece of this is that we often run ncrcat on the time-slice history files to create single-variable timeseries files: one file with 100 years of aice, one for hi, etc. However, each of those still carries the grid variables. That is still better than every time-slice file having the grid variables, though.
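
As an illustration of that workflow, here is a minimal sketch (file names are hypothetical, and it assumes a one-time grid file exists) of extracting a single-variable timeseries and carrying the grid along with it:

  # Sketch only: extract one field into a timeseries file and re-attach the
  # grid from a hypothetical one-time grid file. All file names are illustrative.
  import xarray as xr

  hist = xr.open_mfdataset("iceh.????-??-??.nc", data_vars="minimal",
                           coords="minimal", compat="override")
  aice = hist[["aice"]]                        # keep only the one field (plus its coords)

  grid = xr.open_dataset("iceh.grid.once.nc")  # hypothetical one-time grid file
  aice = xr.merge([aice, grid])                # attach the grid to the timeseries

  aice.to_netcdf("aice.timeseries.nc")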

@apcraig
Contributor

apcraig commented Jan 24, 2025

So, I've gotten myself down a rabbit hole. I have added a namelist option and am able to produce a one-time grid file; all good there.

But when I try to look at the file in Ferret, it complains about a bunch of the axis orders. In particular, it doesn't like the bounds fields having the vertex as their first index and (lon,lat) as the 2nd/3rd, even though that seems to be the CF standard: https://cfconventions.org/cf-conventions/cf-conventions.html#bounds-lat-lon. Ferret just will not read those files, which is partly a problem with Ferret that I'm about to give up on. Ferret also complains about the 5d variables where we have, for example, Tinz_d(time, nc, nkice, nj, ni); there is very little guidance about 5d variables in CF, which really only allows us to define X, Y, Z, T axes. (See NOAA-PMEL/Ferret#1988 (comment))

As part of this testing/debugging, I've put the history files through a CF checker (https://cfchecker.ncas.ac.uk/) and I'm getting some errors there too, which I'm trying to correct.

So, I'm trying to fix a bunch of things so that the netCDF files are CF compliant and work in Ferret, but it's a mess. I have tried to add axis attributes to some of the dimension variables, which means I also have to write them as variables. I have tried to add more coordinate attributes to some of the lon and lat variables. Some things work in Ferret and/or CF, some don't. It's really very complicated and confusing. The axis, coordinates, units, and standard_name attributes all play a role. And, of course, we have curvilinear coordinates, which doesn't help.
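
To make that attribute machinery concrete, here is a minimal sketch (TLON/TLAT follow CICE history naming; the file name, sizes, and values are purely illustrative) of the kind of wiring involved for curvilinear output:

  # Sketch only: CF-style attribute wiring for a curvilinear grid.
  # TLON/TLAT follow CICE history naming; everything else is illustrative.
  import numpy as np
  import netCDF4

  nj, ni = 4, 5
  f = netCDF4.Dataset("cf_sketch.nc", "w")
  f.createDimension("nj", nj)
  f.createDimension("ni", ni)

  # write the index dimensions as variables so they can carry an axis attribute
  ni_v = f.createVariable("ni", "i4", ("ni",))
  ni_v.axis = "X"
  ni_v[:] = np.arange(1, ni + 1)
  nj_v = f.createVariable("nj", "i4", ("nj",))
  nj_v.axis = "Y"
  nj_v[:] = np.arange(1, nj + 1)

  # 2D curvilinear lat/lon carry CF units and standard_name
  tlon = f.createVariable("TLON", "f8", ("nj", "ni"))
  tlon.units = "degrees_east"
  tlon.standard_name = "longitude"
  tlat = f.createVariable("TLAT", "f8", ("nj", "ni"))
  tlat.units = "degrees_north"
  tlat.standard_name = "latitude"

  # the data variable points at its 2D lat/lon via the coordinates attribute
  aice = f.createVariable("aice", "f8", ("nj", "ni"))
  aice.coordinates = "TLON TLAT"

  tlon[:] = 0.0
  tlat[:] = 0.0
  aice[:] = 0.0
  f.close()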

I'm also trying to avoid changing any of the basic properties of the variables as written. For example, we could make the vertex coordinate of the bounds fields the third coordinate (instead of the first), and that fixes a bunch of things. We could also combine nc with the other vertical coordinate (i.e. nc+nkice for Tinz_d) into a single new vertical coordinate. I'm worried these changes could break post-processing tools, plus it's just not the way we want to write the fields.

I'm considering giving up on Ferret and CF. What do others think? Any other tools or documentation that might help me? @phil-blain @dabail10 @anton-seaice

@anton-seaice
Contributor

anton-seaice commented Jan 27, 2025

Thanks @apcraig

I don't have a particularly well-informed view, but my feeling is that it's essential we fix any CF errors, and we can ignore warnings as needed. Most of our users don't use Ferret, so I am less worried about that.

I normally use this tool: http://climate-cms.wikis.unsw.edu.au/CF_checker, but I am sure the compliance checkers are all fairly similar.

P.S. this may be an excuse to update the convention we claim compliance against, which is currently CF-1.0.

@apcraig linked a pull request on Jan 31, 2025 that will close this issue.