-
Notifications
You must be signed in to change notification settings - Fork 27
Add ml cloud fraction closure #4185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
tapios
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo the Slack discussion on how to store/apply the NN weights). Would be good for @szy21 to have a quick look too before merging.
src/cache/cloud_fraction.jl
Outdated
| # distance to saturation in temperature space | ||
| θli = p.scratch.ᶜtemp_scalar_2 | ||
| θli .= TD.liquid_ice_pottemp.(thermo_params, ᶜts) | ||
| delta_θli = FT(0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some code comments to explain what's happening here: Approximate gradient dq_sat/dθli by one-sided finite difference.
Wouldn't it make more sense to take a negative delta_θli, given that saturation occurs with cooling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good - I'll clear up the code. I was debating using centered difference since that's more accurate but its more costly (another allocation/ computation where I don't think it really matters).
@tapios - In a cloud, saturation temperature is reached with warming so I think the reverse argument also applies in that case. If you want I can switch to minus and recalibrate? (otherwise I'll just leave as plus) I don't intuitively see which side will give less error...
src/cache/cloud_fraction.jl
Outdated
| # mixing length | ||
| ᶜmixing_length_field = p.scratch.ᶜtemp_scalar | ||
| ᶜmixing_length_field .= ᶜmixing_length(Y, p) | ||
| # distance to saturation in q space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cleaner and clearer to move the distance to saturation calculations to their own function.
src/cache/cloud_fraction.jl
Outdated
| p.precomputed.cloud_diagnostics_tuple.cf .= cf | ||
|
|
||
| # ... and add contributions from the updrafts if using EDMF. | ||
| if turbconv_model isa PrognosticEDMFX || turbconv_model isa DiagnosticEDMFX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, better to make the updraft cloud fraction its own function (we'll need it in several places)
haakon-e
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool work Julian! Left some suggestions for you to ponder
src/cache/cloud_fraction.jl
Outdated
| # ᶜρa⁰ = @. lazy(ρa⁰(Y.c.ρ, Y.c.sgsʲs, turbconv_model)) | ||
| FT = eltype(p.params) | ||
|
|
||
| # COMPUTE quantities needed to form pi groups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this function except these lines forming pi groups and calling the NN seems to be identical to set_cloud_fraction! for QuadratureCloud. Perhaps it would be nicer to use that function directly, i.e.:
NVTX.@annotate function set_cloud_fraction!(
Y,
p,
::Union{EquilMoistModel, NonEquilMoistModel},
cloud_model::Union{QuadratureCloud, CloudML},
)
# ...
if cloud_model isa CloudML
# overwrite with the ML computed cloud fraction, leaving q_liq, q_ice computed via quadrature
@. p.precomputed.cloud_diagnostics_tuple.cf = set_ml_cloud_fraction(cloud_model, ᶜmixing_length_field, Y.c, thermo_params, ᶜts, p.precomputed.ᶜgradᵥ_q_tot, p.precomputed.ᶜgradᵥ_θ_liq_ice)
end
# ...and then define set_ml_cloud_fraction as the pointwise version of the code below. This has the added benefit of avoiding all the allocations you're introducing, which may be the cause of the performance slowdown you're observing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think something like this would be better.
src/solver/model_getters.jl
Outdated
| GridScaleCloud() | ||
| elseif cloud_model == "quadrature" | ||
| QuadratureCloud(SGSQuadrature(FT)) | ||
| elseif cloud_model == "cloud_ml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to give it a more descriptive name here, since the ML architecture choice is so specific, e.g. Schmitt2026MLCloudFraction or something similar.
src/solver/model_getters.jl
Outdated
| elseif cloud_model == "cloud_ml" | ||
| nn_filepath = joinpath( | ||
| @clima_artifact("cloud_fraction_nn"), | ||
| "arch_2layers_8nodes.jld2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could consider moving the strings "cloud_fraction_nn" and "arch_2layers_8nodes.jld2" to the config/toml file so that you can easily change them later as you refine the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haakon-e cloud_fraction_nn is the name of the artifact so will keep fixed, but arch_2layers_8nodes could be for example set in the config and seems like a good idea to me if we want to test different architectures down the line
| @@ -1,3 +1,6 @@ | |||
| using Flux | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is using Flux needed because you do JLD2.load below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need it for the reconstruction of the network (just after the load were we do network(weights)). @imreddyTeja should we be doing import Flux or import it somewhere else?
src/cache/cloud_fraction.jl
Outdated
| #dqt_dz | ||
| ᶜ∇q = | ||
| dot.( | ||
| Geometry.WVector.(p.precomputed.ᶜgradᵥ_q_tot), | ||
| Ref(ClimaCore.Geometry.WVector(FT(1.0))), | ||
| ) | ||
| #dθli_dz | ||
| ᶜ∇θ = | ||
| dot.( | ||
| Geometry.WVector.(p.precomputed.ᶜgradᵥ_θ_liq_ice), | ||
| Ref(ClimaCore.Geometry.WVector(FT(1.0))), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably use the projected_vector_data function here, although it won't be much simpler.
src/cache/cloud_fraction.jl
Outdated
| cf .= apply_cf_nn.(Ref(cloud_ml.model), π_1, π_2, π_3, π_4) | ||
| #Main.@infiltrate | ||
| # overwrite with the ML computed cloud fraction, leaving q_liq, q_ice computed via quadrature | ||
| p.precomputed.cloud_diagnostics_tuple.cf .= cf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be multiplied by ᶜρa⁰ for prognostic EDMFX if it's the environment cloud fraction (and I think it should be, as the updraft is treated separately below).
Working ML cloud fraction Adds updraft boolean and liq and ice water content via quadrature broadcasted wrapper, ready to gpu test change to use artifact
Co-authored-by: Haakon Ludvig Langeland Ervik <[email protected]>
src/cache/cloud_fraction.jl
Outdated
| q_v = p.scratch.ᶜtemp_scalar_4 | ||
| q_v .= specific.(Y.c.ρq_tot, Y.c.ρ) | ||
| # Saturation state at current thermodynamic state | ||
| q_sat = p.scratch.ᶜtemp_scalar_5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ᶜtemp_scalar_5 is used in ᶜmixing_length called on line 301. It is materialized, so it is probably ok to reuse it here, but it would be good to make sure. An easy way to check is to change the temp_scalar here and see if the result is the same.
src/cache/cloud_fraction.jl
Outdated
| π_1 = (q_sat .- q_v) ./ q_sat | ||
| π_2 = Δθli ./ θli_sat | ||
| π_3 = @. (((dqsatdθli * ᶜ∇θ - ᶜ∇q) * ᶜmixing_length_field) / q_sat) | ||
| π_4 = @. (ᶜ∇θ * ᶜmixing_length_field) / θli_sat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these will allocate?
src/cache/cloud_fraction.jl
Outdated
| return cf | ||
| end | ||
|
|
||
| function saturation_distance(q_v, q_sat, ᶜts, θli, thermo_params, Δθli_fd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function allocates. Could you make it a point-wise function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe it is better to make the entire set_ml_cloud_fraction point-wise, if you can. That will avoid the allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szy21 I edited to make what I could pointwise to reduce allocations. As there isn't a function for pointwise mixing length and I couldn't find one, or didn't see how, for projected_vector_data I had to add an extra function to keep the original set_cloud_fraction! function clean. Let me know if you can think of a better way (e.g., with one function) or better naming conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. I can take a look tonight / tomorrow. But if you are in a hurry to get this in I'm fine with merging it. (I didn't look at the changes in packages in Project and Manifest - maybe it would be good if a software engineer looks at them?)
tapios
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| p, | ||
| ::Union{EquilMoistModel, NonEquilMoistModel}, | ||
| qc::QuadratureCloud, | ||
| qc::Union{QuadratureCloud, MLCloud}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have both Schmitt2026ML and MLCloud? Seems confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - I agree it's currently not consistent with the naming convention for the other cloud fraction parameterizations. I think the most consistent is snake case without "cloud" in the name, so just dispatching on cloud_model: ml. My only concern, as @haakon-e pointed out was that we may have other ml models (@trontrytel was going to work on one for the covariance) so then it becomes ambiguous and since the architecture / pi groups will be specific to a paper I write next year. We could leave it simple for now and just put as ml and try and disambiguate when another ml cloud case arises? Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it up to you how to handle this. As a general rule, I think we should avoid naming parameterizations after people, because it leads to hesitancy downstream to change the parameterizations--a stasis I have seen many times that is not healthy. But I have no objection and strong thoughts how to handle it here. It may help to make clearer that only cloud fraction is ML--in principle other quantities (condensate path, effective radius etc.) could be too. So something like MLCloudFraction may work. But again: your call. it's not crucially important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can call it MLCloud (or MLCloudFraction) and worry about other options when we have them, but up to you.
szy21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The physics part looks good to me now. It may be good if @dennisYatunin can take a look at the package and autodiff change. But I'm fine with merging it.
Project.toml
Outdated
| uuid = "b2c96348-7fb7-4fe0-8da9-78d88439e717" | ||
| authors = ["Climate Modeling Alliance"] | ||
| version = "0.33.0" | ||
| version = "0.33.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the version change on purpose?
| p, | ||
| ::Union{EquilMoistModel, NonEquilMoistModel}, | ||
| qc::QuadratureCloud, | ||
| qc::Union{QuadratureCloud, MLCloud}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can call it MLCloud (or MLCloudFraction) and worry about other options when we have them, but up to you.
src/cache/cloud_fraction.jl
Outdated
| Ref(thermo_params), | ||
| Ref(FT), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Ref(thermo_params), | |
| Ref(FT), | |
| thermo_params |
(I'm not sure, but I don't think you need Ref here. And you can get FT from thermo_params.)
src/cache/cloud_fraction.jl
Outdated
| ρ, | ||
| ᶜts, | ||
| thermo_params, | ||
| FT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| FT, |
src/cache/cloud_fraction.jl
Outdated
| # Liquid–ice potential temperature at current thermodynamic state | ||
| θli = TD.liquid_ice_pottemp(thermo_params, ᶜts) | ||
|
|
||
| q_v = specific(ρq_tot, ρ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| q_v = specific(ρq_tot, ρ) |
(and use q_tot, which is an input argument for q_v below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And note q_tot can be negative so you may want to clip it.
Co-authored-by: Zhaoyi Shen <[email protected]>
Purpose
adds machine learning cloud fraction. 42% error reduction in SCM calibration to LES over quadrature approach.
To-do
Change NN weights to be online calibrated weights (currently offline calibrated) by choosing the best particle.
Performance was really slow in the diagnostic case that I tested - either my allocation request on HPC was bad or performance / allocations are really bad. It was fine in a previous branch
j/mlcf(prognostic edmfx in costa's aiwq setup in coupler was equal speed to quadrature) before rebasing to Anna's version (although I don't think any changes she made would affect performance here).There is stuff to clean up but I want to get it running ci before tomorrow.
Content