Skip to content

Conversation

@Julians42
Copy link
Member

@Julians42 Julians42 commented Dec 22, 2025

Purpose

adds machine learning cloud fraction. 42% error reduction in SCM calibration to LES over quadrature approach.

performance_graph

To-do

  1. Change NN weights to be online calibrated weights (currently offline calibrated) by choosing the best particle.

  2. Performance was really slow in the diagnostic case that I tested - either my allocation request on HPC was bad or performance / allocations are really bad. It was fine in a previous branch j/mlcf (prognostic edmfx in costa's aiwq setup in coupler was equal speed to quadrature) before rebasing to Anna's version (although I don't think any changes she made would affect performance here).

  3. There is stuff to clean up but I want to get it running ci before tomorrow.

Content


  • I have read and checked the items on the review checklist.

Copy link
Member

@tapios tapios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo the Slack discussion on how to store/apply the NN weights). Would be good for @szy21 to have a quick look too before merging.

# distance to saturation in temperature space
θli = p.scratch.ᶜtemp_scalar_2
θli .= TD.liquid_ice_pottemp.(thermo_params, ᶜts)
delta_θli = FT(0.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some code comments to explain what's happening here: Approximate gradient dq_sat/dθli by one-sided finite difference.

Wouldn't it make more sense to take a negative delta_θli, given that saturation occurs with cooling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - I'll clear up the code. I was debating using centered difference since that's more accurate but its more costly (another allocation/ computation where I don't think it really matters).

@tapios - In a cloud, saturation temperature is reached with warming so I think the reverse argument also applies in that case. If you want I can switch to minus and recalibrate? (otherwise I'll just leave as plus) I don't intuitively see which side will give less error...

# mixing length
ᶜmixing_length_field = p.scratch.ᶜtemp_scalar
ᶜmixing_length_field .= ᶜmixing_length(Y, p)
# distance to saturation in q space
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cleaner and clearer to move the distance to saturation calculations to their own function.

p.precomputed.cloud_diagnostics_tuple.cf .= cf

# ... and add contributions from the updrafts if using EDMF.
if turbconv_model isa PrognosticEDMFX || turbconv_model isa DiagnosticEDMFX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, better to make the updraft cloud fraction its own function (we'll need it in several places)

@tapios tapios requested a review from szy21 December 22, 2025 18:39
Copy link
Member

@haakon-e haakon-e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool work Julian! Left some suggestions for you to ponder

# ᶜρa⁰ = @. lazy(ρa⁰(Y.c.ρ, Y.c.sgsʲs, turbconv_model))
FT = eltype(p.params)

# COMPUTE quantities needed to form pi groups
Copy link
Member

@haakon-e haakon-e Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this function except these lines forming pi groups and calling the NN seems to be identical to set_cloud_fraction! for QuadratureCloud. Perhaps it would be nicer to use that function directly, i.e.:

NVTX.@annotate function set_cloud_fraction!(
    Y,
    p,
    ::Union{EquilMoistModel, NonEquilMoistModel},
    cloud_model::Union{QuadratureCloud, CloudML},
)
# ...

if cloud_model isa CloudML
    # overwrite with the ML computed cloud fraction, leaving q_liq, q_ice computed via quadrature
    @. p.precomputed.cloud_diagnostics_tuple.cf = set_ml_cloud_fraction(cloud_model, ᶜmixing_length_field, Y.c, thermo_params, ᶜts, p.precomputed.ᶜgradᵥ_q_tot, p.precomputed.ᶜgradᵥ_θ_liq_ice)
end

# ...

and then define set_ml_cloud_fraction as the pointwise version of the code below. This has the added benefit of avoiding all the allocations you're introducing, which may be the cause of the performance slowdown you're observing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think something like this would be better.

GridScaleCloud()
elseif cloud_model == "quadrature"
QuadratureCloud(SGSQuadrature(FT))
elseif cloud_model == "cloud_ml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to give it a more descriptive name here, since the ML architecture choice is so specific, e.g. Schmitt2026MLCloudFraction or something similar.

elseif cloud_model == "cloud_ml"
nn_filepath = joinpath(
@clima_artifact("cloud_fraction_nn"),
"arch_2layers_8nodes.jld2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could consider moving the strings "cloud_fraction_nn" and "arch_2layers_8nodes.jld2" to the config/toml file so that you can easily change them later as you refine the model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haakon-e cloud_fraction_nn is the name of the artifact so will keep fixed, but arch_2layers_8nodes could be for example set in the config and seems like a good idea to me if we want to test different architectures down the line

@@ -1,3 +1,6 @@
using Flux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is using Flux needed because you do JLD2.load below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it for the reconstruction of the network (just after the load were we do network(weights)). @imreddyTeja should we be doing import Flux or import it somewhere else?

Comment on lines 339 to 350
#dqt_dz
ᶜ∇q =
dot.(
Geometry.WVector.(p.precomputed.ᶜgradᵥ_q_tot),
Ref(ClimaCore.Geometry.WVector(FT(1.0))),
)
#dθli_dz
ᶜ∇θ =
dot.(
Geometry.WVector.(p.precomputed.ᶜgradᵥ_θ_liq_ice),
Ref(ClimaCore.Geometry.WVector(FT(1.0))),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably use the projected_vector_data function here, although it won't be much simpler.

cf .= apply_cf_nn.(Ref(cloud_ml.model), π_1, π_2, π_3, π_4)
#Main.@infiltrate
# overwrite with the ML computed cloud fraction, leaving q_liq, q_ice computed via quadrature
p.precomputed.cloud_diagnostics_tuple.cf .= cf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be multiplied by ᶜρa⁰ for prognostic EDMFX if it's the environment cloud fraction (and I think it should be, as the updraft is treated separately below).

q_v = p.scratch.ᶜtemp_scalar_4
q_v .= specific.(Y.c.ρq_tot, Y.c.ρ)
# Saturation state at current thermodynamic state
q_sat = p.scratch.ᶜtemp_scalar_5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ᶜtemp_scalar_5 is used in ᶜmixing_length called on line 301. It is materialized, so it is probably ok to reuse it here, but it would be good to make sure. An easy way to check is to change the temp_scalar here and see if the result is the same.

Comment on lines 334 to 337
π_1 = (q_sat .- q_v) ./ q_sat
π_2 = Δθli ./ θli_sat
π_3 = @. (((dqsatdθli * ᶜ∇θ - ᶜ∇q) * ᶜmixing_length_field) / q_sat)
π_4 = @. (ᶜ∇θ * ᶜmixing_length_field) / θli_sat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these will allocate?

return cf
end

function saturation_distance(q_v, q_sat, ᶜts, θli, thermo_params, Δθli_fd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function allocates. Could you make it a point-wise function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe it is better to make the entire set_ml_cloud_fraction point-wise, if you can. That will avoid the allocations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szy21 I edited to make what I could pointwise to reduce allocations. As there isn't a function for pointwise mixing length and I couldn't find one, or didn't see how, for projected_vector_data I had to add an extra function to keep the original set_cloud_fraction! function clean. Let me know if you can think of a better way (e.g., with one function) or better naming conventions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. I can take a look tonight / tomorrow. But if you are in a hurry to get this in I'm fine with merging it. (I didn't look at the changes in packages in Project and Manifest - maybe it would be good if a software engineer looks at them?)

@Julians42 Julians42 requested a review from szy21 December 27, 2025 18:16
Copy link
Member

@tapios tapios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

p,
::Union{EquilMoistModel, NonEquilMoistModel},
qc::QuadratureCloud,
qc::Union{QuadratureCloud, MLCloud},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have both Schmitt2026ML and MLCloud? Seems confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I agree it's currently not consistent with the naming convention for the other cloud fraction parameterizations. I think the most consistent is snake case without "cloud" in the name, so just dispatching on cloud_model: ml. My only concern, as @haakon-e pointed out was that we may have other ml models (@trontrytel was going to work on one for the covariance) so then it becomes ambiguous and since the architecture / pi groups will be specific to a paper I write next year. We could leave it simple for now and just put as ml and try and disambiguate when another ml cloud case arises? Thoughts?

Copy link
Member

@tapios tapios Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it up to you how to handle this. As a general rule, I think we should avoid naming parameterizations after people, because it leads to hesitancy downstream to change the parameterizations--a stasis I have seen many times that is not healthy. But I have no objection and strong thoughts how to handle it here. It may help to make clearer that only cloud fraction is ML--in principle other quantities (condensate path, effective radius etc.) could be too. So something like MLCloudFraction may work. But again: your call. it's not crucially important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can call it MLCloud (or MLCloudFraction) and worry about other options when we have them, but up to you.

Copy link
Member

@szy21 szy21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The physics part looks good to me now. It may be good if @dennisYatunin can take a look at the package and autodiff change. But I'm fine with merging it.

Project.toml Outdated
uuid = "b2c96348-7fb7-4fe0-8da9-78d88439e717"
authors = ["Climate Modeling Alliance"]
version = "0.33.0"
version = "0.33.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the version change on purpose?

p,
::Union{EquilMoistModel, NonEquilMoistModel},
qc::QuadratureCloud,
qc::Union{QuadratureCloud, MLCloud},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can call it MLCloud (or MLCloudFraction) and worry about other options when we have them, but up to you.

Comment on lines 331 to 332
Ref(thermo_params),
Ref(FT),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Ref(thermo_params),
Ref(FT),
thermo_params

(I'm not sure, but I don't think you need Ref here. And you can get FT from thermo_params.)

ρ,
ᶜts,
thermo_params,
FT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FT,

# Liquid–ice potential temperature at current thermodynamic state
θli = TD.liquid_ice_pottemp(thermo_params, ᶜts)

q_v = specific(ρq_tot, ρ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
q_v = specific(ρq_tot, ρ)

(and use q_tot, which is an input argument for q_v below)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And note q_tot can be negative so you may want to clip it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants