GraphCast improvements - Part I #510
Conversation
@mnabian
Thanks @stadlmax, I'll add your comments to my epic and consider them all.
Note to myself: the API updates break the GraphCast tests. Need to update them all.
@stadlmax, as far as I remember we were using the fused layernorm, and that gave us a nice speedup: https://github.com/NVIDIA/modulus/blob/main/modulus/models/gnn_layers/mesh_graph_mlp.py#L157. Did you also compare against that?
Yes, for AIFS I found TE > APEX > PyTorch throughout a bunch of the usual sizes AIFS had in their RFI benchmark. Especially the backward kernels in TE are much better for our cases. (Reported numbers are runtimes, lower is better.)
[Benchmark tables for num_channels = 256, 384, and 512]
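For context, a minimal sketch (not from the PR) of how such a forward+backward timing comparison can be run. The helper `bench_ms`, the batch size, and the channel count are illustrative; the TE and Apex imports are guarded since they are optional dependencies:

```python
import time

import torch


def bench_ms(layer: torch.nn.Module, x: torch.Tensor, iters: int = 100) -> float:
    """Mean forward+backward runtime in milliseconds."""
    for _ in range(10):  # warm-up iterations
        layer(x).sum().backward()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        layer(x).sum().backward()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3


num_channels = 512  # one of the sizes discussed above
x = torch.randn(8192, num_channels, device="cuda", requires_grad=True)

layers = {"pytorch": torch.nn.LayerNorm(num_channels).cuda()}
try:
    import transformer_engine.pytorch as te

    layers["te"] = te.LayerNorm(num_channels).cuda()
except ImportError:
    pass
try:
    from apex.normalization import FusedLayerNorm

    layers["apex"] = FusedLayerNorm(num_channels).cuda()
except ImportError:
    pass

for name, layer in layers.items():
    print(f"{name}: {bench_ms(layer, x):.3f} ms")
```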
This is a great comparison, thanks! I'll switch to TE then. Do we have any reason to keep the fused layernorm from Apex, or should we just remove it?
I guess no, not really. TE should also be well covered when it comes to development specifically for Blackwell and beyond; I know of a few POCs that try to optimize the LayerNorm in TE even further.
@stadlmax added support for TE layernorm.
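A hedged sketch of what the added support might look like. The actual change lives in `mesh_graph_mlp.py`; the helper name `get_norm_layer` and the `"TELayerNorm"` string are illustrative, with the TE import kept lazy so models still work without transformer_engine installed:

```python
import torch.nn as nn


def get_norm_layer(norm_type: str, hidden_dim: int) -> nn.Module:
    """Select a LayerNorm implementation by name."""
    if norm_type == "TELayerNorm":
        # Lazy import: transformer_engine is an optional dependency.
        import transformer_engine.pytorch as te

        return te.LayerNorm(hidden_dim)
    if norm_type == "LayerNorm":
        return nn.LayerNorm(hidden_dim)
    raise ValueError(f"Unsupported norm_type: {norm_type}")
```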
Done
Thanks for addressing the feedback, looks good to me.
Modulus Pull Request
Description
Closes #506, #505, #486, #508, #509, #511, #516, #517
Checklist
Dependencies