
Match Nature Paper Implementation #5

Merged — jacobbieker merged 64 commits into main from jacob/nature-paper on Nov 11, 2021
Conversation

@jacobbieker (Member)

Pull Request

Description

The Nature version of the paper has some changes compared to the preprint. Additionally, the released TensorFlow model helps with matching the model architecture more closely: https://www.nature.com/articles/s41586-021-03854-z

This should also allow loading the TensorFlow weights into PyTorch with a few modifications.
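As a rough illustration of what those modifications involve: TensorFlow stores 2D convolution kernels as (height, width, in_channels, out_channels), while PyTorch expects (out_channels, in_channels, height, width), so copying a weight is mostly a name mapping plus an axis permutation. A minimal sketch, with a hypothetical helper name:

```python
import torch

def copy_tf_conv_kernel(tf_variable, torch_conv):
    """Copy one TF conv kernel into a PyTorch Conv2d (hypothetical helper).

    TF stores 2D conv kernels as (H, W, in_ch, out_ch), e.g. (3, 3, 1152, 384);
    PyTorch expects (out_ch, in_ch, H, W), so the axes are permuted."""
    kernel = torch.from_numpy(tf_variable.numpy())       # tf.Variable -> numpy -> torch
    kernel = kernel.permute(3, 2, 0, 1).contiguous()     # HWIO -> OIHW
    assert kernel.shape == torch_conv.weight.shape, "mapped to the wrong layer?"
    with torch.no_grad():
        torch_conv.weight.copy_(kernel)
```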

Fixes #3
Relates to #4 and #1

How Has This Been Tested?

Unit tests


Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@jacobbieker jacobbieker added the enhancement New feature or request label Sep 30, 2021
@jacobbieker jacobbieker self-assigned this Sep 30, 2021
@jacobbieker (Member Author)

Going through the model layer names + shapes:

I'm not sure what

Name: Generator/spz_bottleneck_wrapper/spz_bottleneck/w/ema_0.9999:0 Shape: (1, 1, 768, 10)

is for; I don't see where in the paper this would come from.

For the GRU layers, I couldn't work out how a GRU with hidden size 384 takes the 768-sized latent space as input. It turns out they just concatenate them (768 + 384 = 1152):

Name: Generator/g_gru_0/conv_r_wrapper/conv_r/w/ema_0.9999:0 Shape: (3, 3, 1152, 384)
Name: Generator/g_gru_0/conv_u_wrapper/conv_u/w/ema_0.9999:0 Shape: (3, 3, 1152, 384)
Name: Generator/g_gru_0/conv_c_wrapper/conv_c/w/ema_0.9999:0 Shape: (3, 3, 1152, 384)
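For illustration, a minimal sketch of a ConvGRU cell built around that concatenation (class name and gate convention are assumptions, not the released code):

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Sketch of a ConvGRU cell whose gates convolve the concatenation
    of the input (768 ch) and hidden state (384 ch), matching the
    (3, 3, 1152, 384) kernels listed above."""

    def __init__(self, input_channels=768, hidden_channels=384):
        super().__init__()
        in_ch = input_channels + hidden_channels  # 768 + 384 = 1152
        self.conv_r = nn.Conv2d(in_ch, hidden_channels, 3, padding=1)  # reset gate
        self.conv_u = nn.Conv2d(in_ch, hidden_channels, 3, padding=1)  # update gate
        self.conv_c = nn.Conv2d(in_ch, hidden_channels, 3, padding=1)  # candidate state

    def forward(self, x, h):
        xh = torch.cat([x, h], dim=1)                       # (B, 1152, H, W)
        r = torch.sigmoid(self.conv_r(xh))
        u = torch.sigmoid(self.conv_u(xh))
        c = torch.tanh(self.conv_c(torch.cat([x, r * h], dim=1)))
        return u * h + (1.0 - u) * c                        # next hidden state
```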

@jacobbieker (Member Author)

There are still some layers I don't quite get, namely the difference between G↑ and G, as well as how exactly that maps to this recurring block in the weights:

Name: Generator/hh_g_0_wrapper/hh_g_0/w/ema_0.9999:0 Shape: (3, 3, 384, 768)
Name: Generator/GBlock/BatchNorm/scale_wrapper/scale/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock/BatchNorm/offset_wrapper/offset/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock/conv0_wrapper/conv0/w/ema_0.9999:0 Shape: (3, 3, 768, 768)
Name: Generator/GBlock/BatchNorm_1/scale_wrapper/scale/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock/BatchNorm_1/offset_wrapper/offset/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock/conv1_wrapper/conv1/w/ema_0.9999:0 Shape: (3, 3, 768, 768)
Name: Generator/GBlock_1/conv0_wrapper/conv0/w/ema_0.9999:0 Shape: (3, 3, 10, 10)
Name: Generator/GBlock_1/conv1_wrapper/conv1/w/ema_0.9999:0 Shape: (3, 3, 10, 8)
Name: Generator/GBlock_1/conv_sc_wrapper/conv_sc/w/ema_0.9999:0 Shape: (1, 1, 10, 8)
Name: Generator/GBlock_2/BatchNorm/scale_wrapper/scale/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock_2/BatchNorm/offset_wrapper/offset/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock_2/conv0_wrapper/conv0/w/ema_0.9999:0 Shape: (3, 3, 768, 768)
Name: Generator/GBlock_2/BatchNorm_1/scale_wrapper/scale/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock_2/BatchNorm_1/offset_wrapper/offset/w/ema_0.9999:0 Shape: (256, 768)
Name: Generator/GBlock_2/conv1_wrapper/conv1/w/ema_0.9999:0 Shape: (3, 3, 768, 384)
Name: Generator/GBlock_2/conv_sc_wrapper/conv_sc/w/ema_0.9999:0 Shape: (1, 1, 768, 384)

From what I can tell, GBlock_2 is what is shown in the diagram in the paper; I'm just not sure about GBlock_1 and GBlock.
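For reference, a minimal residual-block sketch consistent with the GBlock_2 shapes (my reading of the weights, not confirmed code; the (256, 768) BatchNorm scale/offset shapes suggest conditional batch norm driven by a 256-dim vector, which the plain BatchNorm2d below does not capture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GBlockSketch(nn.Module):
    """Residual block matching the GBlock_2 shapes above:
    BN -> ReLU -> 3x3 conv, twice, plus a 1x1 conv shortcut where the
    channel count changes (768 -> 384)."""

    def __init__(self, in_channels=768, out_channels=384):
        super().__init__()
        self.bn0 = nn.BatchNorm2d(in_channels)
        self.conv0 = nn.Conv2d(in_channels, in_channels, 3, padding=1)   # (3, 3, 768, 768)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)  # (3, 3, 768, 384)
        self.conv_sc = nn.Conv2d(in_channels, out_channels, 1)           # (1, 1, 768, 384)

    def forward(self, x):
        h = self.conv0(F.relu(self.bn0(x)))
        h = self.conv1(F.relu(self.bn1(h)))
        return h + self.conv_sc(x)  # residual sum
```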

@jacobbieker jacobbieker added the help wanted Extra attention is needed label Oct 1, 2021
@tcapelle commented Oct 1, 2021

We can look at this together if you like

@jacobbieker (Member Author)

> We can look at this together if you like

Yeah, that'd be great! I'm finishing up a script for getting the satellite data in nowcasting-dataset, but will then come back to this. I think it's mostly there; other than this GBlock bit, I could probably get most of the weights copied over pretty easily.

@franchg commented Oct 1, 2021

Following this since I'm interested.
If it can be helpful, I was able to convert the model to ONNX format (thanks to https://github.com/onnx/tensorflow-onnx). This allows visualizing the model through Netron (https://github.com/lutzroeder/netron).
The model is quite big, so navigating the visualization is a bit messy, but maybe it can help make sense of the model.

Here is the link for the converted ONNX version: https://drive.google.com/file/d/1rpgoERj8CETkty049mwdGoy0DAr7cG1F/view?usp=sharing
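For anyone reproducing the conversion, it can be done from Python with tf2onnx (a sketch; the paths are placeholders, and only the documented from_saved_model entry point is assumed):

```python
import tf2onnx

# Convert the released SavedModel to ONNX for viewing in Netron.
model_proto, _ = tf2onnx.convert.from_saved_model(
    "path/to/dgmr_savedmodel",   # directory containing the SavedModel
    output_path="model.onnx",    # file to open in Netron
)
```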

@jacobbieker (Member Author)

> If this can be helpful I was able to transform the model to ONNX format [...]

Oh awesome! Thanks! Yeah, that should definitely help, compared to me just printing out the names and shapes

@franchg commented Oct 1, 2021

> Oh awesome! Thanks! Yeah, that should definitely help, compared to me just printing out the names and shapes

Hopefully! Unfortunately, the visualization unrolls all the RNN loops, so the resulting graph has a lot of repeated blocks, but maybe it helps.

@jacobbieker (Member Author)

For anyone just looking at this, here is the graph created from @franchg's ONNX export:

[image: rendering of model.onnx]

It might take a bit, but I'm pretty sure this current implementation is close to the actual one; I just need to check where some of the blocks are.

@franchg commented Oct 1, 2021

Attaching the ONNX conversion log, just in case.
[attachment: onnx.log]

@jacobbieker (Member Author)

They also change how the latent space is sampled for evaluation:

> We develop one possible post-processing approach to improve the reliability of the generative nowcasts. At prediction time, the latent variables are sampled from a Gaussian distribution with standard deviation 2 (rather than 1), relying on empirical insights on maintaining resolution while increasing sample diversity in generative models [24,37]. In addition, for each realization we apply a stochastic perturbation to the input radar by multiplying a single constant drawn from a unit-mean gamma distribution G(α = 5, β = 5) to the entire input radar field. Extended Data Figures 4 (UK) and 9 (US) show how the post-processing improves the reliability diagram and rank histogram compared to the uncorrected approach.
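A minimal sketch of that post-processing, assuming plain PyTorch and placeholder names:

```python
import torch

def postprocess_inputs(radar, latent_shape):
    """Sketch of the post-processing quoted above (names and shapes are
    placeholders, not the released implementation).

    - Latents drawn from a Gaussian with standard deviation 2.
    - The whole input radar field multiplied by one scalar drawn from a
      unit-mean gamma distribution G(alpha=5, beta=5)."""
    z = torch.randn(latent_shape) * 2.0               # std 2 rather than 1
    gamma = torch.distributions.Gamma(5.0, 5.0)       # mean = alpha / beta = 1
    scale = gamma.sample()                            # one constant per realization
    return z, radar * scale
```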

There was a reason I had it as linear interpolation, but I don't remember it; this should work though, and matches what their model has
Follows the new pseudocode they have
@jacobbieker (Member Author)

The public implementation is also only the generator; neither of the discriminators is included. If there were a way to partially execute a SavedModel, that would probably be really helpful for determining how exactly it works, because while the Netron app is great for visualizing, and it's still been very helpful, I'm getting a bit lost in what's happening.
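Short of partial execution, the SavedModel can at least be loaded and inspected with the standard TF2 API (a sketch; the path is a placeholder, and .variables assumes the restored object exposes them, as tf.Module-based models do):

```python
import tensorflow as tf

# Load the released SavedModel (generator only) and poke at it.
module = tf.saved_model.load("path/to/dgmr_savedmodel")

# Exported callable entry points.
print(list(module.signatures.keys()))

# Checkpointed variables: the names and shapes quoted in this thread.
for v in module.variables:
    print(v.name, v.shape)
```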

@jacobbieker (Member Author)

I've asked in google-deepmind/deepmind-research#290 whether they are planning to publicly release the model code. I would expect not, since they didn't release it with the rest of the inference code, but hopefully something comes of it.

@jacobbieker jacobbieker mentioned this pull request Oct 15, 2021
@jacobbieker (Member Author)

From google-deepmind/deepmind-research#290 (comment), the pseudocode seems to have been uploaded. I've now downloaded it, and it seems to be pretty much the TF implementation! So that should help a lot; it looks detailed enough to get the models to match.

@jacobbieker (Member Author)

The pseudocode looks quite complete. The only thing I notice is that layers.txt and latent_stack.txt have the same content, so there might be a copy/paste mistake, but it does help a ton! I'll then be able to match nearly everything exactly.

@jacobbieker (Member Author)

Okay, this code should now essentially match their actual implementation, other than a few spectral normalizations that still need to be added to the 2D convolutions, plus some refactoring to make it a bit nicer looking and better documented so that the linting and tests pass.
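For the spectral normalization part, PyTorch has a built-in wrapper; a minimal example:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# The conv weight is divided by an estimate of its largest singular
# value (computed by power iteration) on every forward pass.
conv = spectral_norm(nn.Conv2d(768, 768, kernel_size=3, padding=1))
```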

@JackKelly (Member)

Awesome!

@jacobbieker (Member Author)

Each component now works and is tested, but the Generator produces examples with NaNs in testing, so I'm trying to figure out why. NaNs start to show up after the first ConvGRU, even though the ConvGRU produces no NaN values when tested on its own.
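One generic way to localize this kind of failure (not code from this PR) is to register forward hooks that flag modules producing NaNs:

```python
import torch
import torch.nn as nn

def add_nan_hooks(model: nn.Module):
    """Register forward hooks that print every module whose output
    contains NaNs; the first line printed is where they first appear."""
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                print(f"NaNs in output of {name} ({module.__class__.__name__})")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```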

@jacobbieker (Member Author)

I'll merge this now, as the code does match the Nature implementation, and open a separate issue for the NaN problem.

@jacobbieker jacobbieker merged commit 1743e06 into main Nov 11, 2021
@jacobbieker jacobbieker deleted the jacob/nature-paper branch November 11, 2021 10:05
Chevolier pushed a commit to Chevolier/skillful_nowcasting that referenced this pull request May 21, 2024