Match Nature Paper Implementation #5
Conversation
Going through the model layer names + shapes: I'm not sure what one of the layers is for, as in I don't see where in the paper it would come from. But for the GRU layers, I couldn't work out how they got the GRU to take the 384-channel input along with the 768-sized latent space. Turns out they just concatenated them!
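For reference, a minimal sketch of that concatenation; the spatial size and the single gate convolution are illustrative, only the 384 + 768 channel counts come from the comment above:

```python
import torch
from torch import nn

# Illustrative shapes: a 384-channel input is concatenated with a
# 768-channel latent/hidden state along the channel dimension before
# the GRU's gate convolution sees it.
x = torch.randn(1, 384, 16, 16)   # conditioning-stack features
h = torch.randn(1, 768, 16, 16)   # latent / hidden state
gate_in = torch.cat([x, h], dim=1)                      # (1, 1152, 16, 16)
update_gate = nn.Conv2d(384 + 768, 768, kernel_size=3, padding=1)
z = torch.sigmoid(update_gate(gate_in))                 # one GRU gate shown
```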
There are still some layers I don't quite get, namely the difference between the …
From what I can tell, the …
We can look at this together if you like
Yeah, that'd be great! I'm finishing up a script for getting the satellite data in
Following this since I'm interested. Here is the link for the converted ONNX version: https://drive.google.com/file/d/1rpgoERj8CETkty049mwdGoy0DAr7cG1F/view?usp=sharing
Oh awesome! Thanks! Yeah, that should definitely help, compared to me just printing out the names and shapes
Hopefully! Unfortunately, the visualization unrolls all the RNN loops, so the resulting graph has a lot of repeated blocks, but maybe it helps.
For anyone just looking at it, this is the graph created from @franchg's ONNX export. So it might take a bit, but since I'm pretty sure this current implementation is close to the actual one, I just need to check where some of the blocks are.
Attaching the ONNX conversion log, just in case.
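If it helps anyone else poking at the export, here is a quick way to list the operators in the (unrolled) graph with the onnx package; the local filename is hypothetical:

```python
import onnx

# Load the shared export and list the operators in the unrolled graph.
model = onnx.load("dgmr.onnx")  # hypothetical filename for the download above
for node in model.graph.node:
    print(node.op_type, node.name)
```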
They also change the latent space being sampled for evaluation.
There was a reason I had it as linear interpolation, but I don't remember it. This should work though, and it matches what their model has.
It follows the new pseudocode they have.
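A minimal sketch of what the changed sampling amounts to, assuming the latent is drawn i.i.d. from a standard normal as in the released pseudocode; the shape defaults here are illustrative, not the repo's actual values:

```python
import torch

def sample_latent(batch: int, channels: int = 8, height: int = 8, width: int = 8) -> torch.Tensor:
    """Draw the evaluation latent i.i.d. from a standard normal
    (shape arguments are illustrative defaults)."""
    return torch.randn(batch, channels, height, width)

z = sample_latent(1)  # e.g. one sample of shape (1, 8, 8, 8)
```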
The public implementation is also only the generator part; neither of the discriminators is included. If there were a way to partially execute a SavedModel, that would probably be really helpful for determining how exactly it works, because while the Netron app is great for visualizing, and it's still been very helpful, I'm just getting a bit lost in what's happening.
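Short of partial execution, one thing that might help is loading the SavedModel and dumping its entry points and variable shapes; a sketch using the standard tf.saved_model API, with a hypothetical path:

```python
import tensorflow as tf

# Load the released SavedModel and inspect what it exposes.
loaded = tf.saved_model.load("path/to/saved_model")  # hypothetical path
print(list(loaded.signatures.keys()))   # callable entry points
for v in loaded.variables:              # variable names and shapes
    print(v.name, v.shape)
```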
I've asked here google-deepmind/deepmind-research#290 if they are planning on publicly releasing the model code. I would expect not, as they didn't release it with the rest of the inference code, but hopefully something comes out of it.
From this: google-deepmind/deepmind-research#290 (comment) the pseudocode seems to have been uploaded. I've now downloaded it and it seems to be pretty much the TF implementation! So that should help a lot. It seems detailed enough to get the models to match.
The pseudocode looks quite complete; the only thing I notice is that layers.txt and latent_stack.txt have the same content, so there might be a copy/paste mistake or something. But that does help a ton! I'll be able to match nearly everything exactly.
Okay, so now this code should essentially match their actual implementation, other than a few spectral normalizations that need to be added to the 2D convolutions, plus some refactoring to make it a bit nicer looking and better documented, and to get the linting and tests to pass.
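For the remaining pieces, adding spectral normalization to a 2D convolution in PyTorch looks roughly like the sketch below; the channel counts are illustrative, not the repo's actual layers:

```python
import torch
from torch import nn
from torch.nn.utils import spectral_norm

# spectral_norm re-normalizes the weight by its largest singular value
# on every forward pass; channel counts below are illustrative.
conv = spectral_norm(nn.Conv2d(384, 384, kernel_size=3, padding=1))
y = conv(torch.randn(1, 384, 32, 32))
```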
Awesome!
Each component now works and is tested, but the Generator produces examples with NaNs in testing, so I'm trying to figure out why. NaNs start to show up after the first ConvGRU, even though there are no NaN values when testing the ConvGRU on its own.
NaNs show up most often in Layer 2, but it is not always consistent.
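One way to localize this kind of failure is to register forward hooks that flag the first module whose output contains NaNs; a debugging sketch (not from the repo), with a hypothetical generator instance and input shape in the usage comment:

```python
import torch
from torch import nn

def add_nan_hooks(model: nn.Module) -> None:
    """Register forward hooks that print any module whose output contains NaNs."""
    def make_hook(name: str):
        def hook(module, inputs, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for out in outs:
                if torch.is_tensor(out) and torch.isnan(out).any():
                    print(f"NaNs in output of {name} ({module.__class__.__name__})")
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

# Usage (the generator instance and input shape are hypothetical):
# add_nan_hooks(generator)
# generator(torch.randn(1, 4, 1, 256, 256))
```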
I'll merge this now, as the code does match the Nature implementation, and open a separate issue for the problem with NaNs.
Pull Request
Description
The Nature version of the paper has some changes compared to the preprint. Additionally, the released TensorFlow model helps with matching the model architecture more closely: https://www.nature.com/articles/s41586-021-03854-z
This should also allow loading the TensorFlow weights into PyTorch with a few modifications.
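A rough sketch of what such a weight transfer could look like, assuming the TF weights are available as name-to-array pairs; the name_map is a hypothetical hand-built mapping, and note that TF stores Conv2D kernels as HWIO while PyTorch expects OIHW:

```python
import numpy as np
import torch

def tf_conv_kernel_to_torch(kernel: np.ndarray) -> torch.Tensor:
    """TF stores Conv2D kernels as (H, W, in, out); PyTorch expects (out, in, H, W)."""
    return torch.from_numpy(kernel).permute(3, 2, 0, 1).contiguous()

def load_tf_weights(model, tf_weights, name_map):
    """Copy TF arrays into the PyTorch state dict via an explicit name mapping
    (name_map is hypothetical and would need to be built by hand)."""
    state = model.state_dict()
    for tf_name, torch_name in name_map.items():
        array = tf_weights[tf_name]
        tensor = tf_conv_kernel_to_torch(array) if array.ndim == 4 else torch.from_numpy(array)
        assert state[torch_name].shape == tensor.shape, (tf_name, torch_name)
        state[torch_name] = tensor
    model.load_state_dict(state)
```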
Fixes issue #3
Relates to #4 and #1
How Has This Been Tested?
Unit tests