
Commit d167082

Author: Thibault de Boissiere (committed)
Update readme
1 parent 41360d9 commit d167082

File tree

1 file changed: +18 −3 lines changed


Diff for: SELU/README.md

```diff
@@ -53,8 +53,23 @@ Modify `plot_results.py` to select your experiments, then run:
     python plot_results.py
 
 
-## Notes
+## Results
 
 - The architecture of the NN is the same as in the original paper.
-- We plot the loss curves to give some more perspective.
-- Initially had a hard time reproducing results. Inspection of the loss curves shows that you just have to train longer, until the Sobolev loss and the MSE loss have similar magnitude, or increase the weight on the Sobolev loss.
+- The depths are a bit different (powers of 2 for the inner layers, not counting the first and last layers).
+- N.B. The number of epochs varies between plots.
+- At LR = 1E-5, the results are consistent with the paper, although the minimum training loss is quite a bit higher than in the paper.
+- The Adam optimizer was also used for comparison (with far fewer epochs). SELU still seems to work better, but it is more unstable.
+- At higher learning rates, Adam no longer works well with SELU. (This conclusion may change with more epochs, but 2000 epochs already takes quite a lot of time.)
+
+![learning rate 1e-2](figures/SELU_LR_1E-2.png)
+![learning rate 1e-3](figures/SELU_LR_1E-3.png)
+![learning rate 1e-5](figures/SELU_LR_1E-5.png)
+
+
+
+## Conclusion
+
+- SELU is definitely better behaved with SGD.
+- Depending on the learning rate, it may also be better with Adam.
+- However, a fair bit of fine-tuning seems to be needed to get the best performance (even with SGD), and training may be quite slow (many epochs are needed at low learning rates).
```
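For reference, here is a minimal sketch of the SELU activation discussed in this diff, using the fixed-point constants from Klambauer et al. (2017). This is illustrative only and is not taken from the repo's code, which may define it differently.

```python
import numpy as np

# SELU constants from Klambauer et al. (2017), "Self-Normalizing Neural Networks"
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    """SELU(x) = scale * (x if x > 0 else alpha * (exp(x) - 1))."""
    x = np.asarray(x, dtype=np.float64)
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

# Sanity check: for standard-normal input, SELU's output stays roughly
# zero-mean / unit-variance, which is the self-normalizing fixed point.
z = np.random.randn(100_000)
print(selu(z).mean(), selu(z).var())
```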

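The SGD-vs-Adam comparison described in the diff could be set up along the lines of the sketch below. This is a hypothetical PyTorch setup; the repo's actual training code is not part of this diff and may use a different framework, architecture, or hyperparameters.

```python
import torch
import torch.nn as nn

def make_mlp(n_hidden=8, width=256, in_dim=784, out_dim=10):
    # Fully-connected net with SELU activations; torch.nn.SELU uses the
    # same alpha/scale constants as the original paper.
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, width), nn.SELU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

model = make_mlp()

# The README compares optimizers at several learning rates (1E-2, 1E-3, 1E-5):
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # alternative run
```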