`SELU/README.md`

Modify `plot_results.py` to select your experiments, then run:

    python plot_results.py
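As a purely hypothetical illustration (the actual selection mechanism, experiment names, and result-file layout in `plot_results.py` are not shown here and may differ), selecting a few experiments and overlaying their loss curves could look like this:

```python
# Hypothetical sketch only; plot_results.py's real selection mechanism,
# experiment names, and file layout may differ.
import numpy as np
import matplotlib.pyplot as plt

experiments = ["selu_sgd_lr1e-5", "selu_adam_lr1e-5"]  # assumed experiment names

for name in experiments:
    losses = np.load(f"results/{name}_loss.npy")       # assumed result-file layout
    plt.plot(losses, label=name)

plt.yscale("log")   # loss curves are usually easier to compare on a log scale
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.legend()
plt.show()
```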
## Results
- The architecture of the NN is the same as in the original paper.
- The depths are a bit different (powers of 2 for the inner layers, not counting the first and last layer); see the sketch after this list.
- N.B. The number of epochs varies between plots.
- At LR = 1e-5, the results are consistent with the paper, although the minimum training loss is quite a bit higher than in the paper.
- The Adam optimizer was also used for comparison (with far fewer epochs). SELU still seems to work better, but is more unstable.
- At higher learning rates, Adam no longer works well for SELU. (This conclusion may change with more epochs, but 2000 epochs already takes quite a lot of time.)
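
To make the setup above concrete, below is a minimal sketch of the kind of network and optimizer settings being compared. It is not the repository's code: the framework (PyTorch), the layer width, the input/output dimensions, and the LeCun-normal initialization helper are assumptions; only the ingredients named in the notes (SELU activations, a power-of-two number of inner layers, plain SGD at LR = 1e-5, and Adam for comparison) come from above.

```python
import torch
import torch.nn as nn

def lecun_normal_(linear: nn.Linear) -> None:
    """LeCun-normal init (std = 1/sqrt(fan_in)), the usual pairing with SELU."""
    fan_in = linear.weight.shape[1]
    nn.init.normal_(linear.weight, mean=0.0, std=fan_in ** -0.5)
    nn.init.zeros_(linear.bias)

def make_selu_mlp(in_dim: int, out_dim: int, n_inner: int = 8, width: int = 128) -> nn.Sequential:
    """MLP with SELU activations and `n_inner` hidden layers (a power of two),
    not counting the first and last layers."""
    dims = [in_dim] + [width] * n_inner + [out_dim]
    layers = []
    for i, (a, b) in enumerate(zip(dims[:-1], dims[1:])):
        lin = nn.Linear(a, b)
        lecun_normal_(lin)
        layers.append(lin)
        if i < len(dims) - 2:       # no activation after the output layer
            layers.append(nn.SELU())
    return nn.Sequential(*layers)

model = make_selu_mlp(in_dim=10, out_dim=1, n_inner=8)

# The two settings compared above (all hyperparameters here are assumptions):
sgd_opt  = torch.optim.SGD(model.parameters(), lr=1e-5)   # stable, but needs many epochs
adam_opt = torch.optim.Adam(model.parameters(), lr=1e-5)  # fewer epochs, but less stable with SELU
```

The LeCun-normal helper reflects the usual pairing with SELU: weights with std = 1/sqrt(fan_in) keep activations approximately zero-mean and unit-variance through depth, which is what the self-normalizing property relies on.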
## Conclusion
- SELU is definitely better behaved with SGD.
- Depending on learning rate, it may also be better with Adam.
- However, a fair bit of fine-tuning seems to be needed to get the best performance (even with SGD), and training may be quite slow (many epochs are needed at low learning rates).