
master considers batchnorm mean/std update as gradients #19

Open
vlimant opened this issue Apr 6, 2018 · 2 comments

Owner

vlimant commented Apr 6, 2018

In the way the master receives the "update" from the workers, for the batchnorm mean weight (the running mean) it treats the diff as a gradient and applies a gradient-descent-style update to it, instead of handling it as a value that was never updated by gradient descent in the first place. The gamma/beta weights of the batchnorm layer are fine in this respect.
I fear this is interacting badly with svalleco#3.
@duanders, if you have any insight on how to modify the mpi-learn optimizers to take this into account, please do tell.
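
To make the point concrete, here is a minimal sketch of the master-side update I have in mind; all names (`apply_worker_update`, `optimizer_step`, `trainable_mask`) are hypothetical and not the actual mpi_learn API:

```python
import numpy as np

def apply_worker_update(master_weights, worker_weights, trainable_mask, optimizer_step):
    """Hypothetical sketch: treat only trainable-weight diffs as gradients;
    blend non-trainable running statistics (BN mean/variance) directly."""
    updated = []
    for m_w, w_w, is_trainable in zip(master_weights, worker_weights, trainable_mask):
        if is_trainable:
            # gamma/beta, kernels, ...: the worker diff is a gradient-like
            # quantity, so hand it to the optimizer step (SGD, Adam, ...)
            updated.append(optimizer_step(m_w, m_w - w_w))
        else:
            # running mean/variance were never produced by gradient descent;
            # just move the master copy toward the worker's value
            updated.append(0.5 * (m_w + w_w))
    return updated

# hypothetical usage: one trainable weight (gamma) and one running mean,
# with plain SGD as the optimizer step
master = [np.ones(3), np.zeros(3)]
worker = [np.full(3, 0.9), np.full(3, 0.2)]
new_master = apply_worker_update(master, worker, [True, False],
                                 lambda w, g: w - 0.01 * g)
```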

Contributor

duanders commented Apr 8, 2018

I see, glad you got to the bottom of it. Have you checked how the Keras optimizers handle this?

Owner Author

vlimant commented Apr 8, 2018

I was not able to track this down all the way.
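
For reference, in standalone Keras the split is visible from the layer's weight lists: gamma/beta are trainable (the optimizer updates them from gradients), while moving_mean/moving_variance are non-trainable and maintained by the layer's own moving-average update ops rather than by the optimizer. A quick way to see this with the Keras 2 API:

```python
from keras.layers import BatchNormalization, Dense, Input
from keras.models import Model

inp = Input(shape=(10,))
out = BatchNormalization()(Dense(8)(inp))
model = Model(inp, out)

bn = model.layers[-1]
# gamma/beta: trainable, updated by the optimizer from gradients
print([w.name for w in bn.trainable_weights])
# moving_mean/moving_variance: non-trainable, updated by the layer's
# own moving-average ops, not by the optimizer
print([w.name for w in bn.non_trainable_weights])
```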
