
Commit c0437e8

Thomas Mulc committed: added readmes in the folders
1 parent 7be1eec commit c0437e8

File tree

8 files changed: +24 -0 lines changed

ADAG/README.md

## ADAG (Asynchronous Distributed Adaptive Gradients)

Similar to DOWNPOUR, except that it uses a communication window *T* and accumulates gradients for *T* steps before sending updates to the parameter server.
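
A minimal sketch of the communication-window idea in plain Python; the helpers `compute_grads`, `send_grads_to_ps`, and `pull_from_ps` are hypothetical placeholders, not functions from this repo:

```python
# Hypothetical ADAG worker loop: accumulate gradients locally for T steps,
# then ship the accumulated gradient to the parameter server, which applies
# its own (adaptive) update rule.
T = 10  # communication window

def adag_worker(params, batches, compute_grads, send_grads_to_ps, pull_from_ps):
    accum = None
    for step, batch in enumerate(batches):
        grads = compute_grads(params, batch)
        accum = grads if accum is None else [a + g for a, g in zip(accum, grads)]
        if (step + 1) % T == 0:
            send_grads_to_ps(accum)   # one communication per window
            params = pull_from_ps()   # refresh the local copy
            accum = None
```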

DOWNPOUR-Easy/README.md

## DOWNPOUR Easy

The same as DOWNPOUR, except that variables are updated locally with plain SGD instead of Adagrad. This makes the algorithm easier to implement because you don't need to worry about finding the variables created by the local Adagrad optimizer and forcing them to be local variables.
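
A short sketch of why this is simpler (TF 1.x API, illustrative only; `var` is a hypothetical model variable):

```python
import tensorflow as tf

# Plain SGD keeps no per-variable state, so there are no extra optimizer
# variables to find and pin to the worker.
local_opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)

# Adagrad, by contrast, creates an "accumulator" slot for every variable
# (only after apply_gradients/minimize has been called), and each slot would
# have to be located and forced to be a local variable:
#   adagrad = tf.train.AdagradOptimizer(0.01)
#   acc = adagrad.get_slot(var, "accumulator")
```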

DOWNPOUR/README.md

## DOWNPOUR

Similar to Hogwild!, except that it uses Adagrad to update the local workers. Additionally, there is a communication window that serves as a time buffer for updates to the parameter server (although the original paper set the communication window to one, which voided the need for this buffer).
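
A hypothetical sketch of one worker, in plain NumPy to keep the algorithm visible; `compute_grads`, `push_to_ps`, and `pull_from_ps` are placeholders for the TensorFlow plumbing in the actual scripts:

```python
import numpy as np

def downpour_worker(batches, compute_grads, push_to_ps, pull_from_ps,
                    lr=0.01, window=1, eps=1e-8):
    params = pull_from_ps()                        # local copy of the ps variables
    accum_sq = [np.zeros_like(p) for p in params]  # local Adagrad accumulators
    buffered = [np.zeros_like(p) for p in params]  # updates not yet sent to the ps
    for step, batch in enumerate(batches):
        grads = compute_grads(params, batch)
        for i, g in enumerate(grads):
            accum_sq[i] += g * g
            update = -lr * g / (np.sqrt(accum_sq[i]) + eps)
            params[i] += update                    # local Adagrad step
            buffered[i] += update                  # buffer it for the ps
        if (step + 1) % window == 0:               # end of the communication window
            push_to_ps(buffered)                   # apply buffered updates on the ps
            params = pull_from_ps()                # re-sync the local copy
            buffered = [np.zeros_like(p) for p in params]
```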

Hogwild/README.md

## Hogwild!

The famous, lock-free approach to SGD. Have a bunch of workers and a parameter server, then let the workers update the variables whenever they want.
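
A minimal between-graph replication sketch (TF 1.x); the addresses, task index, and toy loss are placeholders, and in the real scripts they come from flags and the model definition:

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223", "localhost:2224"]})
job_name, task_index = "worker", 0   # normally parsed from command-line flags
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()                    # the ps process just hosts the variables
else:
    # variables land on the ps; compute ops stay on this worker
    device = tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:%d" % task_index)
    with tf.device(device):
        w = tf.Variable(0.0)
        loss = tf.square(w - 1.0)
        # no locking, no barrier: each worker applies its update as soon as it is ready
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.Session(server.target) as sess:
        sess.run(tf.global_variables_initializer())  # in a real run, only the chief initializes
        for _ in range(100):
            sess.run(train_op)
```
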
## Multiple GPUs Single Machine

Use environment variables to manually override the GPUs available to a TensorFlow process. There is a way to do this without using environment variables, but it's not worth the effort (if you really need it, you can remap the available devices so the GPU you want to use is labeled as device 0, then set the visible devices to 0).
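
A minimal sketch of the environment-variable approach; GPU index `1` is just an example:

```python
import os

# Must be set before TensorFlow initializes CUDA (i.e. before the first
# TensorFlow import in the process). Physical GPU 1 becomes the only visible
# device and shows up inside the process as "/gpu:0".
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf

with tf.device("/gpu:0"):            # physical GPU 1, thanks to the mask above
    a = tf.constant([1.0, 2.0])
    b = a * 2.0

with tf.Session() as sess:
    print(sess.run(b))
```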

SDAG/README.md

## SDAG (Synchronous Distributed Adaptive Gradients)

A hybrid of SSGD and ADAG: average gradients over the communication window, but apply the updates to the ps variables synchronously.
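
A hypothetical plain-Python sketch of one round; the SGD-style step on the ps stands in for whatever update rule the script actually uses, and `compute_grads` is a placeholder:

```python
def worker_window_average(params, batches, compute_grads, window):
    """Each worker averages its own gradients over one communication window."""
    total = None
    for batch in batches[:window]:
        grads = compute_grads(params, batch)
        total = grads if total is None else [t + g for t, g in zip(total, grads)]
    return [t / window for t in total]

def ps_sync_apply(params, per_worker_avgs, lr=0.01):
    """The ps waits for every worker's window average, then applies one update."""
    n = len(per_worker_avgs)
    for i in range(len(params)):
        mean_g = sum(avgs[i] for avgs in per_worker_avgs) / n
        params[i] = params[i] - lr * mean_g
    return params
```
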
## SSGD different learning rates

Same as vanilla SSGD, except that each worker has its own learning rate.
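
A small sketch of the only change relative to vanilla SSGD (the rate list and task index are illustrative):

```python
import tensorflow as tf

learning_rates = [0.1, 0.05, 0.01, 0.005]   # one entry per worker
task_index = 0                              # normally parsed from a flag
opt = tf.train.GradientDescentOptimizer(learning_rates[task_index])
# the rest of the SSGD setup (synchronous aggregation on the ps) is unchanged
```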

Synchronous-SGD/README.md

## SSGD (Synchronous SGD)

Have workers send their updates to a parameter server (ps), but only update the variables on the ps after *N* updates have been accumulated. If the number of workers is *M* and *M>N*, then this is known as dropping the last *M-N* *stale gradients*.
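
One way to express this aggregation rule in TF 1.x is `tf.train.SyncReplicasOptimizer` (the scripts here may implement the accumulation differently; the numbers below are illustrative):

```python
import tensorflow as tf

M, N = 4, 3                                   # M workers, aggregate N gradients per step
opt = tf.train.GradientDescentOptimizer(0.01)
sync_opt = tf.train.SyncReplicasOptimizer(opt,
                                          replicas_to_aggregate=N,
                                          total_num_replicas=M)
# In the worker graph:
#   train_op = sync_opt.minimize(loss, global_step=global_step)
#   hooks = [sync_opt.make_session_run_hook(is_chief=(task_index == 0))]
# Once N gradients have arrived, the ps applies the averaged update and the
# remaining M - N late (stale) gradients are dropped.
```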
