
Commit c0437e8

Thomas Mulc committed: added readmes in the folders
1 parent 7be1eec commit c0437e8

File tree

8 files changed: +24 -0 lines changed

ADAG/README.md

## ADAG (Asynchronous Distributed Adaptive Gradients)

Similar to DOWNPOUR, except that it uses a communication window *T* and accumulates gradients for *T* steps before sending updates to the parameter server.
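
A minimal sketch of the communication-window idea in plain Python; the helpers `compute_grads`, `send_grads_to_ps`, and `pull_from_ps` are hypothetical placeholders, not functions from this repo:

```python
# Hypothetical ADAG worker loop: accumulate gradients locally for T steps,
# then ship the accumulated gradient to the parameter server, which applies
# its own (adaptive) update rule.
T = 10  # communication window

def adag_worker(params, batches, compute_grads, send_grads_to_ps, pull_from_ps):
    accum = None
    for step, batch in enumerate(batches):
        grads = compute_grads(params, batch)
        accum = grads if accum is None else [a + g for a, g in zip(accum, grads)]
        if (step + 1) % T == 0:
            send_grads_to_ps(accum)   # one communication per window
            params = pull_from_ps()   # refresh the local copy
            accum = None
```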

DOWNPOUR-Easy/README.md

## DOWNPOUR Easy

The same as DOWNPOUR, except that variables are updated locally with plain SGD instead of Adagrad. This makes the algorithm easier to implement because you don't need to worry about finding the variables created by the local Adagrad optimizer and forcing them to be local variables.
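
A short sketch of why this is simpler (TF 1.x API, illustrative only; `var` is a hypothetical model variable):

```python
import tensorflow as tf

# Plain SGD keeps no per-variable state, so there are no extra optimizer
# variables to find and pin to the worker.
local_opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)

# Adagrad, by contrast, creates an "accumulator" slot for every variable
# (only after apply_gradients/minimize has been called), and each slot would
# have to be located and forced to be a local variable:
#   adagrad = tf.train.AdagradOptimizer(0.01)
#   acc = adagrad.get_slot(var, "accumulator")
```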

DOWNPOUR/README.md

## DOWNPOUR

Similar to Hogwild!, except that it uses Adagrad to update the local workers. Additionally, there is a communication window that serves as a time buffer for updates to the parameter server (although the original paper set the communication window to one, which voided the need for this buffer).
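
A hypothetical sketch of one worker, in plain NumPy to keep the algorithm visible; `compute_grads`, `push_to_ps`, and `pull_from_ps` are placeholders for the TensorFlow plumbing in the actual scripts:

```python
import numpy as np

def downpour_worker(batches, compute_grads, push_to_ps, pull_from_ps,
                    lr=0.01, window=1, eps=1e-8):
    params = pull_from_ps()                        # local copy of the ps variables
    accum_sq = [np.zeros_like(p) for p in params]  # local Adagrad accumulators
    buffered = [np.zeros_like(p) for p in params]  # updates not yet sent to the ps
    for step, batch in enumerate(batches):
        grads = compute_grads(params, batch)
        for i, g in enumerate(grads):
            accum_sq[i] += g * g
            update = -lr * g / (np.sqrt(accum_sq[i]) + eps)
            params[i] += update                    # local Adagrad step
            buffered[i] += update                  # buffer it for the ps
        if (step + 1) % window == 0:               # end of the communication window
            push_to_ps(buffered)                   # apply buffered updates on the ps
            params = pull_from_ps()                # re-sync the local copy
            buffered = [np.zeros_like(p) for p in params]
```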

Hogwild/README.md

## Hogwild!

The famous, lock-free approach to SGD. Have a bunch of workers and a parameter server, then let the workers update the variables whenever they want.
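
A minimal between-graph replication sketch (TF 1.x); the addresses, task index, and toy loss are placeholders, and in the real scripts they come from flags and the model definition:

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223", "localhost:2224"]})
job_name, task_index = "worker", 0   # normally parsed from command-line flags
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()                    # the ps process just hosts the variables
else:
    # variables land on the ps; compute ops stay on this worker
    device = tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:%d" % task_index)
    with tf.device(device):
        w = tf.Variable(0.0)
        loss = tf.square(w - 1.0)
        # no locking, no barrier: each worker applies its update as soon as it is ready
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.Session(server.target) as sess:
        sess.run(tf.global_variables_initializer())  # in a real run, only the chief initializes
        for _ in range(100):
            sess.run(train_op)
```
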
## Multiple GPUs Single Machine

Use environment variables to manually override the GPUs available to a TensorFlow process. There is a way to do this without using environment variables, but it's not worth the effort (if you really need it, you can remap the available devices so the GPU you want to use is labeled as device 0, then set the visible devices to 0).
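
A minimal sketch of the environment-variable approach; GPU index `1` is just an example:

```python
import os

# Must be set before TensorFlow initializes CUDA (i.e. before the first
# TensorFlow import in the process). Physical GPU 1 becomes the only visible
# device and shows up inside the process as "/gpu:0".
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf

with tf.device("/gpu:0"):            # physical GPU 1, thanks to the mask above
    a = tf.constant([1.0, 2.0])
    b = a * 2.0

with tf.Session() as sess:
    print(sess.run(b))
```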

SDAG/README.md

## SDAG (Synchronous Distributed Adaptive Gradients)

A hybrid of SSGD and ADAG: average gradients over the communication window, but apply the updates to the ps variables synchronously.
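
A hypothetical plain-Python sketch of one round; the SGD-style step on the ps stands in for whatever update rule the script actually uses, and `compute_grads` is a placeholder:

```python
def worker_window_average(params, batches, compute_grads, window):
    """Each worker averages its own gradients over one communication window."""
    total = None
    for batch in batches[:window]:
        grads = compute_grads(params, batch)
        total = grads if total is None else [t + g for t, g in zip(total, grads)]
    return [t / window for t in total]

def ps_sync_apply(params, per_worker_avgs, lr=0.01):
    """The ps waits for every worker's window average, then applies one update."""
    n = len(per_worker_avgs)
    for i in range(len(params)):
        mean_g = sum(avgs[i] for avgs in per_worker_avgs) / n
        params[i] = params[i] - lr * mean_g
    return params
```
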
## SSGD different learning rates

Same as vanilla SSGD, except that each worker has its own learning rate.
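
A small sketch of the only change relative to vanilla SSGD (the rate list and task index are illustrative):

```python
import tensorflow as tf

learning_rates = [0.1, 0.05, 0.01, 0.005]   # one entry per worker
task_index = 0                              # normally parsed from a flag
opt = tf.train.GradientDescentOptimizer(learning_rates[task_index])
# the rest of the SSGD setup (synchronous aggregation on the ps) is unchanged
```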

Synchronous-SGD/README.md

## SSGD (Synchronous SGD)

Have workers send their updates to a parameter server (ps), but only update the variables on the ps after *N* updates have been accumulated. If the number of workers is *M* and *M>N*, then this is known as dropping the last *M-N* *stale gradients*.
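
One way to express this aggregation rule in TF 1.x is `tf.train.SyncReplicasOptimizer` (the scripts here may implement the accumulation differently; the numbers below are illustrative):

```python
import tensorflow as tf

M, N = 4, 3                                   # M workers, aggregate N gradients per step
opt = tf.train.GradientDescentOptimizer(0.01)
sync_opt = tf.train.SyncReplicasOptimizer(opt,
                                          replicas_to_aggregate=N,
                                          total_num_replicas=M)
# In the worker graph:
#   train_op = sync_opt.minimize(loss, global_step=global_step)
#   hooks = [sync_opt.make_session_run_hook(is_chief=(task_index == 0))]
# Once N gradients have arrived, the ps applies the averaged update and the
# remaining M - N late (stale) gradients are dropped.
```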
