Copy task - Learning on input of length 1 #4

Open
tristandeleu opened this issue Sep 21, 2015 · 2 comments
@tristandeleu
Collaborator

As suggested by @adrienball, I ran an experiment training the NTM on length-one inputs only, to see whether it could already learn such a simple behavior (even if it overfits). The NTM successfully recovered the length-one inputs:
[image: copy-1]
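For reference, here is a minimal sketch of how such length-one copy-task batches could be generated. The helper name, the vector width, and the delimiter encoding are assumptions for illustration, not the repository's actual data generator:

```python
import numpy as np

def copy_task_batch(batch_size=16, seq_len=1, size=8):
    """Hypothetical copy-task batch generator (not the repo's API).

    Each example: `seq_len` random binary vectors of width `size`, then a
    delimiter flag on an extra input channel; the target is the same
    sequence, expected during the recall phase after the delimiter.
    """
    payload = np.random.randint(0, 2, (batch_size, seq_len, size)).astype(np.float32)

    # Inputs: presentation phase + delimiter + recall phase, with one extra
    # channel reserved for the end-of-sequence flag.
    inputs = np.zeros((batch_size, 2 * seq_len + 1, size + 1), dtype=np.float32)
    inputs[:, :seq_len, :size] = payload
    inputs[:, seq_len, size] = 1.0  # end-of-sequence delimiter

    # Targets: reproduce the payload during the recall phase only.
    targets = np.zeros((batch_size, 2 * seq_len + 1, size), dtype=np.float32)
    targets[:, seq_len + 1:, :] = payload
    return inputs, targets

# Training data for this experiment: sequences of length 1 only.
x, y = copy_task_batch(seq_len=1)
```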

When I tested this trained NTM on longer inputs, it consistently failed to recover the whole sequences (as expected, given the lack of variety in the training input lengths), but it generally managed to remember the first vector. However, some interesting patterns emerged:

  • The NTM was sometimes able to recover the first 2 vectors, even though it had never seen any input longer than one
    [image: copy-10-partial]
  • The NTM sometimes repeated the first vector (with some "noise") multiple times. This pattern has come up frequently enough to be worth investigating.
    [image: copy-10-repeat]
Parameters of the experiment
  • NTM layer with a feed-forward controller + 1 read head + 1 write head
  • Update rule: Graves' RMSprop with learning_rate=1e-3 (other parameters kept at the values from his previous paper); see the sketch after this list
  • Activations: ReLU for [add, key, beta], 1 + ReLU for gamma, sigmoid for [gate, dense_output], softmax for shift
  • Initialization: Glorot uniform for every weight matrix and the memory, zeros for every bias and the hidden state, equiprobable (EquiProba) weightings for the read & write heads
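As a reference for the update rule above, here is a minimal NumPy sketch of Graves' RMSprop variant, with decay, momentum, and epsilon taken from Graves (2013) and the learning rate used in this experiment. The function name and state layout are my own, not the repository's implementation:

```python
import numpy as np

def graves_rmsprop_step(w, g, state, learning_rate=1e-3,
                        decay=0.95, momentum=0.9, epsilon=1e-4):
    """One parameter update of Graves' RMSprop variant (sketch only).

    `state` holds three running buffers shaped like `w`:
    n (mean of g^2), g_bar (mean of g), and delta (previous update).
    """
    n, g_bar, delta = state
    n = decay * n + (1.0 - decay) * g ** 2        # running average of g^2
    g_bar = decay * g_bar + (1.0 - decay) * g     # running average of g
    # Scale the gradient by an estimate of its standard deviation,
    # then apply momentum.
    delta = momentum * delta - learning_rate * g / np.sqrt(n - g_bar ** 2 + epsilon)
    w = w + delta
    return w, (n, g_bar, delta)
```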
Learning curve

Gray: cost function; Red: moving average of the cost function over 500 iterations
[image: copy-learning-curve]
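The red curve is a simple moving average of the per-iteration cost; a sketch of that smoothing (window size as stated above, exact smoothing method assumed):

```python
import numpy as np

def moving_average(costs, window=500):
    """Smooth the per-iteration cost with a simple moving average."""
    kernel = np.ones(window) / window
    # 'valid' keeps only windows fully covered by data.
    return np.convolve(np.asarray(costs, dtype=np.float64), kernel, mode="valid")
```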

@adrienball
adrienball commented Sep 22, 2015

Man this is so exciting!


@maelp

maelp commented Sep 22, 2015

Nice!
