Copy task - Learning on input of length 1 #4

Open
tristandeleu opened this issue Sep 21, 2015 · 2 comments
@tristandeleu
Collaborator

As suggested by @adrienball, I ran an experiment training the NTM on length-one inputs only, to see whether it could already learn such a simple behavior (even if it overfits). The NTM successfully recovered the length-one inputs:
[image: copy-1]
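For reference, here is a minimal sketch of how such length-one copy-task batches could be generated. The helper name, the vector width, and the delimiter encoding are assumptions for illustration, not the repository's actual data generator:

```python
import numpy as np

def copy_task_batch(batch_size=16, seq_len=1, size=8):
    """Hypothetical copy-task batch generator (not the repo's API).

    Each example: `seq_len` random binary vectors of width `size`, then a
    delimiter flag on an extra input channel; the target is the same
    sequence, expected during the recall phase after the delimiter.
    """
    payload = np.random.randint(0, 2, (batch_size, seq_len, size)).astype(np.float32)

    # Inputs: presentation phase + delimiter + recall phase, with one extra
    # channel reserved for the end-of-sequence flag.
    inputs = np.zeros((batch_size, 2 * seq_len + 1, size + 1), dtype=np.float32)
    inputs[:, :seq_len, :size] = payload
    inputs[:, seq_len, size] = 1.0  # end-of-sequence delimiter

    # Targets: reproduce the payload during the recall phase only.
    targets = np.zeros((batch_size, 2 * seq_len + 1, size), dtype=np.float32)
    targets[:, seq_len + 1:, :] = payload
    return inputs, targets

# Training data for this experiment: sequences of length 1 only.
x, y = copy_task_batch(seq_len=1)
```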

When I tested this trained NTM on longer inputs, it consistently failed to recover the whole sequences (as expected, given the lack of variety in the training input lengths), but it generally managed to remember the first vector. However, some interesting patterns emerged:

  • The NTM was sometimes able to recover the first 2 vectors, even though it had never seen any input longer than one
    [image: copy-10-partial]
  • The NTM sometimes repeated the first vector (with some "noise") multiple times. This pattern has come up frequently enough to be worth investigating.
    [image: copy-10-repeat]
Parameters of the experiment
  • NTM layer with a feed-forward controller + 1 read head + 1 write head
  • Update rule: Graves' RMSprop with learning_rate=1e-3 (other parameters kept at the values from his previous paper); see the sketch after this list
  • Activations: ReLU for [add, key, beta], 1 + ReLU for gamma, sigmoid for [gate, dense_output], softmax for shift
  • Initialization: Glorot uniform for every weight matrix and the memory, zeros for every bias and the hidden state, equiprobable (EquiProba) weightings for the read & write heads
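As a reference for the update rule above, here is a minimal NumPy sketch of Graves' RMSprop variant, with decay, momentum, and epsilon taken from Graves (2013) and the learning rate used in this experiment. The function name and state layout are my own, not the repository's implementation:

```python
import numpy as np

def graves_rmsprop_step(w, g, state, learning_rate=1e-3,
                        decay=0.95, momentum=0.9, epsilon=1e-4):
    """One parameter update of Graves' RMSprop variant (sketch only).

    `state` holds three running buffers shaped like `w`:
    n (mean of g^2), g_bar (mean of g), and delta (previous update).
    """
    n, g_bar, delta = state
    n = decay * n + (1.0 - decay) * g ** 2        # running average of g^2
    g_bar = decay * g_bar + (1.0 - decay) * g     # running average of g
    # Scale the gradient by an estimate of its standard deviation,
    # then apply momentum.
    delta = momentum * delta - learning_rate * g / np.sqrt(n - g_bar ** 2 + epsilon)
    w = w + delta
    return w, (n, g_bar, delta)
```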
Learning curve

Gray: cost function; Red: moving average of the cost function over 500 iterations
[image: copy-learning-curve]
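The red curve is a simple moving average of the per-iteration cost; a sketch of that smoothing (window size as stated above, exact smoothing method assumed):

```python
import numpy as np

def moving_average(costs, window=500):
    """Smooth the per-iteration cost with a simple moving average."""
    kernel = np.ones(window) / window
    # 'valid' keeps only windows fully covered by data.
    return np.convolve(np.asarray(costs, dtype=np.float64), kernel, mode="valid")
```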

@adrienball
adrienball commented Sep 22, 2015

Man this is so exciting!


@maelp

maelp commented Sep 22, 2015

Nice!
