R. Aidan Campbell edited this page Jun 27, 2015 · 6 revisions

##This is an active project.

Before reading any of these, do a git pull and check whether that fixes your issue.


###Cannot Load Checkpoint or Sample Data

symptoms

luajit failing with cannot open mode r, similar to issue #46

solutions

  • double check the directory you're loading the file from
  • check that the current account has read permissions on that file
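
Both checks above can be done in one step. The following is a minimal sketch; `check_readable` is a hypothetical helper name, not part of the project, and the path in the usage comment is only a placeholder.

```shell
# Hypothetical helper: report whether a path exists and is readable
# by the account running the command.
check_readable() {
  if [ ! -e "$1" ]; then
    echo "missing: $1"
    return 1
  elif [ ! -r "$1" ]; then
    echo "not readable: $1"
    return 2
  fi
  echo "ok: $1"
}

# Usage (placeholder path, substitute the file you are loading):
#   check_readable path/to/your_checkpoint_or_data_file
```

A "missing" result usually points at the directory, and a "not readable" result at the file permissions.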

###Cannot Generate Text

symptoms

luajit failing with bad argument #1 to 'size', similar to issue #44

solutions

  • if you specified primetext, verify that each character has already been seen in the training set
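
One way to run this check is sketched below. `unseen_chars` is a hypothetical helper, and it assumes single-byte (ASCII) text and that the training set is a plain text file; the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: print each character of the prime text that
# never occurs in the training file. Assumes single-byte characters.
unseen_chars() {
  text=$1
  file=$2
  # Split the text into one character per line, de-duplicate,
  # then look each character up as a fixed string in the file.
  printf '%s' "$text" | fold -w1 | sort -u | while IFS= read -r ch; do
    grep -qF -- "$ch" "$file" || printf '%s\n' "$ch"
  done
}

# Usage (placeholder path):
#   unseen_chars "your prime text" path/to/training_text
```

If the helper prints nothing, every character of the prime text appears somewhere in the training file.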

###Fails to start, claiming cunn/cutorch package not found

symptoms

cunn/cutorch package not found

solutions

  • verify you have NVIDIA's CUDA runtime installed
  • install cunn or cutorch package with luarocks install cunn or luarocks install cutorch
  • execute the install command as root with sudo
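
Before installing, it can help to confirm that the needed tools are on the PATH. This is a rough sketch: `tool_status` is a hypothetical helper, and using `nvcc` as a sign that CUDA is installed is an assumption (it ships with the CUDA toolkit, which may be installed somewhere outside your PATH).

```shell
# Hypothetical helper: say whether a command is available on PATH.
tool_status() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

tool_status nvcc      # assumption: nvcc indicates a CUDA toolkit install
tool_status luarocks

# Then install the packages named above (prefix with sudo if needed):
#   luarocks install cutorch
#   luarocks install cunn
```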

###CPU training executes with only one thread

symptoms

only a single core of a multicore system is being heavily used

solutions

  • compile OpenBLAS with multithreading support, and compile and link Torch against it.
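
Once Torch is linked against a multithreaded OpenBLAS, it is also worth ruling out an environment setting that caps the thread count. The sketch below assumes an OpenMP-enabled OpenBLAS build, which reads the standard `OMP_NUM_THREADS` variable; it does not replace the rebuild described above.

```shell
# Number of processors currently online (works on Linux and macOS).
cores=$(getconf _NPROCESSORS_ONLN)

# Assumption: the BLAS build honours OMP_NUM_THREADS.
export OMP_NUM_THREADS="$cores"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Run training from the same shell afterwards so the setting is inherited.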

###loss is exploding, aborting.

symptoms

train_loss value slowly decreases then suddenly increases to a much higher value

solutions

###cuda runtime error (2) : out of memory

symptoms

train.lua crashes early on, with the aforementioned error

solutions

  • lower num_layers parameter, which defaults to 2 if omitted
  • lower rnn_size parameter, which defaults to 128 if omitted
  • lower batch_size parameter, which defaults to 50 if omitted
  • Linux and some Windows users can run nvidia-smi to view the memory available on the GPU
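
The steps above might be combined as follows. This is only a sketch: the reduced values are illustrative rather than recommended, and passing the parameters as single-dash flags to `th train.lua` is an assumption about how the script is invoked.

```shell
# Show GPU memory usage if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv \
    || echo "nvidia-smi failed to query the GPU"
else
  echo "nvidia-smi not found on PATH"
fi

# Illustrative retry with every memory-related parameter lowered
# (defaults per the list above: 2 layers, rnn_size 128, batch_size 50).
train_cmd="th train.lua -num_layers 1 -rnn_size 64 -batch_size 25"
echo "retry with: $train_cmd"
```

Lowering one parameter at a time makes it easier to see which one frees enough memory.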