When I try to train the network with:
th main.lua -weights <path/to/downloaded_weights/model_snapshot_7scenes.t7> -dataset_src_path </path/to/7Scenes>
(that is, without -do_evaluation), I run into a problem.
Here is the output, including the error:
{
val_batch_size : 40
beta1 : 0.9
do_evaluation : false
use_dropout : false
dataset_src_path : "/data/code/camera-relocalisation/7Scenes"
gamma : 0.001
image_size : 224
epoch_number : 1
weights : "/data/code/camera-relocalisation/downloaded_weights/model_snapshot_7scenes.t7"
train_batch_size : 64
validation_dataset_size : 10402
max_epoch : 250
dataset_name : "7-Scenes"
nGPU : 1
momentum : 0.9
logs : "./logs/7scenes.log"
beta : 1
manualSeed : 333
learning_rate : 0.1
beta2 : 0.999
model_zoo_path : "./pretrained_models"
precomputed_data_path : "./data"
results_filename : "./results/7scenes_res.bin"
snapshot_dir : "./snapshots"
GPU : 1
weight_decay : 1e-05
power : 0.5
training_dataset_size : 39999
}
this is a test for load_training_data
==> Training GT labels have been loaded successfully
==> Validation GT labels have been loaded successfully
==> loading model from pretained weights from file: /data/code/camera-relocalisation/downloaded_weights/model_snapshot_7scenes.t7
==> configuring optimizer
==> number of batches: 624
==> learning rate: 0.1
==> Number of parameters in the model: 22350215
==> online epoch # 1 [batchSize = 64]
==> time taken to randomize input training data: 2.7921199798584 ms
/torch/install/bin/luajit: /torch/install/share/lua/5.1/nn/Container.lua:67: ...........] ETA: 0ms | Step: 0ms
In 1 module of nn.Sequential:
In 1 module of nn.ParallelTable:
In 2 module of nn.Sequential:
/torch/install/share/lua/5.1/nn/THNN.lua:110: input_ and gradOutput_ shapes do not match: input_ [2 x 64 x 112 x 112], gradOutput_ [64 x 64 x 112 x 112] at /torch/extra/cunn/lib/THCUNN/generic/BatchNormalization.cu:74
stack traceback:
[C]: in function 'v'
/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'BatchNormalization_backward'
/torch/install/share/lua/5.1/nn/BatchNormalization.lua:154: in function </torch/install/share/lua/5.1/nn/BatchNormalization.lua:140>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:70: in function </torch/install/share/lua/5.1/nn/Sequential.lua:63>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/ParallelTable.lua:27: in function 'accGradParameters'
/torch/install/share/lua/5.1/nn/Module.lua:32: in function </torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function 'backward'
/data/code/camera-relocalisation/cnn_part/train.lua:68: in function 'opfunc'
/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
/data/code/camera-relocalisation/cnn_part/train.lua:72: in function 'train'
main.lua:97: in main chunk
[C]: in function 'dofile'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function 'backward'
/data/code/camera-relocalisation/cnn_part/train.lua:68: in function 'opfunc'
/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
/data/code/camera-relocalisation/cnn_part/train.lua:72: in function 'train'
main.lua:97: in main chunk
[C]: in function 'dofile'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
I think the problem is located here:
for t,v in ipairs(indices) do
   xlua.progress(t, #indices)
   -- load one mini-batch and move the images and labels to the GPU
   local mini_batch_info = make_training_minibatch(v)
   local mini_batch_data = mini_batch_info.data:cuda()
   local orientation_gt = mini_batch_info.quaternion_labels:cuda()
   local translation_gt = mini_batch_info.translation_labels:cuda()
   cutorch.synchronize()
   collectgarbage()
   -- closure evaluated by the optimizer: returns the loss and the gradients
   feval = function(x)
      if x ~= parameters then parameters:copy(x) end
      model:zeroGradParameters()
      local outputs = model:forward({mini_batch_data[{{}, 1, {}, {}, {}}], mini_batch_data[{{}, 2, {}, {}, {}}]})
      local err = criterion:forward(outputs, {translation_gt, orientation_gt})
      meter_train_t:add(criterion.weights[1] * criterion.criterions[1].output)
      meter_train_q:add(criterion.weights[2] * criterion.criterions[2].output)
      local df_do = criterion:backward(outputs, {translation_gt, orientation_gt})
      model:backward(mini_batch_data, df_do)
      return err, gradParameters
   end
   optim.adam(feval, parameters, optimState)
end
============================================
In particular, when I comment out optim.adam(feval, parameters, optimState), training runs without this error.
I don't know what is going on. Could you please help me?
Thanks in advance!
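A possible cause, judging only from the snippet and the error message above: model:forward receives a table of two tensors, while model:backward receives the raw mini_batch_data tensor. When a tensor is passed where nn.ParallelTable expects a table, input[1] is the first sample of the batch rather than the first image of every pair, so its leading dimension is the pair size 2 instead of the batch size 64. That would match input_ [2 x 64 x 112 x 112] in the error. The sketch below reproduces only that shape arithmetic in plain Lua (no Torch needed). The 5-D layout and the 3 colour channels are assumptions, and index_first and select_dim are hypothetical helpers, not Torch functions.

```lua
-- Shapes are plain Lua tables of dimension sizes; no Torch is required.

-- Shape of tensor[i]: what a table container sees when it is handed a
-- tensor and evaluates input[i] (the first dimension is dropped).
local function index_first(shape)
  local out = {}
  for d = 2, #shape do
    out[#out + 1] = shape[d]
  end
  return out
end

-- Shape of tensor:select(dim, i), i.e. mini_batch_data[{{}, i, {}, {}, {}}]
-- when dim == 2 (the selected dimension is dropped).
local function select_dim(shape, dim)
  local out = {}
  for d = 1, #shape do
    if d ~= dim then
      out[#out + 1] = shape[d]
    end
  end
  return out
end

-- ASSUMPTION: batch x pair x channels x height x width, with 3 channels.
local batch = {64, 2, 3, 224, 224}

-- What each branch receives in forward (a table of two such tensors).
local forward_branch_input = select_dim(batch, 2)
-- What the first branch receives in backward when given the raw tensor.
local backward_branch_input = index_first(batch)

print("forward :", table.concat(forward_branch_input, " x "))   -- 64 x 3 x 224 x 224
print("backward:", table.concat(backward_branch_input, " x "))  -- 2 x 3 x 224 x 224
```

If this is what happens, building the input table once, for example local inputs = {mini_batch_data[{{}, 1, {}, {}, {}}], mini_batch_data[{{}, 2, {}, {}, {}}]}, and passing the same inputs to both model:forward and model:backward would be the first thing to try. This is untested against the repository.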