Hi imelekhov, training fails with "input_ and gradOutput_ shapes do not match" #3

TheBloodthirster · 2019-08-04T10:09:27Z

When I try to train the network with:
th main.lua -weights <path/to/downloaded_weights/model_snapshot_7scenes.t7> -dataset_src_path </path/to/7Scenes>
(that is, without -do_evaluation), I run into a problem.

Here is the error:

{
val_batch_size : 40
beta1 : 0.9
do_evaluation : false
use_dropout : false
dataset_src_path : "/data/code/camera-relocalisation/7Scenes"
gamma : 0.001
image_size : 224
epoch_number : 1
weights : "/data/code/camera-relocalisation/downloaded_weights/model_snapshot_7scenes.t7"
train_batch_size : 64
validation_dataset_size : 10402
max_epoch : 250
dataset_name : "7-Scenes"
nGPU : 1
momentum : 0.9
logs : "./logs/7scenes.log"
beta : 1
manualSeed : 333
learning_rate : 0.1
beta2 : 0.999
model_zoo_path : "./pretrained_models"
precomputed_data_path : "./data"
results_filename : "./results/7scenes_res.bin"
snapshot_dir : "./snapshots"
GPU : 1
weight_decay : 1e-05
power : 0.5
training_dataset_size : 39999
}
this is a test for load_training_data
==> Training GT labels have been loaded successfully
==> Validation GT labels have been loaded successfully
==> loading model from pretained weights from file: /data/code/camera-relocalisation/downloaded_weights/model_snapshot_7scenes.t7
==> configuring optimizer
==> number of batches: 624
==> learning rate: 0.1
==> Number of parameters in the model: 22350215
==> online epoch # 1 [batchSize = 64]
==> time taken to randomize input training data: 2.7921199798584 ms
/torch/install/bin/luajit: /torch/install/share/lua/5.1/nn/Container.lua:67: ...........] ETA: 0ms | Step: 0ms
In 1 module of nn.Sequential:
In 1 module of nn.ParallelTable:
In 2 module of nn.Sequential:
/torch/install/share/lua/5.1/nn/THNN.lua:110: input_ and gradOutput_ shapes do not match: input_ [2 x 64 x 112 x 112], gradOutput_ [64 x 64 x 112 x 112] at /torch/extra/cunn/lib/THCUNN/generic/BatchNormalization.cu:74
stack traceback:
[C]: in function 'v'
/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'BatchNormalization_backward'
/torch/install/share/lua/5.1/nn/BatchNormalization.lua:154: in function </torch/install/share/lua/5.1/nn/BatchNormalization.lua:140>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:70: in function </torch/install/share/lua/5.1/nn/Sequential.lua:63>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/ParallelTable.lua:27: in function 'accGradParameters'
/torch/install/share/lua/5.1/nn/Module.lua:32: in function </torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function 'backward'
/data/code/camera-relocalisation/cnn_part/train.lua:68: in function 'opfunc'
/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
/data/code/camera-relocalisation/cnn_part/train.lua:72: in function 'train'
main.lua:97: in main chunk
[C]: in function 'dofile'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function 'backward'
/data/code/camera-relocalisation/cnn_part/train.lua:68: in function 'opfunc'
/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
/data/code/camera-relocalisation/cnn_part/train.lua:72: in function 'train'
main.lua:97: in main chunk
[C]: in function 'dofile'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

I think the problem is located here:
for t,v in ipairs(indices) do
    xlua.progress(t, #indices)

    local mini_batch_info = make_training_minibatch(v)
    local mini_batch_data = mini_batch_info.data:cuda()
    local orientation_gt = mini_batch_info.quaternion_labels:cuda()
    local translation_gt = mini_batch_info.translation_labels:cuda()
    
    cutorch.synchronize()
    collectgarbage()
    
    feval = function(x)
        if x ~= parameters then parameters:copy(x) end
        model:zeroGradParameters()

        local outputs = model:forward({mini_batch_data[{{}, 1, {}, {}, {}}], mini_batch_data[{{}, 2, {}, {}, {}}]})
        local err = criterion:forward(outputs, {translation_gt, orientation_gt})
        meter_train_t:add(criterion.weights[1] * criterion.criterions[1].output)
        meter_train_q:add(criterion.weights[2] * criterion.criterions[2].output)
        
        local df_do = criterion:backward(outputs, {translation_gt, orientation_gt})
        model:backward(mini_batch_data, df_do)
        
        return err, gradParameters
    end
    optim.adam(feval, parameters, optimState)

============================================
In particular, when I comment out optim.adam(feval, parameters, optimState), training runs without the error.

I don't know what is going on. Could you please help me?
Thanks in advance!
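
A possible cause, judging only from the shapes in the error above: model:forward receives a table of two tensors (one per image of the pair), while model:backward receives the raw mini_batch_data tensor. nn.ParallelTable indexes its input by branch, so with a tensor input the first branch would see mini_batch_data[1], a single pair with leading dimension 2, instead of a batch of 64. That would match input_ [2 x ...] against gradOutput_ [64 x ...]. The sketch below uses a small hypothetical CPU model (not the repository's network; all layer sizes and names are assumptions) to show the same input table being passed to both calls.

```lua
require 'nn'

-- Hypothetical stand-in for one branch of the siamese network.
local branch = nn.Sequential()
branch:add(nn.SpatialConvolution(3, 4, 3, 3, 1, 1, 1, 1))
branch:add(nn.SpatialBatchNormalization(4))
branch:add(nn.ReLU())

-- Two branches sharing weights, applied to a table of two inputs.
local siamese = nn.ParallelTable()
siamese:add(branch)
siamese:add(branch:clone('weight', 'bias', 'gradWeight', 'gradBias'))

local model = nn.Sequential()
model:add(siamese)
model:add(nn.JoinTable(2))

-- Mini-batch laid out as N x pair x C x H x W, as in the issue (sizes assumed).
local mini_batch_data = torch.rand(8, 2, 3, 16, 16)

-- Build the input table once and reuse it for forward AND backward.
local inputs = {
    mini_batch_data[{{}, 1, {}, {}, {}}]:contiguous(),
    mini_batch_data[{{}, 2, {}, {}, {}}]:contiguous()
}

local outputs = model:forward(inputs)
-- Placeholder gradient; in training this comes from criterion:backward.
local gradOutput = outputs:clone():fill(1)

-- Pass the same table used in forward, not the raw 5-D tensor.
local gradInput = model:backward(inputs, gradOutput)
```

In the training loop above, this would correspond to building the input table once inside feval and passing it to both model:forward and model:backward instead of passing mini_batch_data to model:backward. This has not been tested against the repository.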
