[Bug] Low reproducibility? Limit gpus? #32

Closed
mrsempress opened this issue Apr 3, 2024 · 9 comments
Comments

@mrsempress

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

System environment:
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1551893665
GPU 0,1: NVIDIA A100-SXM4-80GB
CUDA_HOME: /mnt/lustre/share/cuda-11.0
NVCC: Cuda compilation tools, release 11.0, V11.0.221
GCC: gcc (GCC) 5.4.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:

GCC 9.3

C++ Version: 201402

Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications

Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)

OpenMP 201511 (a.k.a. OpenMP 4.5)

LAPACK is enabled (usually provided by MKL)

NNPACK is enabled

CPU capability usage: AVX2

CUDA Runtime 11.3

NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37

CuDNN 8.3.2 (built against CUDA 11.5)

Magma 2.5.2

Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.1
OpenCV: 4.9.0
MMEngine: 0.10.3

Reproduces the problem - code sample

N/A

Reproduces the problem - command or script

sh tools/mv-grounding.sh

Reproduces the problem - error message

The reproduced results are:

AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.2093 | 0.1840 | 0.1966 | 0.2129 | 0.0000 | 0.2073 | 0.2073 |

AP50:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.0535 | 0.0452 | 0.0581 | 0.0501 | 0.0000 | 0.0528 | 0.0528 |

But the results in the paper are:

AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Overall |
| --- | --- | --- | --- | --- | --- |
| results | 0.2711 | 0.2012 | 0.2342 | 0.2637 | 0.2572 |

In addition, training only completes successfully when the number of GPUs is 8.
With 2 or 4 GPUs, the error from issue #30 sometimes occurs, and the error from issue #26 sometimes occurs.

Additional information

  1. Is there a limit on the number of GPUs, or is the failure random and training just happened to succeed with 8 GPUs?
  2. Were the visual grounding results reported in the paper obtained with the default config in tools/mv_grounding.sh, or was fcaf_coder added or were other parameters modified?
@Tai-Wang
Contributor

Tai-Wang commented Apr 3, 2024

You need to reduce the learning rate by a factor of 2 or 4 accordingly, because the actual batch size is only 1/2 or 1/4 of the one in our experiments. It should yield a comparable result once you adjust the optimizer setting, although we have not tried this ourselves.
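The adjustment above is the standard linear scaling rule. A minimal sketch, assuming a hypothetical base learning rate and per-GPU batch size (not the repo's actual values):

```python
def scaled_lr(base_lr: float, ref_total_batch: int, actual_total_batch: int) -> float:
    """Linearly scale the learning rate with the total (all-GPU) batch size."""
    return base_lr * actual_total_batch / ref_total_batch

# Reference run: 8 GPUs; reproduction: 2 GPUs with the same per-GPU batch,
# so the effective batch size is 1/4 of the reference -> divide the lr by 4.
base_lr = 2e-4  # illustrative value, not the repo's actual setting
lr_2gpu = scaled_lr(base_lr, ref_total_batch=8, actual_total_batch=2)
print(lr_2gpu)  # 5e-05
```

The same scaling would be applied to the `lr` field of the optimizer config before launching a 2- or 4-GPU run.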

@mrsempress
Author

When I reproduced it, I still used 8 GPUs, as written in your mv_grounding.sh, but the result was not good. When the number of GPUs changes, an error appears and training does not complete.

@Tai-Wang
Contributor

Tai-Wang commented Apr 3, 2024

Did you remove the pretrained checkpoint from the config? I noticed your result is lower than the performance we reported here. You could first reproduce the performance reported in our repo, because we re-split the training/val/test sets for the challenge, as explained here.

@mrsempress
Author

I removed the pretrained checkpoint from the config because I didn't know the pretrained weights were necessary, and I didn't see the detection branch contributing to the visual grounding branch in the pipeline.
I will obtain the pretrained weights and rerun the visual grounding task.
Thank you for your reply.

@Tai-Wang
Contributor

Tai-Wang commented Apr 3, 2024

OK. We found loading the pretrained detection checkpoint to be a helpful trick, as mentioned in BUTD-DETR. We look forward to your further feedback.

@ZCMax
Collaborator

ZCMax commented Apr 3, 2024

I removed the pretrained checkpoint from the config because I didn't know that pretrained weights were necessary and didn't see the detection branch's role on the visual grounding branch in the pipeline. I will try again to get the pre-training weights and redo the visual grounding task. Thank you for your reply.

Since the feature extraction pipeline is shared by the detection and visual grounding tasks, we can use the 3D detection pretrained checkpoint for weight initialization. This improves grounding performance and accelerates training convergence to some extent.
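The initialization described above can be sketched with plain PyTorch (a toy example; the module names are hypothetical, and in real configs MMEngine's `load_from` handles this). Loading with `strict=False` copies the shared feature-extractor weights and leaves the grounding-only head randomly initialized:

```python
import torch
import torch.nn as nn

class Detector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)   # feature extractor shared with grounding

class Grounder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)   # same shape as the detector's backbone
        self.text_head = nn.Linear(4, 2)  # grounding-only head, absent from the checkpoint

det, gnd = Detector(), Grounder()
# strict=False: copy matching keys (backbone.*), report the rest as missing
missing, unexpected = gnd.load_state_dict(det.state_dict(), strict=False)
# missing lists text_head.*; the backbone weights are now shared values
```

In an MMEngine-style config the equivalent is pointing `load_from` at the detection checkpoint; unmatched grounding keys keep their random initialization.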

@Tai-Wang
Contributor

Tai-Wang commented Apr 9, 2024

Close due to inactivity. Please feel free to reopen this issue if you have any further questions.

@Tai-Wang Tai-Wang closed this as completed Apr 9, 2024
@mrsempress
Author

After loading your checkpoint, the performance exceeded what your paper reported (+7.95%).

The results in the paper are:

AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Overall |
| --- | --- | --- | --- | --- | --- |
| results | 0.2711 | 0.2012 | 0.2342 | 0.2637 | 0.2572 |

The reproduced results (with your checkpoint loaded) are:

AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.3489 | 0.3018 | 0.3567 | 0.3277 | 0.0000 | 0.3377 | 0.3377 |

AP50:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.1168 | 0.0925 | 0.1127 | 0.1159 | 0.0000 | 0.1148 | 0.1148 |
  1. Another question: why is the Overall result identical to the Multi result?
  2. In addition, you mentioned that using the detection checkpoint is important; in my experiment it improved results by 13.04%. If the grounding checkpoint is in turn used to initialize detection, will there be a further improvement? If we keep looping the initialization, can we get better results?

@ZCMax
Collaborator

ZCMax commented Apr 11, 2024

After loading your checkpoint, the performance exceeded what your paper reported (+7.95%).

The results in the paper are:

AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Overall |
| --- | --- | --- | --- | --- | --- |
| results | 0.2711 | 0.2012 | 0.2342 | 0.2637 | 0.2572 |

The reproduced results (with your checkpoint loaded) are:

AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.3489 | 0.3018 | 0.3567 | 0.3277 | 0.0000 | 0.3377 | 0.3377 |

AP50:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.1168 | 0.0925 | 0.1127 | 0.1159 | 0.0000 | 0.1148 | 0.1148 |

  1. Another question: why is the Overall result identical to the Multi result?
  2. In addition, you mentioned that using the detection checkpoint is important; in my experiment it improved results by 13.04%. If the grounding checkpoint is in turn used to initialize detection, will there be a further improvement? If we keep looping the initialization, can we get better results?
  1. Since all the prompts belong to the Multi type, the Overall performance is exactly the same as Multi.
  2. Actually, one possible exploration is joint grounding and detection training, as illustrated in BUTD-DETR: reformulate the detection task as a category-prompt grounding task. This may boost both detection and grounding performance at the same time.
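The category-prompt reformulation mentioned above can be sketched as follows (a hypothetical illustration; the class names and span convention are made up for the example). Detection becomes grounding against a prompt that lists every category name, with each name's character span acting as the "referring expression" for that class:

```python
def detection_as_grounding_prompt(classes):
    """Join class names into one prompt and record each name's character span."""
    prompt = ". ".join(classes) + "."
    spans, start = {}, 0
    for name in classes:
        spans[name] = (start, start + len(name))  # boxes match back to these spans
        start += len(name) + 2                    # skip the ". " separator
    return prompt, spans

prompt, spans = detection_as_grounding_prompt(["chair", "table", "sofa"])
# prompt -> "chair. table. sofa."
# spans["table"] -> (7, 12)
```

A grounding model trained on such prompts sees detection as just another grounding query, which is how a joint objective could improve both tasks.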
