[Bug] Low reproducibility? Limit gpus? #32
Comments
You need to reduce the learning rate by a factor of 2 or 4 accordingly (for 4 or 2 GPUs, respectively), because the actual batch size is only 1/4 of that in our experiments. It should yield comparable results once you adjust the optimizer settings, although we have not tried this ourselves.
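The advice above matches the common linear scaling rule: keep the learning rate proportional to the total batch size. A minimal sketch, assuming the reference setup is 8 GPUs (as in the maintainers' script) and that the per-GPU batch size stays the same; the function name `scale_lr` is hypothetical and not part of the project.

```python
def scale_lr(base_lr: float, base_gpus: int, gpus: int,
             samples_per_gpu: int = 1, base_samples_per_gpu: int = 1) -> float:
    """Linear scaling rule (hypothetical helper): the learning rate is scaled
    by the ratio of the new total batch size to the reference batch size."""
    base_batch = base_gpus * base_samples_per_gpu
    batch = gpus * samples_per_gpu
    return base_lr * batch / base_batch


if __name__ == "__main__":
    # Relative factors only; multiply by the base LR from your own config.
    for n in (2, 4, 8):
        print(n, "GPUs ->", scale_lr(1.0, base_gpus=8, gpus=n), "x base LR")
```

With 2 GPUs the factor is 0.25 (divide by 4); with 4 GPUs it is 0.5 (divide by 2).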
When I reproduced it, I still used 8 GPUs, as set in your mv_grounding.sh, but the results were poor. When I change the number of GPUs, an error occurs and training does not run successfully.
I had removed the pretrained checkpoint from the config because I didn't know the pretrained weights were necessary, and I didn't see what role the detection branch plays for the visual grounding branch in the pipeline.
OK. We found loading the pretrained detection checkpoint to be a helpful trick, as mentioned in BUTD-DETR. Looking forward to your further feedback.
Since the feature extraction pipeline can be shared by the detection and visual grounding tasks, we can use the 3D detection pre-trained checkpoint for weight initialization. This can improve grounding performance and accelerate training convergence to some extent.
Closing due to inactivity. Please feel free to reopen this issue if you have any further questions.
After loading your checkpoint, the performance exceeded what your paper reported (+7.95%).
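Initializing from a detection checkpoint amounts to copying every parameter whose name and shape match the grounding model, and leaving the grounding-only parts randomly initialized. A minimal sketch of that matching logic, assuming each state dict is represented as a `{name: shape}` mapping; the helper name and the parameter names in the usage are hypothetical, not the project's actual keys. With real PyTorch tensors, the filtered dict would typically be passed to `model.load_state_dict(filtered, strict=False)`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def select_transferable(det_shapes: Dict[str, Shape],
                        grd_shapes: Dict[str, Shape]
                        ) -> Tuple[Dict[str, Shape], List[str]]:
    """Return (entries usable for initialization, grounding keys left untouched).

    An entry is transferable only if the grounding model has a parameter with
    the same name AND the same shape (e.g. the shared feature extractor)."""
    transferable = {k: s for k, s in det_shapes.items()
                    if k in grd_shapes and grd_shapes[k] == s}
    uninitialized = sorted(k for k in grd_shapes if k not in transferable)
    return transferable, uninitialized


if __name__ == "__main__":
    # Hypothetical parameter names and shapes, for illustration only.
    det = {"backbone.layer1.weight": (64, 3, 3, 3),
           "decoder.query.weight": (256, 256),
           "bbox_head.cls.weight": (18, 256)}
    grd = {"backbone.layer1.weight": (64, 3, 3, 3),
           "decoder.query.weight": (256, 512),
           "text_encoder.proj.weight": (256, 768)}
    loaded, missing = select_transferable(det, grd)
    print("loaded:", list(loaded))
    print("randomly initialized:", missing)
```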
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
System environment:
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1551893665
GPU 0,1: NVIDIA A100-SXM4-80GB
CUDA_HOME: /mnt/lustre/share/cuda-11.0
NVCC: Cuda compilation tools, release 11.0, V11.0.221
GCC: gcc (GCC) 5.4.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
GCC 9.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.3.2 (built against CUDA 11.5)
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.13.1
OpenCV: 4.9.0
MMEngine: 0.10.3
Reproduces the problem - code sample
N/A
Reproduces the problem - command or script
sh tools/mv-grounding.sh
Reproduces the problem - error message
The reproduced results are:
AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.2093 | 0.1840 | 0.1966 | 0.2129 | 0.0000 | 0.2073 | 0.2073 |
AP50:
| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.0535 | 0.0452 | 0.0581 | 0.0501 | 0.0000 | 0.0528 | 0.0528 |
However, the results reported in the paper are:
AP25:
| Type | Easy | Hard | View-Dep | View-Indep | Overall |
| --- | --- | --- | --- | --- | --- |
| results | 0.2711 | 0.2012 | 0.2342 | 0.2637 | 0.2572 |
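To make the size of the gap explicit, a small sketch that computes the per-category AP25 difference in percentage points between the reproduced run and the paper, using only the columns that appear in both tables (Unique and Multi are not reported in the paper table). The helper name `ap_gap_points` is hypothetical.

```python
REPRODUCED_AP25 = {"Easy": 0.2093, "Hard": 0.1840, "View-Dep": 0.1966,
                   "View-Indep": 0.2129, "Overall": 0.2073}
PAPER_AP25 = {"Easy": 0.2711, "Hard": 0.2012, "View-Dep": 0.2342,
              "View-Indep": 0.2637, "Overall": 0.2572}


def ap_gap_points(reproduced: dict, paper: dict) -> dict:
    """Reproduced minus paper AP, in percentage points, for shared columns."""
    return {k: round((reproduced[k] - paper[k]) * 100, 2)
            for k in paper if k in reproduced}


if __name__ == "__main__":
    for name, gap in ap_gap_points(REPRODUCED_AP25, PAPER_AP25).items():
        print(f"{name:>10}: {gap:+.2f} points")
```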
In addition, the training can only be completed when the number of GPUs is 8.
With 2 or 4 GPUs, training sometimes fails with issue #30 and sometimes with issue #26.
Additional information