[Bug] UPD - ValueError: Plane vertices are not coplanar. #40
Comments
I completely understand your feeling with this situation. Based on my understanding, this issue often arises when one of the predicted boxes has a side length that is too short. Here are some possible solutions:
I hope this helps!
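To illustrate the short-side-length explanation above, here is a minimal sketch of one possible mitigation: enforcing a lower bound on each predicted side length before the boxes reach the IoU computation. It is not taken from this thread or from the EmbodiedScan code. The function name `clamp_box_sizes`, the `MIN_SIDE` value, and the assumed 9-DoF layout `(x, y, z, dx, dy, dz, alpha, beta, gamma)` are all hypothetical, and plain Python lists are used instead of tensors so the example is self-contained.

```python
from typing import List, Sequence

# Hypothetical lower bound on a side length; the right value depends on the data.
MIN_SIDE = 1e-2


def clamp_box_sizes(boxes: Sequence[Sequence[float]],
                    min_side: float = MIN_SIDE) -> List[List[float]]:
    """Return copies of the boxes with every side length >= min_side.

    Assumed layout per box (an assumption, not confirmed by the thread):
    (x, y, z, dx, dy, dz, alpha, beta, gamma).
    """
    clamped = []
    for box in boxes:
        if len(box) != 9:
            raise ValueError(f"expected 9 values per box, got {len(box)}")
        center, size, angles = box[:3], box[3:6], box[6:]
        # Raise any side shorter than min_side up to min_side.
        size = [max(s, min_side) for s in size]
        clamped.append([*center, *size, *angles])
    return clamped


if __name__ == "__main__":
    # One box whose third side is almost zero.
    preds = [[0.0, 0.0, 0.0, 1.0, 2.0, 1e-6, 0.1, 0.0, 0.0]]
    print(clamp_box_sizes(preds))
```

On tensors the same idea would be a clamp on the three size columns. Whether clamping is acceptable during training is a judgment call, since it changes the boxes that the matching cost sees.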
I hit the same issue when training 3dv-grounding on an A6000, while computing the Hungarian assignment cost during training. I think it's inappropriate to simply modify eps in
respectively. So setting as follows fixes my issue:
Basically, it's similar to the idea of just removing the '_check_coplanar' and '_check_nonzero' checks, as suggested above. But I'm not sure whether extending this limit has any serious consequences.
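To make the role of eps in this discussion concrete, here is a minimal sketch of a coplanarity test for one quadrilateral face. It is an illustration of the general technique, not pytorch3d's actual `_check_coplanar` code. The helper names are hypothetical, the default `eps=1e-4` is an assumed value for illustration, and plain Python floats are used instead of tensors.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec3) -> Vec3:
    norm = math.sqrt(_dot(a, a))
    if norm == 0.0:
        raise ValueError("zero-length vector: face is degenerate")
    return (a[0] / norm, a[1] / norm, a[2] / norm)


def plane_deviation(face: Sequence[Vec3]) -> float:
    """Distance of the fourth vertex from the plane through the first three."""
    v0, v1, v2, v3 = face
    # Unit normal of the plane spanned by the edges v0->v1 and v0->v2.
    normal = _unit(_cross(_unit(_sub(v1, v0)), _unit(_sub(v2, v0))))
    return abs(_dot(_sub(v3, v0), normal))


def is_coplanar(face: Sequence[Vec3], eps: float = 1e-4) -> bool:
    """True if the fourth vertex lies within eps of the plane."""
    return plane_deviation(face) < eps


if __name__ == "__main__":
    # Unit square whose last vertex is lifted 0.005 out of the z = 0 plane.
    face = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.005))
    print(is_coplanar(face, eps=1e-4), is_coplanar(face, eps=1e-2))
```

In this sketch the same face fails the test with a tight tolerance and passes with a looser one such as the 1e-2 mentioned in this thread, which is why relaxing eps has much the same effect as removing the check for boxes that are only slightly off.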
NB, it really works. Although some checks have been removed, the val/test results are not significantly different from the officially reported ones. This seems to be an acceptable solution.
Hi @EricLee0224, can you share your reproduced result? The val/test result I got is lower than the official one.
I see, thank you! I get similar results.
System environment:
When I applied the suggested modification in file: site-packages/pytorch3d/ops/iou_box3d.py, _check_coplanar(boxes1, 1e-2). I then tried removing this special-case check:
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
System environment:
sys.platform: linux
Python: 3.8.19 | packaged by conda-forge | (default, Mar 20 2024, 12:47:35) [GCC 12.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 545726448
GPU 0,1,2,3,4,5,6,7: NVIDIA RTX A6000
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.58
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.12.0
OpenCV: 4.9.0
MMEngine: 0.10.3
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 545726448
Distributed launcher: pytorch
Distributed training: True
GPU number: 8
Reproduces the problem - code sample
Reproduces the problem - command or script
3D mv-Det:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 tools/train.py configs/detection/mv-det3d_8xb4_embodiedscan-3d-284class-9dof.py --work-dir=work_dirs/mv-3ddet --launcher="pytorch"
3D mv-VG:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 tools/train.py configs/grounding/mv-grounding_8xb12_embodiedscan-vg-9dof.py --work-dir=work_dirs/mv-3dground --launcher="pytorch"
Reproduces the problem - error message
04/15 13:56:37 - mmengine - INFO - Checkpoints will be saved to /data/zyp/code/EmbodiedScan/work_dirs/mv-3dground.
/data/zyp/code/EmbodiedScan/embodiedscan/models/layers/fusion_layers/point_fusion.py:48: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
pcd_rotate_mat = (torch.tensor(img_meta['pcd_rotation'],
/data/zyp/code/EmbodiedScan/embodiedscan/models/layers/fusion_layers/point_fusion.py:48: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
pcd_rotate_mat = (torch.tensor(img_meta['pcd_rotation'],
/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
warnings.warn(f'position encoding of key is'
Traceback (most recent call last):
File "tools/train.py", line 133, in <module>
main()
File "tools/train.py", line 129, in main
runner.train()
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
self.run_iter(idx, data_batch)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
outputs = self.runner.model.train_step(
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
losses = self._run_forward(data, mode='loss')
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
results = self(**data, mode=mode)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_grounder.py", line 666, in forward
return self.loss(inputs, data_samples, **kwargs)
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_grounder.py", line 507, in loss
losses = self.bbox_head.loss(**head_inputs_dict,
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 637, in loss
losses = self.loss_by_feat(*loss_inputs)
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 668, in loss_by_feat
losses_cls, losses_bbox = multi_apply(
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmdet/models/utils/misc.py", line 219, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 711, in loss_by_feat_single
cls_reg_targets = self.get_targets(cls_scores_list,
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 258, in get_targets
pos_inds_list, neg_inds_list) = multi_apply(self._get_targets_single,
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmdet/models/utils/misc.py", line 219, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 398, in _get_targets_single
assign_result = self.assigner.assign(
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/task_modules/assigners/hungarian_assigner.py", line 113, in assign
cost = match_cost(pred_instances=pred_instances_3d,
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/losses/match_cost.py", line 108, in __call__
overlaps = pred_bboxes.overlaps(pred_bboxes, gt_bboxes)
File "/data/zyp/code/EmbodiedScan/embodiedscan/structures/bbox_3d/euler_box3d.py", line 134, in overlaps
_, iou3d = box3d_overlap(corners1, corners2, eps=eps)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/pytorch3d/ops/iou_box3d.py", line 160, in box3d_overlap
_check_coplanar(boxes2, eps)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/pytorch3d/ops/iou_box3d.py", line 66, in _check_coplanar
raise ValueError(msg)
ValueError: Plane vertices are not coplanar
Additional information
I can run the 3D mv-det task smoothly in both training and testing. However, when I run the 3D mv-VG task in the same environment with 8*A6000 (48G), it always fails with ValueError: Plane vertices are not coplanar during the first epoch.
I have checked the related issues #22, #32, #30, facebookresearch/pytorch3d/issues/992, and facebookresearch/pytorch3d/issues/1771.
I have also tried the following solutions:
However, none of these solutions have worked so far. Could anyone please share how to solve this issue or provide a successful environment setup? Will the team look into this matter? Many thanks.