Description
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest versions (dev-1.x or dev-1.0).
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
System environment:
sys.platform: linux
Python: 3.8.19 | packaged by conda-forge | (default, Mar 20 2024, 12:47:35) [GCC 12.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 545726448
GPU 0,1,2,3,4,5,6,7: NVIDIA RTX A6000
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.58
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.12.0
OpenCV: 4.9.0
MMEngine: 0.10.3
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 545726448
Distributed launcher: pytorch
Distributed training: True
GPU number: 8
Reproduces the problem - code sample
Reproduces the problem - command or script
3D mv-Det:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 tools/train.py configs/detection/mv-det3d_8xb4_embodiedscan-3d-284class-9dof.py --work-dir=work_dirs/mv-3ddet --launcher="pytorch"
3D mv-VG:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 tools/train.py configs/grounding/mv-grounding_8xb12_embodiedscan-vg-9dof.py --work-dir=work_dirs/mv-3dground --launcher="pytorch"
Reproduces the problem - error message
04/15 13:56:37 - mmengine - INFO - Checkpoints will be saved to /data/zyp/code/EmbodiedScan/work_dirs/mv-3dground.
/data/zyp/code/EmbodiedScan/embodiedscan/models/layers/fusion_layers/point_fusion.py:48: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone(
).detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
pcd_rotate_mat = (torch.tensor(img_meta['pcd_rotation'],
/data/zyp/code/EmbodiedScan/embodiedscan/models/layers/fusion_layers/point_fusion.py:48: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone(
).detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
pcd_rotate_mat = (torch.tensor(img_meta['pcd_rotation'],
/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
warnings.warn(f'position encoding of key is'
Traceback (most recent call last):
File "tools/train.py", line 133, in
main()
File "tools/train.py", line 129, in main
runner.train()
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
self.run_iter(idx, data_batch)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
outputs = self.runner.model.train_step(
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
losses = self._run_forward(data, mode='loss')
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
results = self(**data, mode=mode)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_grounder.py", line 666, in forward
return self.loss(inputs, data_samples, **kwargs)
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_grounder.py", line 507, in loss
losses = self.bbox_head.loss(**head_inputs_dict,
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 637, in loss
losses = self.loss_by_feat(*loss_inputs)
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 668, in loss_by_feat
losses_cls, losses_bbox = multi_apply(
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmdet/models/utils/misc.py", line 219, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 711, in loss_by_feat_single
cls_reg_targets = self.get_targets(cls_scores_list,
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 258, in get_targets
pos_inds_list, neg_inds_list) = multi_apply(self._get_targets_single,
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/mmdet/models/utils/misc.py", line 219, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 398, in _get_targets_single
assign_result = self.assigner.assign(
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/task_modules/assigners/hungarian_assigner.py", line 113, in assign
cost = match_cost(pred_instances=pred_instances_3d,
File "/data/zyp/code/EmbodiedScan/embodiedscan/models/losses/match_cost.py", line 108, in call
overlaps = pred_bboxes.overlaps(pred_bboxes, gt_bboxes)
File "/data/zyp/code/EmbodiedScan/embodiedscan/structures/bbox_3d/euler_box3d.py", line 134, in overlaps
_, iou3d = box3d_overlap(corners1, corners2, eps=eps)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/pytorch3d/ops/iou_box3d.py", line 160, in box3d_overlap
_check_coplanar(boxes2, eps)
File "/data/zyp/miniconda3/envs/embodiedscan/lib/python3.8/site-packages/pytorch3d/ops/iou_box3d.py", line 66, in _check_coplanar
raise ValueError(msg)
ValueError: Plane vertices are not coplanar
Additional information
I can train and test the 3D mv-det task smoothly. However, when I run the 3D mv-VG task in the same environment with 8×A6000 (48 GB), it always fails with ValueError: Plane vertices are not coplanar during the first epoch.
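For reference, below is a rough standalone sketch of the kind of per-face coplanarity test that box3d_overlap runs on the (N, 8, 3) corner tensors before computing IoU, which is what raises this error. The helper function and the face index quadruples are assumptions for illustration based on pytorch3d's documented corner ordering, not the actual library internals:

```python
# Hypothetical debugging helper, NOT pytorch3d internals: it mimics the idea of
# the coplanarity check that box3d_overlap applies to each box face.
import torch

# Face index quadruples for an (8, 3) corner box; this ordering is an
# assumption based on pytorch3d's documented corner convention.
_BOX_FACES = [
    [0, 1, 2, 3], [4, 5, 6, 7],  # bottom / top
    [0, 1, 5, 4], [3, 2, 6, 7],  # two side faces
    [0, 3, 7, 4], [1, 2, 6, 5],  # remaining side faces
]


def faces_are_coplanar(corners: torch.Tensor, eps: float = 1e-4) -> bool:
    """Return True if every face of an (8, 3) corner box is planar within eps."""
    for face in _BOX_FACES:
        p0, p1, p2, p3 = (corners[i] for i in face)
        normal = torch.cross(p1 - p0, p2 - p0)
        normal = normal / normal.norm().clamp(min=1e-12)
        # Distance of the fourth vertex from the plane through the first three.
        if torch.abs(torch.dot(p3 - p0, normal)) > eps:
            return False
    return True


# Example: an axis-aligned unit cube passes the check.
cube = torch.tensor([
    [0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.],
    [0., 0., 1.], [1., 0., 1.], [1., 1., 1.], [0., 1., 1.],
])
print(faces_are_coplanar(cube))  # True
```

Running a check like this on corners1/corners2 right before the failing box3d_overlap call in euler_box3d.py is how I tried to narrow down which predicted or ground-truth boxes become degenerate.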
I have checked the related issues #22, #32, #30, facebookresearch/pytorch3d/issues/992, and facebookresearch/pytorch3d/issues/1771.
I have also tried the following solutions:
- Modifying eps in box3d_overlap with values such as 1e-2, 1e-3, 1e-4, and 1e-5 (a sketch of this change is included at the end of this report).
- Changing the learning rate (lr) in the training script to values such as 5e-2 and 5e-4.
- Training both with and without the detection checkpoint.
- Using 2, 4, and 8 A6000 GPUs.
- Using --resume and --resume auto.
However, none of these attempts has worked so far. Could anyone please share how to solve this issue, or provide a known-good environment setup? Will the team look into this matter? Many thanks.
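For reference, the eps modification mentioned in the first bullet above was applied roughly as follows. Only the box3d_overlap call is known from the traceback (euler_box3d.py, line 134); the wrapper function and variable names here are assumptions for illustration:

```python
# Sketch of the eps change that was tried; the ValueError still occurred.
from pytorch3d.ops import box3d_overlap


def overlaps_with_relaxed_eps(corners1, corners2, eps=1e-3):
    # corners1: (N, 8, 3), corners2: (M, 8, 3) box corner tensors.
    # A larger eps relaxes the coplanarity / degeneracy checks inside
    # box3d_overlap, but in these runs the error was still raised.
    _, iou3d = box3d_overlap(corners1, corners2, eps=eps)
    return iou3d
```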