[BUG] Segmentation Fault during solver.solve in Python run #194

Open
nicolaoe opened this issue Sep 30, 2024 · 9 comments

@nicolaoe

nicolaoe commented Sep 30, 2024

While running a simple Python script, the solver crashes with a segmentation fault. The same happens with teaserpp_example.py and teaser_python_ply.py. At installation, all ctest tests passed. Reinstalling from the develop branch did not solve the problem, and running the script with OMP_NUM_THREADS=12 (or even 1) still produces a segmentation fault at the solver.solve step.

The Python script is run with Python 3 on Ubuntu 22.04.5 from a virtual environment with numpy==2.1.1, open3d==0.18.0, teaserpp_python==1.0.0. The machine has 24 cores and 135 GB RAM, so running out of memory should not be the issue. An example command that produces a segmentation fault (run from the folder /TEASER-plusplus-develop/python/teaserpp_python/) is:
OMP_NUM_THREADS="12" python3 ./teaserpp_example.py

I would appreciate it if you could give me an idea of what could go wrong and how to solve this issue.

Edit:
After further debugging, I found that in the example teaser_python_3dsmooth.py, the segmentation fault is triggered at line 267: frag1.data = frag1_desc.T
Might the problem be the write to o3d.pipelines.registration.Feature()? With Python 3.10.12, it happens with open3d 0.18.0, 0.17.0, and 0.16.0; I could not downgrade open3d further. Has anyone found a solution for this?
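One way to narrow a crash down to a specific Python line, as in the edit above, is the standard-library faulthandler module, which prints the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. This is only a sketch; where exactly to enable it in the example scripts is an assumption.

```python
import faulthandler

# Dump the Python traceback to stderr if the process receives a fatal
# signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL).
faulthandler.enable()

# The rest of the script would follow here: the open3d / teaserpp_python
# imports and the call that crashes (assumed, not shown).
```

The same handler can be switched on without editing the script by running python3 -X faulthandler ./teaserpp_example.py.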

@jingnanshi
Member

I would suggest cloning the data and writing the copy to frag1.data.
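A minimal sketch of what "cloning the data" could look like on the NumPy side. The descriptor shape is hypothetical, and the final assignment to frag1.data is left as a comment because open3d is not imported here; whether this avoids the crash is not confirmed in this thread.

```python
import numpy as np

# Hypothetical N x D descriptor array standing in for frag1_desc.
frag1_desc = np.random.rand(100, 32)

# .T returns a view that shares memory with frag1_desc and is not C-contiguous.
view = frag1_desc.T

# An explicit copy owns its own C-contiguous buffer.
owned = view.copy(order="C")

print(np.shares_memory(view, frag1_desc))   # True
print(np.shares_memory(owned, frag1_desc))  # False

# In the example script the assignment would then be (assumption):
# frag1.data = owned
```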

@Cuberkk

Cuberkk commented Jan 20, 2025

Hi guys,
I encountered the same problem. It looks like the crash occurs when executing the function execute_teaser_global_registration, more specifically at line 236: teaserpp_solver.solve(source, target).
I tried on both my Mac and a Linux computer and hit the same problem on each. I would appreciate it if @nicolaoe or @jingnanshi could share the solution if the problem has been solved!

@jingnanshi
Member

@Cuberkk do you mind testing a bit on your side? For example, check whether the same behavior occurs with a small number of points, a large number of points, etc.
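A rough way to run this kind of test without the crash killing the test script itself is to call the solver in a child interpreter for each point count. The sketch below assumes a POSIX system (where a child killed by signal N reports exit code -N) and uses random 3 x n inputs; the helper names run_isolated and probe_point_counts are made up for illustration.

```python
import signal
import subprocess
import sys
import textwrap


def run_isolated(code: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit code.

    On POSIX, a negative value -N means the child was killed by signal N,
    so a segmentation fault shows up as -signal.SIGSEGV.
    """
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode


def probe_point_counts(counts):
    """Call solver.solve on random 3 x n inputs and record each outcome."""
    template = textwrap.dedent(
        """
        import numpy as np
        import teaserpp_python
        src = np.random.rand(3, {n})
        dst = np.random.rand(3, {n})
        params = teaserpp_python.RobustRegistrationSolver.Params()
        solver = teaserpp_python.RobustRegistrationSolver(params)
        solver.solve(src, dst)
        """
    )
    outcomes = {}
    for n in counts:
        code = run_isolated(template.format(n=n))
        if code == 0:
            outcomes[n] = "ok"
        elif code == -signal.SIGSEGV:
            outcomes[n] = "segfault"
        else:
            outcomes[n] = f"exit code {code}"
    return outcomes
```

For example, probe_point_counts([10, 100, 1000, 10000]) returns a dictionary mapping each point count to "ok", "segfault", or another exit code.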

@eleboss

eleboss commented Jan 27, 2025

I encountered the same problem. With Python 3.6 everything works well; with Python 3.9 the segfault appears.

@Cuberkk

Cuberkk commented Jan 27, 2025

Hi guys, I played around with building the environment, and I was able to run the examples in the repo in WSL (Ubuntu 16.04) with a conda environment using Python 3.6. In this environment, open3d was installed directly with pip install open3d. I can also run teaser on a Linux 22.04 OS with a conda environment using Python 3.10. But when I follow the same procedure on another computer, I still get the segmentation fault. So I guess there are some odd dependency issues. For Python 3.6, what I did was follow the "Reproduce the GIF Above" procedure in the git repo, with open3d installed via pip install open3d. I hope this helps!
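Since the same procedure behaves differently on different machines, it may help to post the exact versions from each environment. The sketch below collects them with the standard library only; the distribution names are taken from the pip listing in the first post, and whether teaserpp_python registers under exactly that name is an assumption.

```python
import platform
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names=("numpy", "open3d", "teaserpp_python")):
    """Collect the Python version and the versions of the given packages."""
    report = {"python": platform.python_version()}
    for name in dist_names:
        report[name] = installed_version(name) or "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it in both a working and a crashing environment gives two short listings that can be compared line by line.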

@jingnanshi
Member

@Cuberkk thanks! For the segmentation fault case, can you try importing teaser at the end of all other imports and try again? Thanks!

@eleboss

eleboss commented Jan 29, 2025

Hey guys, I have done some debugging.

Following the minimal Python example in the README, I hit the segfault:

sudo apt install cmake libeigen3-dev libboost-all-dev
conda create -n teaser_test python=3.10 numpy
conda activate teaser_test
pip install open3d
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake -DTEASERPP_PYTHON_VERSION=3.10 .. && make teaserpp_python
cd python && pip install .
cd ../.. && cd examples/teaser_python_ply
python teaser_python_ply.py

Then I tried uninstalling open3d and using Python 3.10 with only teaser-pp, and the segfault disappeared.

Hope this hint helps.

Shijie

@doggydoggy0101

doggydoggy0101 commented Feb 3, 2025

Hi, I was able to reproduce the problem by

'''
sudo apt install cmake libeigen3-dev libboost-all-dev
conda create -n reg python=3.10 numpy -y
conda activate reg
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake .. -DTEASERPP_PYTHON_VERSION=3.10 && make teaserpp_python -j24
cd python && pip install .
cd ../../.. && python testTeaser.py
'''
# File name testTeaser.py
import numpy as np
import teaserpp_python

# random data
test1 = np.random.rand(3, 100)
test2 = np.random.rand(3, 100)

solver_params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(solver_params)
solver.solve(test1, test2)

After some testing, I don't think it is an open3d problem. I ran it in a debugger:
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff539d079 in pybind11::detail::type_caster<Eigen::Matrix<double, 3, -1, 0, 3, -1>, void>::load (this=0x7fffffffb3c0, src=..., convert=true) at /home/doggy/code/TEASER-plusplus/build/pybind11-src/include/pybind11/eigen/matrix.h:327
#2  0x00007ffff5391533 in pybind11::detail::argument_loader<teaser::RobustRegistrationSolver*, Eigen::Matrix<double, 3, -1, 0, 3, -1> const&, Eigen::Matrix<double, 3, -1, 0, 3, -1> const&>::load_impl_sequence<0ul, 1ul, 2ul> (this=0x7fffffffb3b0, 
    call=...) at /home/doggy/code/TEASER-plusplus/build/pybind11-src/include/pybind11/cast.h:1469

which suggests a problem in the conversion between the numpy array and the Eigen matrix. I tried using EigenDRef to wrap the arguments:

// original wrapper
.def("solve", py::overload_cast<const Eigen::Matrix<double, 3, Eigen::Dynamic>&,
                                const Eigen::Matrix<double, 3, Eigen::Dynamic>&>(
                  &teaser::RobustRegistrationSolver::solve))
// EigenDRef binds functions that take Eigen::Ref parameters
.def("solve_debug", [](teaser::RobustRegistrationSolver &self,
                       py::EigenDRef<const Eigen::Matrix<double, 3, Eigen::Dynamic>> pcd1,
                       py::EigenDRef<const Eigen::Matrix<double, 3, Eigen::Dynamic>> pcd2) {
                      return self.solve(pcd1, pcd2);
                      })

with the script

# File name testTeaser.py
import numpy as np
import teaserpp_python

# random data
test1 = np.random.rand(3, 100)
test2 = np.random.rand(3, 100)

solver_params = teaserpp_python.RobustRegistrationSolver.Params()
solver = teaserpp_python.RobustRegistrationSolver(solver_params)
solver.solve_debug(test1, test2) 
print("[DEBUG] EigenDRef works", end="\n\n")

solver.solve(test1, test2)
❯ python testTeaser.py
Starting scale solver (only selecting inliers if scale estimation has been disabled).
Scale estimation complete.
Max core number: 4
Num vertices: 101
Max Clique of scale estimation inliers: 
17 53 76 
Using chain graph for GNC rotation.
Starting rotation solver.
GNC rotation estimation noise bound:0.0252838
GNC rotation estimation noise bound squared:0.000639273
GNC-TLS solver terminated due to cost convergence.
Cost diff: 0
Iterations: 8
Rotation estimation complete.
Starting translation solver.
Translation estimation complete.
[DEBUG] EigenDRef works

[1]    3844075 segmentation fault (core dumped)  python testTeaser.py

With the debug binding (solve_debug), I was able to run teaser_python_ply.py successfully. Tested only on Python 3.10 and 3.11.


I think the reason is that teaser's input expects a column-major matrix, while NumPy arrays passed through pybind11 are row-major by default. Newer versions of numpy (>=2.0.0) might have stricter rules for memory layout (although I couldn't find any documentation), which would cause a mismatch between row-major and column-major matrices.
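To make the layout difference concrete: NumPy creates row-major (C-order) arrays by default, whereas a plain Eigen::Matrix<double, 3, Eigen::Dynamic> is column-major. The sketch below only inspects the NumPy side; whether layout is what actually triggers the crash is not confirmed in this thread.

```python
import numpy as np

pts = np.random.rand(3, 100)      # same shape as the test input above
pts_f = np.asfortranarray(pts)    # column-major copy with identical values

print(pts.flags["C_CONTIGUOUS"], pts.flags["F_CONTIGUOUS"])      # True False
print(pts_f.flags["C_CONTIGUOUS"], pts_f.flags["F_CONTIGUOUS"])  # False True
print(pts.strides, pts_f.strides)                                # (800, 8) (8, 24)
print(np.array_equal(pts, pts_f))                                # True
```

Passing the column-major copy to solver.solve would be one way to test the hypothesis, but that experiment is an untested suggestion.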

conda create -n reg python=3.10 numpy=1.26 -y
conda activate reg
pip install open3d
git clone https://github.com/MIT-SPARK/TEASER-plusplus.git
cd TEASER-plusplus && mkdir build && cd build
cmake .. -DTEASERPP_PYTHON_VERSION=3.10 && make teaserpp_python -j24
cd python && pip install .
cd ../.. && cd examples/teaser_python_ply 
python teaser_python_ply.py

With numpy downgraded to 1.26 as above, everything works fine (while 2.0.0 causes a segmentation fault). To maintain compatibility with newer numpy versions, I think we may need to use EigenDRef to handle the inputs properly.
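Until the bindings are fixed, a script could check the NumPy major version up front and warn instead of crashing. The sketch below is based only on the reports in this thread (1.26 works; 2.0.0 and 2.1.1 crash with the build steps shown). The helper names are hypothetical.

```python
def numpy_major(version: str) -> int:
    """Extract the major version number from a string such as '2.1.1'."""
    return int(version.split(".")[0])


def likely_affected(numpy_version: str) -> bool:
    """True for NumPy 2.x and later, which this thread reports as crashing."""
    return numpy_major(numpy_version) >= 2


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        print("numpy is not installed")
    else:
        if likely_affected(np.__version__):
            print(f"numpy {np.__version__}: solver.solve may segfault with this build")
        else:
            print(f"numpy {np.__version__}: reported to work in this thread")
```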

@zhaoys87


I changed the pybind11 version from v2.11.1 to v2.13.1 on line 8 of "cmake/pybind11.CMakeLists.txt.in", then rebuilt and reinstalled, and it finally works.
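The edit described above can be scripted as a plain string substitution. The exact content of line 8 is not shown in this thread, so the sketch below only swaps the version string wherever it occurs; the function names and the sample line in the usage note are hypothetical.

```python
from pathlib import Path


def bump_pybind11(template_text: str, old: str = "v2.11.1", new: str = "v2.13.1") -> str:
    """Return the template text with the old pybind11 version replaced."""
    if old not in template_text:
        raise ValueError(f"{old} not found; the file may already be updated")
    return template_text.replace(old, new)


def bump_file(path: str) -> None:
    """Rewrite the given CMake template in place with the new version."""
    template = Path(path)
    template.write_text(bump_pybind11(template.read_text()))
```

For instance, bump_pybind11("GIT_TAG v2.11.1") returns "GIT_TAG v2.13.1". After calling bump_file("cmake/pybind11.CMakeLists.txt.in") from the repository root, the build directory would need to be rebuilt and the Python package reinstalled, as in the steps earlier in the thread.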
