
k-d tree speedup (nanoflann / CUDA) #5299

Open · wants to merge 5 commits into master

Conversation

@yasamoka (Contributor) commented Jun 17, 2022

This pull request provides tested k-d tree implementations using nanoflann (CPU) and FLANN (CUDA), and adds the ability to set the maximum leaf size for any k-d tree implementation.
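
At the library level, the maximum leaf size is the knob that trades tree build time against search time; in plain nanoflann it is passed via KDTreeSingleIndexAdaptorParams. Below is a minimal, self-contained sketch following the adaptor pattern from the nanoflann examples. The Cloud struct, the point data, and the query values are illustrative only; this is not the PCL wrapper added by this pull request.

```cpp
#include <nanoflann.hpp>

#include <array>
#include <cstddef>
#include <vector>

// Minimal dataset adaptor, as in the nanoflann examples (illustrative only).
struct Cloud
{
  std::vector<std::array<float, 3>> pts;

  std::size_t kdtree_get_point_count() const { return pts.size(); }

  float kdtree_get_pt(const std::size_t idx, const std::size_t dim) const
  {
    return pts[idx][dim];
  }

  // Returning false lets nanoflann compute the bounding box itself.
  template <class BBOX>
  bool kdtree_get_bbox(BBOX&) const { return false; }
};

using KdTree = nanoflann::KDTreeSingleIndexAdaptor<
    nanoflann::L2_Simple_Adaptor<float, Cloud>,
    Cloud,
    3,            // dimensionality
    std::size_t>; // index type

int main()
{
  Cloud cloud;
  cloud.pts = {{0.f, 0.f, 0.f}, {1.f, 0.f, 0.f}, {0.f, 1.f, 0.f}};

  // Max leaf size: larger leaves build faster, smaller leaves usually
  // search faster (this is the parameter the PR makes configurable on the PCL side).
  const std::size_t max_leaf_size = 10;
  KdTree index(3, cloud, nanoflann::KDTreeSingleIndexAdaptorParams(max_leaf_size));
  index.buildIndex();

  const float query[3] = {0.1f, 0.1f, 0.1f};
  const std::size_t k = 2;
  std::vector<std::size_t> ret_indices(k);
  std::vector<float> out_sqr_dists(k);
  index.knnSearch(query, k, ret_indices.data(), out_sqr_dists.data());
  return 0;
}
```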

Benchmarks comparing FLANN (CPU), Nanoflann (CPU), and FLANN (CUDA) can be found here: https://yasamoka.github.io/pcl-knn-benchmark/

I am not sure whether there is a better way of modifying the CMake scripts to satisfy the new dependencies. If there is, I would appreciate help with that.

Regarding documentation, I placed the FLANN CUDA implementation under the kdtree module, which gives it good visibility for users of k-d trees. Should I move it to its own module (e.g. cuda/kdtree)? Is it possible to have two levels like that?

Thank you very much!

@themightyoarfish (Contributor) commented:

[image: benchmark plot]

This plot seems to show that nanoflann is slower than FLANN, but your bar graphs further down show the opposite 🤔

@yasamoka (Contributor, Author) commented:

> [image: benchmark plot]
> This plot seems to show that nanoflann is slower than FLANN, but your bar graphs further down show the opposite 🤔

The line graph you're seeing is tree build time.

The bar graphs you see below that are NN search time.

Yes, nanoflann is slower than FLANN at tree building for the same leaf size. For NN search, however, it is faster than FLANN, and its advantage grows as you move toward fewer threads and fewer search points.
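
To make the split concrete, the two phases can be timed separately with the existing PCL FLANN wrapper. A rough sketch, assuming a recent PCL where pcl::Indices is the index type (the toy cloud, query point, and k are illustrative):

```cpp
#include <pcl/kdtree/kdtree_flann.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

#include <chrono>
#include <iostream>
#include <vector>

int main()
{
  // Toy cloud; the linked benchmarks use much larger inputs.
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
  for (int i = 0; i < 10000; ++i)
    cloud->push_back(pcl::PointXYZ(static_cast<float>(i % 100), static_cast<float>(i / 100), 0.0f));

  pcl::KdTreeFLANN<pcl::PointXYZ> tree;

  const auto t0 = std::chrono::steady_clock::now();
  tree.setInputCloud(cloud); // tree build time (the line graph)
  const auto t1 = std::chrono::steady_clock::now();

  const int k = 10;
  pcl::Indices indices(k);          // std::vector<int> on older PCL versions
  std::vector<float> sqr_dists(k);
  const pcl::PointXYZ query(50.0f, 50.0f, 0.0f);
  tree.nearestKSearch(query, k, indices, sqr_dists); // NN search time (the bar graphs)
  const auto t2 = std::chrono::steady_clock::now();

  std::cout << "build:  " << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n"
            << "search: " << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n";
  return 0;
}
```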

@themightyoarfish (Contributor) commented:

This seems useful; could a maintainer take a look?

@themightyoarfish (Contributor) commented:

@mvieth @larshg Could you give some feedback here?

@xiaodong2077 commented Nov 3, 2023:

What about range search? Is nanoflann quicker than FLANN?

@mvieth (Member) commented Dec 28, 2024:

Hi, sorry for not replying earlier! I am not sure if you, @yasamoka, are still interested in working on this pull request and getting it merged (I would understand if not). Either way, here are my thoughts and questions:

  • Where do you see the advantage of adding both nanoflann and CUDA-based FLANN? Is it simply that CUDA-based FLANN is fastest, but CUDA / a GPU is not always available, so nanoflann is the second fastest?
  • This pull request is very large; it would be better to split it into several pull requests (e.g. one for CUDA-based FLANN, one for nanoflann, one for the max-leaf-size change). That would make it easier to review and to get it into a state where it is ready to be merged.
  • There are now unfortunately a few (merge) conflicts that need to be resolved.
  • Why did you choose to add KdTreeBase? KdTree is already the (abstract) base class, with KdTreeFLANN as its subclass.
  • More importantly, in which situations do you wish to use the faster search methods? I am asking because many, if not most, PCL classes expect the user-configurable search method to be a subclass of pcl::search::Search. So adding the two methods in kdtree and cuda/kdtree is not really enough, or am I missing something? Maybe it makes more sense to add them directly in the search module, similar to FlannSearch? (A rough sketch of what such a class could look like follows right after this list.)
  • It is great that you already wrote some tests; however, they are currently not built or run by our CI checks because neither nanoflann nor the CUDA part of FLANN is installed. The dockerfile for that is in .dev/docker/env. I saw that nanoflann is available from apt for most Ubuntu releases, but for CUDA-FLANN it is probably necessary to build FLANN from source. We would actually need a separate pull request that first updates the docker images; only after that one is merged will nanoflann and CUDA-FLANN be available for the tests. It might also make sense to add the new search methods to test/search/test_search.cpp.
  • It is important to keep the new parts optional (that is, PCL still builds if nanoflann/CUDA-FLANN is not available) and to make sure that the behavior of the existing PCL classes does not change. It looks like you already made sure of this.
  • From what I have tested so far, nanoflann seems to be faster than (CPU-)FLANN for knn search (especially for small neighborhoods, with almost no difference for larger k), but for radius search, nanoflann appears to be much slower than (CPU-)FLANN. Do you perhaps know why that is? EDIT: this was only because I left the result sorting for nanoflann on. If I turn that off, nanoflann is also faster than (CPU-)FLANN for radius search.
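
To make the pcl::search::Search point more concrete, here is a rough, purely illustrative skeleton of what a nanoflann-backed search class could look like. NanoflannSearch and everything inside it are hypothetical names, not the code in this pull request; it assumes a recent PCL (1.11+) where pcl::Indices is the index container, and the signatures should be checked against pcl/search/search.h.

```cpp
#include <pcl/search/search.h>

#include <vector>

// Hypothetical nanoflann-backed search class (sketch only, not this PR's code).
template <typename PointT>
class NanoflannSearch : public pcl::search::Search<PointT>
{
public:
  using Base = pcl::search::Search<PointT>;
  using typename Base::PointCloudConstPtr;

  NanoflannSearch() : Base("NanoflannSearch", /*sorted=*/false) {}

  void
  setInputCloud(const PointCloudConstPtr& cloud,
                const pcl::IndicesConstPtr& indices = pcl::IndicesConstPtr()) override
  {
    Base::setInputCloud(cloud, indices);
    // (Re)build the nanoflann index from the cloud here.
  }

  int
  nearestKSearch(const PointT& point, int k, pcl::Indices& k_indices,
                 std::vector<float>& k_sqr_distances) const override
  {
    // Delegate to nanoflann's knnSearch and copy the results into the
    // output vectors; return the number of neighbors found.
    k_indices.clear();
    k_sqr_distances.clear();
    return 0;
  }

  int
  radiusSearch(const PointT& point, double radius, pcl::Indices& k_indices,
               std::vector<float>& k_sqr_distances,
               unsigned int max_nn = 0) const override
  {
    // Delegate to nanoflann's radius search; sorting the results is optional
    // and, as noted in the last bullet point, costs extra time.
    k_indices.clear();
    k_sqr_distances.clear();
    return 0;
  }
};
```

A class along these lines (or adding the new back ends next to FlannSearch in pcl/search) would let any PCL algorithm that accepts a pcl::search::Search pointer use the faster methods without further changes.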

I will have more comments once I start reviewing in detail, but these are the most important high-level things for now.

@mvieth added the changelog: new feature, module: search, and module: kdtree labels on Dec 28, 2024