WIP geodetic distance search #1086
base: main
Added NearFilter unit test with real lat-long data. The current implementation of NearFilter doesn't return the expected results. See this shared custom map on Google Maps for an illustration of the Oslo test case: https://www.google.com/maps/d/viewer?mid=1WXlEa5nBOSvBej3HSUNhsh_LLahoab8
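To illustrate why a planar distance computed on raw latitude/longitude values gives misleading results, here is a minimal sketch that compares a great-circle distance with a naive distance in degrees. It uses the haversine formula on a spherical Earth as a stand-in; it is not the GeographicLib-based computation this PR depends on, and the class name, the coordinates, and the mean Earth radius are assumptions chosen purely for illustration.

```java
final class GeoDistanceSketch {
    // Mean Earth radius in meters (spherical approximation; an assumption of this sketch).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    /** Great-circle (haversine) distance in meters between two lat/long points given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Naive planar distance that treats degrees as if they were Cartesian units. */
    static double planarDegrees(double lat1, double lon1, double lat2, double lon2) {
        return Math.hypot(lat2 - lat1, lon2 - lon1);
    }

    public static void main(String[] args) {
        // Illustrative coordinates near Oslo (approximate, not taken from the PR's test data).
        double lat = 59.91, lon = 10.75;
        double north = haversineMeters(lat, lon, lat + 0.01, lon);
        double east = haversineMeters(lat, lon, lat, lon + 0.01);
        System.out.printf("0.01 deg north: %.0f m, 0.01 deg east: %.0f m%n", north, east);
    }
}
```

In this sketch the same 0.01 degree offset covers roughly half as much ground east-west as it does north-south at that latitude, so a "circle" drawn in degree units does not correspond to a circle in meters.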
@@ -0,0 +1,45 @@
package org.locationtech.jts.util; |
Let's use the Nitrite package name here.
Ah, I thought this needed to be in the same package as the existing GeometricShapeFactory in order to access the protected fields, but I see now that I was misremembering my Java inheritance visibility rules! 😄 I did a quick refactor and it works just fine. Pushing an update...
Your changes look great to me. |
Thanks for the quick feedback Anindya! Since these changes don't actually work yet, due to the bounding box problem, please advise how you'd recommend proceeding. There seem to be some simple options, e.g. we can use the set of My concern here is that I don't want to start pulling threads and turning this into a larger change because I haven't taken the time to understand the philosophy of how you split the work of the query up across these different layers. (Hence why this PR is a Draft.) |
After a quick review, I see one way we can move forward:
If you find a better way, don't hesitate to share it. |
Thanks, I'll give it a try and see how the code shapes up! |
Also, how much weight should we be putting on avoiding breaking interface changes? Theoretically, someone out there may have taken a dependency on Nitrite 4.x and created their own implementation class for the We could just mark the existing methods as |
Of course, we should keep them backward compatible for the next minor version upgrade and create overloaded methods to add |
As I started looking at the four implementations of
Considerations
Concrete technical considerations
If we add the More importantly, it is my understanding that this would be redundant data. The geometry data could be both in the main collection/repository map as well as in the RTree map. (e.g. I see that both the
Intuitive considerations based on application of general software-design principles
I tried to peel the next layer of the onion to figure out how to judge whether this is an appropriate trade-off in context. This led me to (a) look at the implementation of Admittedly, on the one hand, for every other type of Index we have, full data of the field being indexed becomes the content of the index alongside the ids. As such, there's an expectation that those indices would enable a wide range of filter operations because those operations would have access to the "complete" data of the indexed field. On the other hand, it seems that a spatial index is necessarily different by its nature. The R-Tree is built on the idea that using only the bounding boxes this comes with significant benefits that outweigh its significant limitations. If we extend it, it's not really just an R-Tree anymore.
Conclusion
On balance, I think this adds up to a strong reason to prefer the approach where we treat each Sorry for the novella-length analysis. 😄 I look forward to hearing what you think. |
Thanks for the detailed analysis. While filtering via bounding box (using the current algorithm) the resulting set of |
Hi, is there any update on this, or has it been abandoned? |
Hi Anindya, thanks for checking in. Things got busy around the holidays at home but I'm planning to pick this back up next week. I appreciate your help and patience. |
@anidotnet I've got a working implementation 🎉 but it only solves After reading Cheers, |
While revisiting this discussion, I am thinking that maybe we should draw a clear distinction between geo-spatial and spatial data/queries. A user who only deals with spatial queries (like a surface in a game world) does not need to be concerned about geodesics, whereas a user dealing with lat-long data does. Let me know your thoughts on this. |
😄 This is precisely what I was going to suggest, as well. I discovered as I was modifying some of my unit tests that I was mixing up coordinate systems and this seems like the natural solution. EDIT: I missed that you included "data". Yes, that too. |
As for the new commit I pushed, I'd like to clarify that I changed a bunch of names but those are all just placeholders meant to highlight how I'm thinking about it in this intermediate state. The important thing, as I see it, is the work so far has shown me we need to clarify:
I'm sure I've forgotten a couple of minor things, e.g. the fact that there's throw-away work happening to compute the polygon of the circle and it's not really being utilized. We can do some micro-benchmarking later to see if using the Thanks again. I look forward to your thoughts. |
The first and foremost thing we have to do is separate the filter/data for spatial and geo-spatial queries. We can create respective Geo-* filters to separate the concerns. Your idea of Regarding the Regarding the hierarchy, I think we need a complete redesign of spatial index scanning - |
1. Defining the behavior of NearFilter
Ah, but that's going to be difficult because as of right now, there is no definition of what is the correct behavior. Nothing in the documentation or in the existing unit tests tells what the expected behavior is for this scenario. Given the existing test geometry defined in BaseSpatialTest.java, the next nearest point would be So, you tell me. When you wrote it, did you have some idea of what "near" means for non-point geometry or were you just expecting "near" would be used for point-only data-sets? If we increase the search distance in the test case to 42, such that it catches one point of the
2. Clarification about redesign of spatial index scanning
I don't understand what you are suggesting here. In particular, what does "make it inline" mean? Can you spell out what would be different? Are you suggesting that in
|
My initial thought was with point only data, so I went with the mental picture of - "points within a circle", but when I think of it now, Intersects makes much more sense for any kind of geometry. The correct logic should be - if the circle intersects with any portion of the geometry, the value should be selected.
The current spatial filter and its index scanning do not handle multiple spatial filters in an AND filter. There is a bug in the
I am not planning to change current |
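The intersects rule described above (select a value when the search circle touches any portion of its geometry) can be sketched in planar coordinates as follows. This is a stdlib-only illustration with hypothetical names, not the Nitrite or JTS implementation: a polygon is given as a ring of vertices, and the circle is taken to intersect it if the center lies inside the ring or if any edge comes within the radius of the center.

```java
final class CircleIntersectsSketch {
    /** Shortest distance from point (px, py) to the segment (ax, ay)-(bx, by). */
    static double distToSegment(double px, double py,
                                double ax, double ay, double bx, double by) {
        double dx = bx - ax, dy = by - ay;
        double lenSq = dx * dx + dy * dy;
        // Degenerate (zero-length) segment: measure to the single point instead.
        double t = lenSq == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / lenSq;
        t = Math.max(0, Math.min(1, t)); // clamp the projection onto the segment
        return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
    }

    /** Ray-casting point-in-polygon test; ring is {x0, y0, x1, y1, ...} with no repeated closing vertex. */
    static boolean contains(double[] ring, double px, double py) {
        boolean inside = false;
        int n = ring.length / 2;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            double xi = ring[2 * i], yi = ring[2 * i + 1];
            double xj = ring[2 * j], yj = ring[2 * j + 1];
            if ((yi > py) != (yj > py)
                    && px < (xj - xi) * (py - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** True if the circle shares at least one point with the polygon (boundary or interior). */
    static boolean circleIntersectsPolygon(double cx, double cy, double radius, double[] ring) {
        if (contains(ring, cx, cy)) {
            return true; // center is inside the polygon
        }
        int n = ring.length / 2;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            double d = distToSegment(cx, cy,
                    ring[2 * j], ring[2 * j + 1], ring[2 * i], ring[2 * i + 1]);
            if (d <= radius) {
                return true; // some edge passes within the radius
            }
        }
        return false;
    }
}
```

Note that a "points within a circle" check on the vertices alone would miss the case where the circle clips an edge without reaching any vertex, which is the difference between the two definitions discussed here.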
One way to solve the multiple spatial filters scanning or 2 pass scanning is to modify |
So you're suggesting modifying the FindPlan within Can you tell me more about why you would rather do it there than to change anything in |
For general purpose filters, the plan is calculated before the query operation. For spatial filters, the plan is being calculated during the search operation as it progresses. As for modifying the input object, we can return a new type containing NitriteIds and FindPlan for I am trying to avoid larger changes in the code, that's why I am skeptical about changing |
Is that true? I had convinced myself that we already know what the plan will be ahead of time. i.e. as soon as we start optimizing the plan, we already know we can (1) break it down into an initial bounding box index-filter that only has access to How is the plan "being calculated during the search operation"? I don't understand.
Okay, that makes sense. I have some intuition that this can apply more broadly, but it makes sense to keep it isolated until we actually have those other concrete use-cases at hand.
I don't know how elegant any of it would be, but it feels like now is a good time to take another iteration at the code itself rather than keep talking about the possibilities for the code. 😁 I'll try keeping it isolated in |
My bad. It's a typo. It should be - the plan will be calculated, as we are discussing about modifying |
FYI, my work has gotten busy so it might take another 2 weeks before I have more commits to share on this PR. |
Thanks for the update. No worries. |
Why
With some additional work, this branch is intended to fix issue #1079. In its current state, there is still an unresolved issue where both NearFilter and WithinFilter actually just check the minimal bounding box and never test the actual geometry.
My intended next step is to discuss the best path forward. I have left detailed notes in the form of a comment in the code with my understanding of options for expanding the fix.
What
net.sf.geographiclib : GeographicLib-Java : 2.0 in nitrite-spatial/pom.xml
WithinFilter once I stepped through in the debugger and saw that SpatialIndex.java is mistakenly treating the filter work as done after the initial RTree check. Sure enough, the same problem occurs for WithinFilter. I added an ASCII diagram in a comment within the new unit test, testWithinTriangleNotJustTestingBoundingBox.
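The bounding-box problem described above can be reproduced without any index: a point may fall inside a triangle's minimal bounding box while lying outside the triangle itself. The sketch below uses hypothetical names and coordinates that are not taken from the PR's tests. It shows a coarse envelope check followed by an exact geometry check, and is only an illustration of the two-pass idea, not the SpatialIndex code.

```java
final class TwoPassWithinSketch {
    /** Coarse pass: is (px, py) inside the axis-aligned bounding box of triangle t = {x0, y0, x1, y1, x2, y2}? */
    static boolean inBoundingBox(double[] t, double px, double py) {
        double minX = Math.min(t[0], Math.min(t[2], t[4]));
        double maxX = Math.max(t[0], Math.max(t[2], t[4]));
        double minY = Math.min(t[1], Math.min(t[3], t[5]));
        double maxY = Math.max(t[1], Math.max(t[3], t[5]));
        return px >= minX && px <= maxX && py >= minY && py <= maxY;
    }

    /** Z-component of the cross product (b - a) x (p - a); its sign tells which side of edge a-b the point p is on. */
    static double cross(double ax, double ay, double bx, double by, double px, double py) {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }

    /** Exact pass: the point is inside (or on the boundary) if it is not on opposite sides of any two edges. */
    static boolean inTriangle(double[] t, double px, double py) {
        double d1 = cross(t[0], t[1], t[2], t[3], px, py);
        double d2 = cross(t[2], t[3], t[4], t[5], px, py);
        double d3 = cross(t[4], t[5], t[0], t[1], px, py);
        boolean hasNegative = d1 < 0 || d2 < 0 || d3 < 0;
        boolean hasPositive = d1 > 0 || d2 > 0 || d3 > 0;
        return !(hasNegative && hasPositive);
    }

    /** Two-pass test: the cheap envelope check rules out candidates, the exact check confirms them. */
    static boolean within(double[] t, double px, double py) {
        return inBoundingBox(t, px, py) && inTriangle(t, px, py);
    }

    public static void main(String[] args) {
        double[] triangle = {0, 0, 10, 0, 0, 10};
        // (8, 8) is inside the bounding box but beyond the hypotenuse.
        System.out.println(inBoundingBox(triangle, 8, 8)); // true
        System.out.println(within(triangle, 8, 8));        // false
    }
}
```

Stopping after the first pass, as the description says the current code does after the initial RTree check, would report (8, 8) as a match even though it lies outside the triangle.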