-
Notifications
You must be signed in to change notification settings - Fork 100
Make intersections much faster #406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make intersections much faster #406
Conversation
Benchmarks show that this is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few concerns, most importantly about inlining.
Is stylish Haskell happening on GitHub or something? If so, I guess @sjakobi wants that. Don't ask me; I don't know tools. The potential performance concern with an unboxed tuple result is that when the passed function doesn't inline, we ended up with an extra function call per element. I don't know if that makes a big enough difference to worry about; it'd be worth experimenting. I imagine it would save a lot of code duplication if it turns out okay. Specifically, it would avoid source duplication between lazy and strict versions, and object code duplication among the variants. |
How is that different than with a function that does not return an unboxed tuple? Shouldn't it still have to make an extra function call for each element if the function doesn't inline? |
Oh, I was thinking of not inlining the version that produces an unboxed tuple. If we do inline that into the other variants, then yeah, everything should be basically the same but with less source code. But it's worth finding out how much we pay for not inlining it, because object code isn't free either. |
What if we just inline the ones with the unboxed tuples. Is there any disadvantage? Because we already |
This one inlines the unboxed form into everything else, hopefully.
I opened a PR against your branch to do that. |
There are CI failures on older GHC, because shrinking wasn't available yet. We'll need a fall-back. |
The fallback should probably just define the shrinking operation manually in the |
Unboxedness
Should I just use something like |
I dunno. I haven't actually looked at how you're using |
Oh, I just looked. You'll want to use |
Something like #if __GLASGOW_HASKELL__ >= 8.10.7
shrink = ...
#else
shrink = ...
#endif (never used cpp before) |
Yup! It's one of the world's worst macro systems, but it's a lot faster than Template Haskell. Sigh |
Co-authored-by: Simon Jakobi <[email protected]>
Could you please rebase, so the changes from #407 are removed from the diff? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed intersectionCollisions
yet. Could you add some documentation on searchSwap
first?
Data/HashMap/Internal.hs
Outdated
-- iterate over nonzero bits of b1 .&. b2 | ||
let go !i !i1 !i2 !b !bFinal | ||
| b == 0 = pure (i, bFinal) | ||
| testBit $ b1 .&. b2 = do | ||
x1 <- A.indexM ary1 i1 | ||
x2 <- A.indexM ary2 i2 | ||
case f x1 x2 of | ||
Empty -> go i (i1 + 1) (i2 + 1) b' (bFinal .&. complement m) | ||
_ -> do | ||
A.write mary i $! f x1 x2 | ||
go (i + 1) (i1 + 1) (i2 + 1) b' bFinal | ||
| testBit b1 = go i (i1 + 1) i2 b' bFinal | ||
| otherwise = go i i1 (i2 + 1) b' bFinal | ||
where | ||
m = 1 `unsafeShiftL` countTrailingZeros b | ||
testBit x = x .&. m /= 0 | ||
b' = b .&. complement m | ||
(maryLen, bFinal) <- go 0 0 0 bCombined bIntersect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment at the top seems incorrect: Currently the loop actually iterates over b1 .|. b2
. It would be nice to change this though. In that case the i1
and i2
indices could be computed with sparseIndex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case the i1 and i2 indices could be computed with sparseIndex.
For comparison, here is a version of unionArrayBy
that uses sparseIndex
to compute all the indices:
unordered-containers/Data/HashMap/Internal.hs
Lines 1626 to 1647 in a780a8d
unionArrayBy f !b1 !b2 !ary1 !ary2 = A.run $ do | |
let b' = b1 .|. b2 | |
mary <- A.new_ (popCount b') | |
-- iterate over nonzero bits of b1 .|. b2 | |
let go !b | |
| b == 0 = return () | |
| otherwise = do | |
let ba = b1 .&. b2 | |
c = countTrailingZeros b | |
m = bit c | |
i = sparseIndex b' m | |
i1 = sparseIndex b1 m | |
i2 = sparseIndex b2 m | |
t <- if | testBit ba c -> do | |
x1 <- A.indexM ary1 i1 | |
x2 <- A.indexM ary2 i2 | |
return $! f x1 x2 | |
| testBit b1 c -> A.indexM ary1 i1 | |
| otherwise -> A.indexM ary2 i2 | |
A.write mary i t | |
go (clearBit b c) | |
go b' |
I expect that keeping i
as a loop argument will be more efficient than recomputing it on each iteration though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using sparseIndex
makes benchmarks slower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show me the diff?
Using sparseIndex makes benchmarks slower
By how much? I also think the benchmark data might be a bit weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I show you the diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sparse index:
All
HashMap
intersection
Int: OK (0.34s)
52.1 μs ± 3.3 μs
ByteString: OK (0.26s)
62.6 μs ± 6.1 μs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without sparse index:
All
HashMap
intersection
Int: OK (0.86s)
42.1 μs ± 1.5 μs
ByteString: OK (0.33s)
47.7 μs ± 2.8 μs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, thanks! I think we might get different results with data where there's less overlap between the two maps. But that can be investigated at a different time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up issue in #416.
Data/HashMap/Internal.hs
Outdated
|
||
intersectionCollisions :: Eq k => (k -> v1 -> v2 -> (# v3 #)) -> A.Array (Leaf k v1) -> A.Array (Leaf k v2) -> ST s (Int, A.MArray s (Leaf k v3)) | ||
intersectionCollisions f ary1 ary2 = do | ||
mary2 <- A.thaw ary2 0 $ A.length ary2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we actually need to allocate two arrays for this. The alternative would be to perform the search-and-swap operations on the output array itself.
It might be a bit tricky though – maybe leave it for a follow-up PR, so this one doesn't get too huge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue with this is that the type could change. For example if we have two arrays with the numbers as keys, and the arrays are both different types
1 2 3 4
3 4 2 1
Let's thaw the first array, and mutate it to
(f 3 3) 2 1 4
f 3 3
could change the type to be something difference than the 2 1 4
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, good point. Unsafe coercions might work for this, but I'd prefer not trying this in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only an issue for intersectionWithKey
and such; intersection
itself has no type issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another thing about intersection
that's special: we can reuse the leaves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be better if intersection
had custom code for handling collisions. Maybe this can be achieved by changing intersectionWithKey#
to something similar to filterMapAux
.
I'd slightly prefer if we'd leave this for a follow-up PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have recorded these ideas in #415.
Co-authored-by: Simon Jakobi <[email protected]>
What should I do about inlining? I understand the need to eliminate the closures, but the functions are truly massive, |
Co-authored-by: Simon Jakobi <[email protected]>
Also if we pass the function around through recursion, then we wouldn't be able to implement |
I think we should stick with
I think the docs of these functions are the wrong place to teach people about |
@sjakobi Is there anything else that I need to do? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix the merge conflict, @oberblastmeister?
I'll merge afterwards.
import Data.Hashable (Hashable) | ||
import Data.Hashable.Lifted (Hashable1, Hashable2) | ||
import Data.HashMap.Internal.List (isPermutationBy, unorderedCompare) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the changed sorting of imports is probably due to haskell/stylish-haskell#385, which was recently released.
The concerns seem to have been addressed.
Thank you, @oberblastmeister! :) |
Resolves #225. I also incorporated techniques from #291 and #362. All tests pass. I need to do some benchmarking and clean up the code, probably reformat it also.