Make intersections much faster #406

oberblastmeister · 2022-04-09T15:10:22Z

Resolves #225. I also incorporated techniques from #291 and #362. All tests pass. I need to do some benchmarking and clean up the code, probably reformat it also.

oberblastmeister · 2022-04-09T15:32:34Z

Benchmarks show that this is 2x to 3x faster than the previous implementation! (It is also faster than union now.)

Data/HashMap/Internal.hs

treeowl

Just a few concerns, most importantly about inlining.

Data/HashMap/Internal.hs

sjakobi · 2022-04-09T18:20:13Z

Thanks for working on this! I'll review on Monday.

treeowl · 2022-04-09T19:21:58Z

Is stylish Haskell happening on GitHub or something? If so, I guess @sjakobi wants that. Don't ask me; I don't know tools.

The potential performance concern with an unboxed tuple result is that when the passed function doesn't inline, we end up with an extra function call per element. I don't know if that makes a big enough difference to worry about; it'd be worth experimenting. I imagine it would save a lot of code duplication if it turns out okay. Specifically, it would avoid source duplication between lazy and strict versions, and object code duplication among the variants.

oberblastmeister · 2022-04-09T20:13:55Z

How is that different than with a function that does not return an unboxed tuple? Shouldn't it still have to make an extra function call for each element if the function doesn't inline?

treeowl · 2022-04-09T20:24:18Z

Oh, I was thinking of not inlining the version that produces an unboxed tuple. If we do inline that into the other variants, then yeah, everything should be basically the same but with less source code. But it's worth finding out how much we pay for not inlining it, because object code isn't free either.

oberblastmeister · 2022-04-09T20:35:57Z

What if we just inline the ones with the unboxed tuples? Is there any disadvantage? We already inline intersectionWith, so inlining intersectionWith# shouldn't create extra code. Then later we can experiment and see whether we can avoid inlining the unboxed version.
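To make the idea under discussion concrete, here is a minimal sketch of a worker that returns an unboxed 1-tuple and is shared by a lazy and a strict wrapper. It uses lists rather than the library's arrays, and the names zipWith#, zipWithLazy and zipWithStrict are hypothetical stand-ins, not the PR's code.

```haskell
{-# LANGUAGE BangPatterns  #-}
{-# LANGUAGE MagicHash     #-}
{-# LANGUAGE UnboxedTuples #-}

-- Hypothetical worker: the callback returns an unboxed 1-tuple, so the
-- callback (not the worker) decides whether the result is forced.
zipWith# :: (a -> b -> (# c #)) -> [a] -> [b] -> [c]
zipWith# f = go
  where
    go (x : xs) (y : ys) = case f x y of
      (# c #) -> c : go xs ys
    go _ _ = []
{-# INLINE zipWith# #-}

-- Lazy variant: the tuple wraps an unevaluated thunk.
zipWithLazy :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWithLazy f = zipWith# (\x y -> (# f x y #))

-- Strict variant: the result is evaluated before it is put in the tuple.
zipWithStrict :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWithStrict f = zipWith# (\x y -> let !c = f x y in (# c #))
```

Both wrappers share one source definition of the traversal. With the INLINE pragma each wrapper gets its own copy of the loop in object code, which is the trade-off raised above.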

This one inlines the unboxed form into everything else, hopefully.

treeowl · 2022-04-09T21:04:38Z

I opened a PR against your branch to do that.

treeowl · 2022-04-09T21:05:06Z

There are CI failures on older GHC, because shrinking wasn't available yet. We'll need a fall-back.

treeowl · 2022-04-09T21:07:20Z

The fallback should probably just define the shrinking operation manually in the Array module. It won't be too efficient, but whatever.

Unboxedness

oberblastmeister · 2022-04-09T21:26:17Z

Should I just use something like copy to implement the fallback?

treeowl · 2022-04-09T21:30:46Z

I dunno. I haven't actually looked at how you're using shrink. Really, you can do whatever you think is reasonable, but I'd prefer to keep the CPP for it in Array if possible.

treeowl · 2022-04-09T21:33:28Z

Oh, I just looked. You'll want to use cloneSmallMutableArray#

oberblastmeister · 2022-04-09T21:40:45Z

Something like

#if __GLASGOW_HASKELL__ >= 810
shrink = ...
#else
shrink = ...
#endif

(never used cpp before)

treeowl · 2022-04-09T21:43:10Z

Yup! It's one of the world's worst macro systems, but it's a lot faster than Template Haskell. Sigh
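For readers who have not used CPP in Haskell, the fallback being discussed could look roughly like the sketch below. Note that `__GLASGOW_HASKELL__` is an integer such as 810 for GHC 8.10, so it cannot express a patch level. The MArray wrapper and the helpers new, frozenLength and shrunkLength are hypothetical stand-ins for the library's Array module, and the sketch assumes that shrinkSmallMutableArray# is available from GHC 8.10 on.

```haskell
{-# LANGUAGE CPP           #-}
{-# LANGUAGE MagicHash     #-}
{-# LANGUAGE UnboxedTuples #-}

import qualified GHC.Exts as Exts
import GHC.Exts (Int (I#), SmallMutableArray#)
import GHC.ST (ST (ST), runST)

-- Hypothetical stand-in for the library's mutable array wrapper.
data MArray s a = MArray (SmallMutableArray# s a)

-- Allocate an array of the given length, filled with one element.
new :: Int -> a -> ST s (MArray s a)
new (I# n#) x = ST $ \s ->
  case Exts.newSmallArray# n# x s of
    (# s', m# #) -> (# s', MArray m# #)

-- Keep only the first n elements.
shrink :: MArray s a -> Int -> ST s (MArray s a)
#if __GLASGOW_HASKELL__ >= 810
-- Newer GHC: shrink in place with the primop.
shrink mary@(MArray m#) (I# n#) = ST $ \s ->
  case Exts.shrinkSmallMutableArray# m# n# s of
    s' -> (# s', mary #)
#else
-- Older GHC: fall back to copying the prefix into a fresh array.
shrink (MArray m#) (I# n#) = ST $ \s ->
  case Exts.cloneSmallMutableArray# m# 0# n# s of
    (# s', clone# #) -> (# s', MArray clone# #)
#endif

-- Freeze the array and report its length (only used to observe the result).
frozenLength :: MArray s a -> ST s Int
frozenLength (MArray m#) = ST $ \s ->
  case Exts.unsafeFreezeSmallArray# m# s of
    (# s', a# #) -> (# s', I# (Exts.sizeofSmallArray# a#) #)

-- Allocate `from` elements, shrink to `to`, and return the final length.
shrunkLength :: Int -> Int -> Int
shrunkLength from to =
  runST (new from () >>= \m -> shrink m to >>= frozenLength)
```

Keeping the conditional inside one function means callers never see the CPP, which matches the preference for confining it to the Array module.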

Data/HashMap/Internal/Array.hs

sjakobi · 2022-04-12T10:20:06Z

Could you please rebase, so the changes from #407 are removed from the diff?

sjakobi

I haven't reviewed intersectionCollisions yet. Could you add some documentation on searchSwap first?

Data/HashMap/Internal/Array.hs

Data/HashMap/Internal/Strict.hs

tests/Properties/HashMapLazy.hs

Data/HashMap/Internal.hs

sjakobi · 2022-04-12T11:08:55Z

Data/HashMap/Internal.hs

+  -- iterate over nonzero bits of b1 .&. b2
+  let go !i !i1 !i2 !b !bFinal
+        | b == 0 = pure (i, bFinal)
+        | testBit $ b1 .&. b2 = do
+          x1 <- A.indexM ary1 i1
+          x2 <- A.indexM ary2 i2
+          case f x1 x2 of
+            Empty -> go i (i1 + 1) (i2 + 1) b' (bFinal .&. complement m)
+            _ -> do
+              A.write mary i $! f x1 x2
+              go (i + 1) (i1 + 1) (i2 + 1) b' bFinal
+        | testBit b1 = go i (i1 + 1) i2 b' bFinal
+        | otherwise = go i i1 (i2 + 1) b' bFinal
+        where
+          m = 1 `unsafeShiftL` countTrailingZeros b
+          testBit x = x .&. m /= 0
+          b' = b .&. complement m
+  (maryLen, bFinal) <- go 0 0 0 bCombined bIntersect


The comment at the top seems incorrect: Currently the loop actually iterates over b1 .|. b2. It would be nice to change this though. In that case the i1 and i2 indices could be computed with sparseIndex.

> In that case the i1 and i2 indices could be computed with sparseIndex.

For comparison, here is a version of unionArrayBy that uses sparseIndex to compute all the indices:

unordered-containers/Data/HashMap/Internal.hs

Lines 1626 to 1647 in a780a8d

unionArrayBy f !b1 !b2 !ary1 !ary2 = A.run $ do
    let b' = b1 .|. b2
    mary <- A.new_ (popCount b')
    -- iterate over nonzero bits of b1 .|. b2
    let go !b
            | b == 0 = return ()
            | otherwise = do
                let ba = b1 .&. b2
                    c = countTrailingZeros b
                    m = bit c
                    i = sparseIndex b' m
                    i1 = sparseIndex b1 m
                    i2 = sparseIndex b2 m
                t <- if | testBit ba c -> do
                            x1 <- A.indexM ary1 i1
                            x2 <- A.indexM ary2 i2
                            return $! f x1 x2
                        | testBit b1 c -> A.indexM ary1 i1
                        | otherwise -> A.indexM ary2 i2
                A.write mary i t
                go (clearBit b c)
    go b'

I expect that keeping i as a loop argument will be more efficient than recomputing it on each iteration though.
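As an illustration of that suggestion, the sketch below iterates only over the bits set in b1 .&. b2, keeps the output index i as a loop argument, and computes i1 and i2 with sparseIndex. It works on plain bitmaps and returns index triples instead of writing into arrays; intersectionIndices is a hypothetical name, and sparseIndex is written out here as the popcount of the bits below the mask.

```haskell
import Data.Bits (complement, countTrailingZeros, popCount, unsafeShiftL, (.&.))

-- Index of the element for mask bit m in an array that stores one element
-- per set bit of b: the number of set bits of b below m.
sparseIndex :: Word -> Word -> Int
sparseIndex b m = popCount (b .&. (m - 1))

-- For every bit set in both bitmaps, yield (output index, index into the
-- first array, index into the second array).
intersectionIndices :: Word -> Word -> [(Int, Int, Int)]
intersectionIndices b1 b2 = go 0 (b1 .&. b2)
  where
    go i b
      | b == 0 = []
      | otherwise =
          let m = 1 `unsafeShiftL` countTrailingZeros b -- lowest set bit of b
          in (i, sparseIndex b1 m, sparseIndex b2 m)
               : go (i + 1) (b .&. complement m)
```

For example, with b1 = 11 (bits 0, 1, 3) and b2 = 14 (bits 1, 2, 3), intersectionIndices 11 14 evaluates to [(0,1,0),(1,2,2)]. Whether this is faster than walking b1 .|. b2 with running indices depends on how much the two maps overlap.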

Using sparseIndex makes benchmarks slower

Can you show me the diff?

> Using sparseIndex makes benchmarks slower

By how much? I also think the benchmark data might be a bit weird.

How do I show you the diff?

Ideally both.

Sparse index:

All
  HashMap
    intersection
      Int:        OK (0.34s)
        52.1 μs ± 3.3 μs
      ByteString: OK (0.26s)
        62.6 μs ± 6.1 μs

Without sparse index:

All
  HashMap
    intersection
      Int:        OK (0.86s)
        42.1 μs ± 1.5 μs
      ByteString: OK (0.33s)
        47.7 μs ± 2.8 μs

Alright, thanks! I think we might get different results with data where there's less overlap between the two maps. But that can be investigated at a different time.

Follow-up issue in #416.

Data/HashMap/Internal.hs

sjakobi · 2022-04-12T12:05:23Z

Data/HashMap/Internal.hs

+
+intersectionCollisions :: Eq k => (k -> v1 -> v2 -> (# v3 #)) -> A.Array (Leaf k v1) -> A.Array (Leaf k v2) -> ST s (Int, A.MArray s (Leaf k v3))
+intersectionCollisions f ary1 ary2 = do
+  mary2 <- A.thaw ary2 0 $ A.length ary2


I wonder whether we actually need to allocate two arrays for this. The alternative would be to perform the search-and-swap operations on the output array itself.

It might be a bit tricky though – maybe leave it for a follow-up PR, so this one doesn't get too huge.

I think the issue with this is that the element type could change. For example, suppose we have two arrays with numbers as keys, and the two arrays have different value types:
1 2 3 4
3 4 2 1
Let's thaw the first array and mutate it to
(f 3 3) 2 1 4
The result of f 3 3 could have a different type than the remaining elements 2 1 4.

Ah, yes, good point. Unsafe coercions might work for this, but I'd prefer not trying this in this PR.

This is only an issue for intersectionWithKey and such; intersection itself has no type issue.
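The type problem is easy to see in a list-based analogue of the collision intersection. This is only a sketch: extract and intersectCollisions are hypothetical names, lists stand in for the arrays of leaves, and removing the match from the second list plays the role of the search-and-swap step.

```haskell
-- Remove the first entry with the given key, returning its value and the
-- remaining entries.
extract :: Eq k => k -> [(k, v)] -> Maybe (v, [(k, v)])
extract _ [] = Nothing
extract k ((k', v) : rest)
  | k == k' = Just (v, rest)
  | otherwise =
      fmap (\(found, rest') -> (found, (k', v) : rest')) (extract k rest)

-- Intersect two collision buckets. The output holds v3 values, which is in
-- general a different type from both inputs, so neither input container can
-- simply be overwritten in place.
intersectCollisions
  :: Eq k => (k -> v1 -> v2 -> v3) -> [(k, v1)] -> [(k, v2)] -> [(k, v3)]
intersectCollisions f = go
  where
    go [] _ = []
    go ((k, v1) : rest) ys = case extract k ys of
      Just (v2, ys') -> (k, f k v1 v2) : go rest ys' -- matched entries are not searched again
      Nothing -> go rest ys
```

When v1, v2 and v3 coincide, as in plain intersection, the output could share storage with an input, which is why the concern applies to intersectionWithKey and similar functions only.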

There's another thing about intersection that's special: we can reuse the leaves.

Yes, it would be better if intersection had custom code for handling collisions. Maybe this can be achieved by changing intersectionWithKey# to something similar to filterMapAux.

I'd slightly prefer if we'd leave this for a follow-up PR though.

Here is the branch https://github.com/oberblastmeister/unordered-containers/tree/fast-intersection-sparseIndex

I have recorded these ideas in #415.

oberblastmeister · 2022-04-12T19:04:11Z

What should I do about inlining? I understand the need to eliminate the closures, but the functions are truly massive, intersection has 1,700 terms, while unionWithKey has 2,200! Wouldn't it be bad to mark these as {-# INLINE #-}? We could add a comment saying to explicitly inline if you want to remove the closure.

oberblastmeister · 2022-04-12T19:11:31Z

Also, if we pass the function around through the recursion, then we wouldn't be able to implement intersection in terms of intersectionWithKey, right?

Data/HashMap/Internal.hs

sjakobi · 2022-04-12T22:13:07Z

> What should I do about inlining? I understand the need to eliminate the closures, but the functions are truly massive, intersection has 1,700 terms, while unionWithKey has 2,200! Wouldn't it be bad to mark these as {-# INLINE #-}?

I think we should stick with INLINABLE until we're convinced that INLINE is better somehow. INLINABLE is better for compile times for example.

> We could add a comment saying to explicitly inline if you want to remove the closure.

I think the docs of these functions are the wrong place to teach people about GHC.Exts.inline. Maybe it should be mentioned in https://github.com/input-output-hk/hs-opt-handbook.github.io?!
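For context, the combination discussed here can be sketched as follows: the general function is marked INLINABLE, and a specific caller opts in to inlining with GHC.Exts.inline so that the function argument does not remain a closure. The list-based pairWith and keepLeft are hypothetical stand-ins for intersectionWith and intersection; how the PR applies Exts.inline in detail is not shown in this thread.

```haskell
import qualified GHC.Exts as Exts

-- Hypothetical stand-in for intersectionWith: combine values whose keys
-- occur in both association lists.
pairWith :: Eq k => (a -> b -> c) -> [(k, a)] -> [(k, b)] -> [(k, c)]
pairWith f xs ys = [(k, f a b) | (k, a) <- xs, Just b <- [lookup k ys]]
{-# INLINABLE pairWith #-}

-- Hypothetical stand-in for intersection: Exts.inline asks GHC to inline
-- pairWith at this one call site, so `const` can be applied directly.
keepLeft :: Eq k => [(k, a)] -> [(k, b)] -> [(k, a)]
keepLeft = Exts.inline pairWith const
{-# INLINABLE keepLeft #-}
```

INLINABLE keeps the unfolding available without forcing it into every call site, so only callers that ask for it pay the code-size cost.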

Data/HashMap/Internal.hs

oberblastmeister · 2022-04-15T17:01:52Z

@sjakobi Is there anything else that I need to do?

sjakobi

Can you fix the merge conflict, @oberblastmeister?

I'll merge afterwards.

sjakobi · 2022-04-15T18:02:54Z

Data/HashMap/Internal.hs

 import Data.Hashable              (Hashable)
 import Data.Hashable.Lifted       (Hashable1, Hashable2)
+import Data.HashMap.Internal.List (isPermutationBy, unorderedCompare)


FWIW, the changed sorting of imports is probably due to haskell/stylish-haskell#385, which was recently released.

The concerns seem to have been addressed.

sjakobi · 2022-04-15T20:10:22Z

Thank you, @oberblastmeister! :)

fast intersection

21f238b

cleanup

16f1f7f

treeowl reviewed Apr 9, 2022

add show back

bcc13fc

treeowl previously requested changes Apr 9, 2022

oberblastmeister added 2 commits April 9, 2022 13:45

inline

d5262bf

debug checks

a16456b

inline function

f72011c

oberblastmeister and others added 3 commits April 9, 2022 16:43

refactor to use snoc

678a38c

Try the unboxed result thing

ec24215

This one inlines the unboxed form into everything else, hopefully.

Remove redundant internal constraint

767ae6e

Merge pull request #3 from treeowl/unboxedness

72510b4

Unboxedness

oberblastmeister added 2 commits April 9, 2022 18:04

shrink compat

fd43ba7

fix import

3612645

treeowl reviewed Apr 9, 2022

use clone

b484042

oberblastmeister and others added 3 commits April 11, 2022 19:25

Update Data/HashSet/Internal.hs

bf9a27f

Co-authored-by: Simon Jakobi

naming

1c20739

Exts.inline

92e4b2a

sjakobi reviewed Apr 12, 2022

add haddocks for searchSwap

5a439cc

sjakobi reviewed Apr 12, 2022

oberblastmeister and others added 2 commits April 12, 2022 14:42

cleanup

1c118c4

Update Data/HashMap/Internal/Array.hs

1256cf3

Co-authored-by: Simon Jakobi

Update Data/HashMap/Internal/Array.hs

b0210c8

Co-authored-by: Simon Jakobi

treeowl reviewed Apr 12, 2022

sjakobi mentioned this pull request Apr 12, 2022

Get rid of lookupCont #410 (Open)

refactor

69f8f28

sjakobi reviewed Apr 13, 2022

oberblastmeister added 3 commits April 13, 2022 17:20

formatting

06cc511

breakup lines

d9a50d7

use Exts.inline

d24cc1f

sjakobi approved these changes Apr 15, 2022

This was referenced Apr 15, 2022

Optimization potential in intersection #415 (Open)

Better benchmarks for intersection (and better intersectionArraysBy) #416 (Open)

Optimization idea for subsetArray #291 (Open)

Make use of shrinkSmallMutableArray# primop #362 (Closed)

Merge branch 'master' into fast-intersection

64f3f2f

sjakobi merged commit b73381e into haskell-unordered-containers:master Apr 15, 2022
