Skip to content

Optimization idea for subsetArray #291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sjakobi opened this issue Jul 30, 2020 · 6 comments
Open

Optimization idea for subsetArray #291

sjakobi opened this issue Jul 30, 2020 · 6 comments

Comments

@sjakobi
Copy link
Member

sjakobi commented Jul 30, 2020

-- | /O(n*m)/ Check if the first array is a subset of the second array.
subsetArray :: Eq k => (v1 -> v2 -> Bool) -> A.Array (Leaf k v1) -> A.Array (Leaf k v2) -> Bool
subsetArray cmpV ary1 ary2 = A.length ary1 <= A.length ary2 && A.all inAry2 ary1
where
inAry2 (L k1 v1) = lookupInArrayCont (\_ -> False) (\v2 _ -> cmpV v1 v2) k1 ary2
{-# INLINE inAry2 #-}

Once we've identified an element of ary2 to correspond to an element of ary1, it would be nice if we wouldn't check the same element again against the subsequent elements of ary1.

@treeowl had a neat idea for that in #282 (comment):

I wouldn't record indices. But we could thaw one of the arrays and play mutation games. Say we have

1 2 3 4

and we search for 3. Then we can mutate the array to

undefined 2 1 4

That is, swap the 3 to the front and replace it with undefined. Then start the next scan in the second slot.

@svenkeidel
Copy link
Contributor

We would not even need to write undefined into the first slot. We only need to overwrite 3 with 1 and leave the first slot unchanged:

1 2 1 4

@svenkeidel
Copy link
Contributor

Before we optimize this function, we should really setup a benchmark first to see if we are making progress.

@sjakobi
Copy link
Member Author

sjakobi commented Jul 30, 2020

We would not even need to write undefined into the first slot.

I think the undefined would mostly help us debug the skipping of the first elements. Once it's correct we could probably remove it.

Or would it also help with GC!? 🤔

@svenkeidel
Copy link
Contributor

svenkeidel commented Jul 30, 2020

I don't think the elements of the array can be garbage collected, since the hashmaps in which they are included are still reachable. It would be weird if a "subset" function would garbage collect something.

@svenkeidel
Copy link
Contributor

svenkeidel commented Jul 30, 2020

Yes, but overwriting with undefined may be useful for debugging.

@sjakobi
Copy link
Member Author

sjakobi commented Apr 15, 2022

#406 has implemented a version of this idea for intersection[With[Key]]:

intersectionCollisions :: Eq k => (k -> v1 -> v2 -> (# v3 #)) -> Hash -> Hash -> A.Array (Leaf k v1) -> A.Array (Leaf k v2) -> HashMap k v3
intersectionCollisions f h1 h2 ary1 ary2
| h1 == h2 = runST $ do
mary2 <- A.thaw ary2 0 $ A.length ary2
mary <- A.new_ $ min (A.length ary1) (A.length ary2)
let go i j
| i >= A.length ary1 || j >= A.lengthM mary2 = pure j
| otherwise = do
L k1 v1 <- A.indexM ary1 i
searchSwap k1 j mary2 >>= \case
Just (L _k2 v2) -> do
let !(# v3 #) = f k1 v1 v2
A.write mary j $ L k1 v3
go (i + 1) (j + 1)
Nothing -> do
go (i + 1) j
len <- go 0 0
case len of
0 -> pure Empty
1 -> Leaf h1 <$> A.read mary 0
_ -> Collision h1 <$> (A.unsafeFreeze =<< A.shrink mary len)
| otherwise = Empty
{-# INLINE intersectionCollisions #-}
-- | Say we have
-- @
-- 1 2 3 4
-- @
-- and we search for @3@. Then we can mutate the array to
-- @
-- undefined 2 1 4
-- @
-- We don't actually need to write undefined, we just have to make sure that the next search starts 1 after the current one.
searchSwap :: Eq k => k -> Int -> A.MArray s (Leaf k v) -> ST s (Maybe (Leaf k v))
searchSwap toFind start = go start toFind start
where
go i0 k i mary
| i >= A.lengthM mary = pure Nothing
| otherwise = do
l@(L k' _v) <- A.read mary i
if k == k'
then do
A.write mary i =<< A.read mary i0
pure $ Just l
else go i0 k (i + 1) mary
{-# INLINE searchSwap #-}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants