Splitting the benchmark `Env` #293

sjakobi · 2020-07-30T13:32:18Z

In #282 (comment), @svenkeidel mentioned some concerns about the Env we currently use in the benchmarks:

unordered-containers/benchmarks/Benchmarks.hs

Lines 34 to 122 in 352591a

    
    -- TODO: This a stopgap measure to keep the benchmark work with
    -- Criterion 1.0.
    data Env = Env {
        n :: !Int,
        elems   :: ![(String, Int)],
        keys    :: ![String],
        elemsBS :: ![(BS.ByteString, Int)],
        keysBS  :: ![BS.ByteString],
        elemsI  :: ![(Int, Int)],
        keysI   :: ![Int],
        elemsI2 :: ![(Int, Int)],  -- for union
        keys'    :: ![String],
        keysBS'  :: ![BS.ByteString],
        keysI'   :: ![Int],
        keysDup    :: ![String],
        keysDupBS  :: ![BS.ByteString],
        keysDupI   :: ![Int],
        elemsDup   :: ![(String, Int)],
        elemsDupBS :: ![(BS.ByteString, Int)],
        elemsDupI  :: ![(Int, Int)],
        hm          :: !(HM.HashMap String Int),
        hmSubset    :: !(HM.HashMap String Int),
        hmbs        :: !(HM.HashMap BS.ByteString Int),
        hmbsSubset  :: !(HM.HashMap BS.ByteString Int),
        hmi         :: !(HM.HashMap Int Int),
        hmiSubset   :: !(HM.HashMap Int Int),
        hmi2        :: !(HM.HashMap Int Int),
        m           :: !(M.Map String Int),
        mSubset     :: !(M.Map String Int),
        mbs         :: !(M.Map BS.ByteString Int),
        mbsSubset   :: !(M.Map BS.ByteString Int),
        im          :: !(IM.IntMap Int),
        imSubset    :: !(IM.IntMap Int),
        ihm         :: !(IHM.Map String Int),
        ihmSubset   :: !(IHM.Map String Int),
        ihmbs       :: !(IHM.Map BS.ByteString Int),
        ihmbsSubset :: !(IHM.Map BS.ByteString Int)
        } deriving (Generic, NFData)

    setupEnv :: IO Env
    setupEnv = do
        let n = 2^(12 :: Int)
            elems   = zip keys [1..n]
            keys    = US.rnd 8 n
            elemsBS = zip keysBS [1..n]
            keysBS  = UBS.rnd 8 n
            elemsI  = zip keysI [1..n]
            keysI   = UI.rnd (n+n) n
            elemsI2 = zip [n `div` 2..n + (n `div` 2)] [1..n]  -- for union
            keys'    = US.rnd' 8 n
            keysBS'  = UBS.rnd' 8 n
            keysI'   = UI.rnd' (n+n) n
            keysDup    = US.rnd 2 n
            keysDupBS  = UBS.rnd 2 n
            keysDupI   = UI.rnd (n`div`4) n
            elemsDup   = zip keysDup [1..n]
            elemsDupBS = zip keysDupBS [1..n]
            elemsDupI  = zip keysDupI [1..n]
            hm          = HM.fromList elems
            hmSubset    = HM.fromList (takeSubset n elems)
            hmbs        = HM.fromList elemsBS
            hmbsSubset  = HM.fromList (takeSubset n elemsBS)
            hmi         = HM.fromList elemsI
            hmiSubset   = HM.fromList (takeSubset n elemsI)
            hmi2        = HM.fromList elemsI2
            m           = M.fromList elems
            mSubset     = M.fromList (takeSubset n elems)
            mbs         = M.fromList elemsBS
            mbsSubset   = M.fromList (takeSubset n elemsBS)
            im          = IM.fromList elemsI
            imSubset    = IM.fromList (takeSubset n elemsI)
            ihm         = IHM.fromList elems
            ihmSubset   = IHM.fromList (takeSubset n elems)
            ihmbs       = IHM.fromList elemsBS
            ihmbsSubset = IHM.fromList (takeSubset n elemsBS)
        return Env{..}
      where
        takeSubset n elements =
          -- use 50% of the elements for a subset check.
          let subsetSize = round (fromIntegral n * 0.5 :: Double) :: Int
          in take subsetSize elements

I think it is a bad idea to share a single benchmark environment across all benchmarks: the whole environment is reallocated for every benchmark group, even though each group only uses a few of its fields. We should split it into smaller environments, one per benchmark group, to reduce memory pressure.
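To make the proposal concrete, here is a minimal sketch of what per-group environments could look like with criterion's `env`. The names `LookupEnv`, `UnionEnv`, `setupLookupEnv`, `setupUnionEnv` and `lookupAll` are hypothetical and not taken from the existing benchmark suite, and the keys are plain ranges rather than the random keys the real suite generates.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}
module Main (main) where

import           Control.DeepSeq     (NFData)
import           Criterion.Main      (bench, bgroup, defaultMain, env, whnf)
import qualified Data.HashMap.Strict as HM
import           Data.List           (foldl')
import           GHC.Generics        (Generic)

-- Hypothetical environment holding only what the lookup group needs.
data LookupEnv = LookupEnv
    { lookupKeys :: ![Int]
    , lookupMap  :: !(HM.HashMap Int Int)
    } deriving (Generic, NFData)

-- Hypothetical environment holding only what the union group needs.
data UnionEnv = UnionEnv
    { unionLeft  :: !(HM.HashMap Int Int)
    , unionRight :: !(HM.HashMap Int Int)
    } deriving (Generic, NFData)

-- Same size as in the existing setupEnv.
n :: Int
n = 2 ^ (12 :: Int)

-- Assumption: sequential keys instead of the suite's random keys.
setupLookupEnv :: IO LookupEnv
setupLookupEnv = do
    let ks = [1 .. n]
    return LookupEnv { lookupKeys = ks, lookupMap = HM.fromList (zip ks [1 .. n]) }

setupUnionEnv :: IO UnionEnv
setupUnionEnv = return UnionEnv
    { unionLeft  = HM.fromList (zip [1 .. n] [1 .. n])
    , unionRight = HM.fromList (zip [n `div` 2 .. n + (n `div` 2)] [1 .. n])
    }

-- Sum the values of all keys that are present in the map.
lookupAll :: [Int] -> HM.HashMap Int Int -> Int
lookupAll ks hm = foldl' (\z k -> maybe z (+ z) (HM.lookup k hm)) 0 ks

main :: IO ()
main = defaultMain
    -- Each group gets its own env, so the union maps are not kept alive
    -- while the lookup benchmarks run, and vice versa. The environment is
    -- accessed through record selectors rather than a strict constructor
    -- pattern in the lambda.
    [ env setupLookupEnv $ \e -> bgroup "lookup"
        [ bench "Int" $ whnf (lookupAll (lookupKeys e)) (lookupMap e) ]
    , env setupUnionEnv $ \e -> bgroup "union"
        [ bench "Int" $ whnf (HM.union (unionLeft e)) (unionRight e) ]
    ]
```

The real split would probably follow the existing benchmark groups and key types (String, ByteString, Int) rather than the two groups shown here.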

sjakobi · 2020-07-30T13:39:40Z

The criterion docs also contain some related advice:

Discussion. The environment created in the example above is intentionally not ideal. As Haskell's scoping rules suggest, the variable big is in scope for the benchmarks that use only small. It would be better to create a separate environment for big, so that it will not be kept alive while the unrelated benchmarks are being run.

IIUC the main issue with large Envs is that they require more frequent garbage collections, thereby adding noise to the timings.
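One way to check this guess would be to count garbage collections around a workload with `GHC.Stats`. The following is only a sketch: `countGcs` is a hypothetical helper, the workload is a placeholder, and the program has to be started with RTS statistics enabled (for example built with `-rtsopts` and run with `+RTS -T`).

```haskell
module Main (main) where

import Control.Exception (evaluate)
import GHC.Stats         (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical helper: run an action and return (total GCs, major GCs)
-- that happened while it ran. Returns Nothing if RTS statistics are
-- not enabled (i.e. the program was not started with +RTS -T).
countGcs :: IO a -> IO (Maybe (Int, Int))
countGcs act = do
    enabled <- getRTSStatsEnabled
    if not enabled
        then Nothing <$ act
        else do
            before <- getRTSStats
            _      <- act
            after  <- getRTSStats
            return $ Just
                ( fromIntegral (gcs after - gcs before)
                , fromIntegral (major_gcs after - major_gcs before)
                )

main :: IO ()
main = do
    -- Placeholder workload that allocates; a real check would run a
    -- benchmark body once with the large Env alive and once without.
    r <- countGcs (evaluate (length (show (product [1 .. 3000 :: Integer]))))
    case r of
        Nothing             -> putStrLn "RTS stats disabled; rerun with +RTS -T"
        Just (total, major) ->
            putStrLn ("GCs: " ++ show total ++ ", major GCs: " ++ show major)
```

Comparing the counts with and without a large environment kept alive would show whether GC activity really differs between the two setups.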

@svenkeidel Are your initial refactorings of the benchmark suite still available on some branch, BTW?

svenkeidel · 2020-07-30T14:30:38Z

https://github.com/svenkeidel/unordered-containers/tree/refactor-benchmarks

sjakobi added the benchmarks label Jul 30, 2020

sjakobi mentioned this issue Nov 29, 2021

Some benchmarks are very noisy #332

Open
