Some benchmarks are very noisy #332

Open
@sjakobi

Here's a sequence of benchmark runs on the same code (bd165b0), using tasty-bench's --fail-if-faster and --fail-if-slower flags to highlight differing results:

$ cabal bench --benchmark-options "--stdev=1 --timeout=10 --csv=bench-0.csv"
<snip>
$ cabal bench --benchmark-options "--stdev=1 --timeout=10 --csv=bench-1.csv --baseline=bench-0.csv --fail-if-slower=5 --fail-if-faster=5 --hide-successes"
All
  Map
    insert
      String:           FAIL (0.95s)
        1.63 ms ±  25 μs,  5% faster than baseline
        Use -p '/All.Map.insert.String/' to rerun this test only.
      ByteStringString: FAIL (0.86s)
        1.45 ms ±  23 μs,  5% faster than baseline
        Use -p '/All.Map.insert.ByteStringString/' to rerun this test only.
    fromList
      ByteString:       FAIL (0.86s)
        1.46 ms ±  28 μs,  5% faster than baseline
        Use -p '/All.Map.fromList.ByteString/' to rerun this test only.
  hashmap/Map
    delete-miss
      ByteString:       FAIL (0.78s)
        650  μs ±  11 μs, 10% faster than baseline
        Use -p '/hashmap\/Map.delete-miss.ByteString/' to rerun this test only.
  IntMap
    lookup-miss:        FAIL (1.54s)
      338  μs ± 2.7 μs,  5% slower than baseline
      Use -p '/IntMap.lookup-miss/' to rerun this test only.
    delete-miss:        FAIL (1.34s)
      582  μs ± 6.1 μs,  9% faster than baseline
      Use -p '/IntMap.delete-miss/' to rerun this test only.
  HashMap
    lookup-miss
      ByteString:       FAIL (1.09s)
        112  μs ± 1.6 μs,  5% slower than baseline
        Use -p '/HashMap.lookup-miss.ByteString/' to rerun this test only.
      Int:              FAIL (1.01s)
        102  μs ± 1.6 μs,  9% slower than baseline
        Use -p '/lookup-miss.Int/' to rerun this test only.
    insert
      ByteString:       FAIL (1.21s)
        523  μs ± 6.3 μs, 19% faster than baseline
        Use -p '/HashMap.insert.ByteString/' to rerun this test only.
      Int:              FAIL (1.11s)
        469  μs ± 5.6 μs, 13% faster than baseline
        Use -p '/insert.Int/' to rerun this test only.
    insert-dup
      Int:              FAIL (0.96s)
        398  μs ± 5.7 μs, 14% faster than baseline
        Use -p '/insert-dup.Int/' to rerun this test only.
    delete
      String:           FAIL (0.90s)
        754  μs ±  11 μs, 12% faster than baseline
        Use -p '/HashMap.delete.String/' to rerun this test only.
    delete-miss
      String:           FAIL (0.97s)
        205  μs ± 3.0 μs,  5% faster than baseline
        Use -p '/HashMap.delete-miss.String/' to rerun this test only.
      ByteString:       FAIL (0.77s)
        149  μs ± 2.6 μs,  7% slower than baseline
        Use -p '/HashMap.delete-miss.ByteString/' to rerun this test only.
      Int:              FAIL (1.33s)
        289  μs ± 2.8 μs,  5% slower than baseline
        Use -p '/delete-miss.Int/' to rerun this test only.
    alterInsert
      ByteString:       FAIL (1.31s)
        580  μs ± 7.2 μs, 18% faster than baseline
        Use -p '/alterInsert.ByteString/' to rerun this test only.
      Int:              FAIL (1.19s)
        505  μs ± 5.9 μs, 21% faster than baseline
        Use -p '/alterInsert.Int/' to rerun this test only.
    alterFInsert
      String:           FAIL (4.86s)
        570  μs ± 1.5 μs, 15% faster than baseline
        Use -p '/alterFInsert.String/' to rerun this test only.
      ByteString:       FAIL (1.21s)
        518  μs ± 5.8 μs, 20% faster than baseline
        Use -p '/alterFInsert.ByteString/' to rerun this test only.
      Int:              FAIL (1.10s)
        465  μs ± 7.9 μs, 22% faster than baseline
        Use -p '/alterFInsert.Int/' to rerun this test only.
    alterFInsert-dup
      Int:              FAIL (0.94s)
        387  μs ± 5.8 μs, 15% faster than baseline
        Use -p '/alterFInsert-dup.Int/' to rerun this test only.
    alterFDelete-miss
      String:           FAIL (0.96s)
        203  μs ± 2.9 μs,  5% faster than baseline
        Use -p '/alterFDelete-miss.String/' to rerun this test only.
      ByteString:       FAIL (0.76s)
        148  μs ± 2.8 μs,  6% slower than baseline
        Use -p '/alterFDelete-miss.ByteString/' to rerun this test only.
    fromListWith
      long
        String:         FAIL (0.92s)
          387  μs ± 7.0 μs,  6% faster than baseline
          Use -p '/fromListWith.long.String/' to rerun this test only.

24 out of 118 tests failed (184.35s)

$ cabal bench --benchmark-options "--stdev=1 --timeout=10 --csv=bench-2.csv --baseline=bench-1.csv --fail-if-slower=5 --fail-if-faster=5 --hide-successes"
All
  hashmap/Map
    delete
      ByteString:       FAIL (0.82s)
        677  μs ±  11 μs,  8% faster than baseline
        Use -p '/hashmap\/Map.delete.ByteString/' to rerun this test only.
  IntMap
    delete:             FAIL (1.16s)
      495  μs ± 5.5 μs,  9% faster than baseline
      Use -p '$0=="All.IntMap.delete"' to rerun this test only.
  HashMap
    delete
      ByteString:       FAIL (1.38s)
        610  μs ± 5.5 μs, 15% faster than baseline
        Use -p '/HashMap.delete.ByteString/' to rerun this test only.
      Int:              FAIL (1.06s)
        444  μs ± 5.7 μs, 12% faster than baseline
        Use -p '/delete.Int/' to rerun this test only.
    delete-miss
      Int:              FAIL (2.30s)
        260  μs ± 1.4 μs, 10% faster than baseline
        Use -p '/delete-miss.Int/' to rerun this test only.
    alterInsert-dup
      Int:              FAIL (1.02s)
        429  μs ± 5.3 μs, 13% faster than baseline
        Use -p '/alterInsert-dup.Int/' to rerun this test only.
    alterDelete
      String:           FAIL (0.91s)
        760  μs ±  11 μs, 11% faster than baseline
        Use -p '/alterDelete.String/' to rerun this test only.
      ByteString:       FAIL (0.79s)
        637  μs ±  12 μs, 13% faster than baseline
        Use -p '/alterDelete.ByteString/' to rerun this test only.
      Int:              FAIL (1.08s)
        453  μs ± 6.0 μs, 11% faster than baseline
        Use -p '/alterDelete.Int/' to rerun this test only.
    alterFDelete
      String:           FAIL (0.88s)
        745  μs ±  11 μs, 12% faster than baseline
        Use -p '/alterFDelete.String/' to rerun this test only.
      ByteString:       FAIL (1.40s)
        609  μs ± 5.4 μs, 15% faster than baseline
        Use -p '/alterFDelete.ByteString/' to rerun this test only.
      Int:              FAIL (1.04s)
        446  μs ± 6.1 μs, 10% faster than baseline
        Use -p '/alterFDelete.Int/' to rerun this test only.
    alterDelete-miss
      Int:              FAIL (1.25s)
        269  μs ± 2.8 μs,  8% faster than baseline
        Use -p '/alterDelete-miss.Int/' to rerun this test only.
    alterFDelete-miss
      Int:              FAIL (1.23s)
        260  μs ± 3.0 μs,  7% faster than baseline
        Use -p '/alterFDelete-miss.Int/' to rerun this test only.

14 out of 118 tests failed (137.69s)
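
The chained runs above, where each run uses the previous run's CSV as its baseline, could be scripted roughly as follows. This is only a sketch: the function names `bench_options` and `run_chain` are made up for illustration, and the flag values are simply the ones from the commands above.

```shell
#!/bin/sh
# Build the --benchmark-options string for run number $1.
# Run 0 only records a CSV; every later run is compared against the
# CSV written by the run before it.
bench_options() {
  n=$1
  opts="--stdev=1 --timeout=10 --csv=bench-$n.csv"
  if [ "$n" -gt 0 ]; then
    prev=$((n - 1))
    opts="$opts --baseline=bench-$prev.csv --fail-if-slower=5 --fail-if-faster=5 --hide-successes"
  fi
  printf '%s\n' "$opts"
}

# Run $1 benchmark rounds back to back (hypothetical helper).
run_chain() {
  i=0
  while [ "$i" -lt "$1" ]; do
    # cabal bench exits non-zero when a benchmark FAILs; keep going anyway.
    cabal bench --benchmark-options "$(bench_options "$i")" || true
    i=$((i + 1))
  done
}
```

Calling `run_chain 3` would reproduce the three runs shown here (bench-0.csv through bench-2.csv).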

Benchmarks for containers and hashmap were included by uncommenting this line:

-- cpp-options: -DBENCH_containers_Map -DBENCH_containers_IntMap -DBENCH_hashmap_Map

I did try to make my machine pretty quiet for these runs. I don't know why these benchmarks are still so very noisy, but I note that most of these are on the slower end of our benchmark suite.
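
To see the run-to-run variation across the whole suite, and not just for the benchmarks that cross the 5% threshold, the CSV files could be compared directly. The sketch below assumes the CSV layout is `Name,Mean (ps),...` with a header line and that benchmark names contain no embedded commas; `compare_csv` is a made-up helper, not part of tasty-bench.

```shell
#!/bin/sh
# Print "<name> <percent change>" for every benchmark present in both files.
# $1 is the baseline CSV, $2 is the newer CSV.
# Assumption: column 1 is the name, column 2 is the mean time.
compare_csv() {
  awk -F, '
    FNR == 1 { next }                      # skip the header line of each file
    NR == FNR { base[$1] = $2; next }      # first file: remember baseline means
    ($1 in base) && (base[$1] + 0 > 0) {   # second file: report relative change
      printf "%s %+.1f%%\n", $1, ($2 - base[$1]) / base[$1] * 100
    }
  ' "$1" "$2"
}
```

For example, `compare_csv bench-0.csv bench-1.csv | sort -k2 -n` would list the benchmarks from the largest speedup to the largest slowdown.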

It also seems noteworthy that hardly any of the containers and hashmap benchmarks show up among the failures, apparently fewer than their smaller share of the suite would explain.

Maybe implementing #293 would help?!
