Optimize `permutations` with low effort (no change in algorithm) #184

FedericoStra · 2025-05-05T14:50:17Z

Code changes

The modifications to the code stem from the following observations.

Observations

1. The construct previously used in the `for` loop is inefficient.
2. The loop can exit early as soon as `state[i] <= max` is satisfied.
3. All array indexing can be marked `@inbounds`.

Fixing these leads to an overall 2x to 6x speed-up when iterating through all permutations (see the benchmarks below).

Remarks

Observation (1) can alternatively be addressed by rewriting the `for` loop as

for i in lastindex(state) : -1 : firstindex(state)+1

Observation (2) can alternatively be addressed by replacing the old `if` statement with

if state[i] > max
    state[i] = min
    state[i-1] += 1
else
    break
end
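As a sanity check on observation (2), the sketch below compares a loop that always scans every position with one that breaks at the first position needing no carry. Both helpers (`carry_full_scan!`, `carry_early_exit!`) are hypothetical names written for this illustration, not the package's code, and they assume a non-empty `state` whose entries all lie in `min:max` before the increment.

```julia
# Hypothetical helpers for illustration; neither is the package's code.

# Visit every position from the last down to the second, carrying on overflow.
function carry_full_scan!(state, min, max)
    state[end] += 1
    for i in lastindex(state):-1:firstindex(state)+1
        if state[i] > max
            state[i] = min
            state[i-1] += 1
        end
    end
    return state
end

# Same loop, but stop at the first position that does not overflow.
function carry_early_exit!(state, min, max)
    state[end] += 1
    for i in lastindex(state):-1:firstindex(state)+1
        if state[i] > max
            state[i] = min
            state[i-1] += 1
        else
            break
        end
    end
    return state
end

# Compare the two on every valid 3-digit state with digits in 1:3.
all_agree = all(Iterators.product(1:3, 1:3, 1:3)) do digits
    s = collect(digits)
    carry_full_scan!(copy(s), 1, 3) == carry_early_exit!(copy(s), 1, 3)
end
```

The early exit is safe under the stated assumption because a carry can only reach position `i-1` if position `i` overflowed; once one position stays within bounds, every earlier position is left untouched.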

Since I want to address both (1) and (2), I believe the code becomes simpler if we replace the `for` loop with a `while` loop.
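A minimal sketch of what such a `while` loop could look like, assuming an odometer-style counter whose entries range over `min:max`. The function name `increment_sketch!` is hypothetical and the body is an illustration of the three observations, not the code from this PR.

```julia
# Hypothetical sketch, not the code from the PR: advance `state` like an
# odometer whose digits range over min:max.
function increment_sketch!(state::Vector{Int}, min::Int, max::Int)
    isempty(state) && return state
    i = lastindex(state)
    @inbounds state[i] += 1
    # Propagate the carry leftwards; stop at the first digit that needs none.
    @inbounds while i > firstindex(state) && state[i] > max
        state[i] = min
        state[i-1] += 1
        i -= 1
    end
    return state
end

s = increment_sketch!([1, 2, 3], 1, 3)   # one carry, then the loop stops
t = increment_sketch!([1, 3, 3], 1, 3)   # the carry propagates to the front
```

The loop condition doubles as the early exit, so no `break` is needed, and the index `i` never leaves `firstindex(state):lastindex(state)`, which is what makes `@inbounds` safe in this sketch.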

Simple benchmark

Benchmark code

using BenchmarkTools
using Combinatorics

count_permutations(a) = count(Returns(true), permutations(a))
count_permutations(a, t) = count(Returns(true), permutations(a, t))

# warm up: trigger compilation before benchmarking
count_permutations(1:3)
for t in 0:3
    count_permutations(1:3, t)
end

for n in [3, 5, 7, 8, 9]
    println("\nn = $(n)\n")
    a = collect(1:n)
    display(@benchmark count_permutations($a))
end

println("\nn = 10, t = 6\n")
display(@benchmark count_permutations(1:10, 6))

Before

n = 3

BenchmarkTools.Trial: 10000 samples with 25 evaluations per sample.
 Range (min … max):  937.720 ns … 86.307 μs  ┊ GC (min … max): 0.00% … 97.60%
 Time  (median):     959.080 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.011 μs ±  1.448 μs  ┊ GC (mean ± σ):  2.78% ±  1.95%

    ▃█▅
  ▂▄███▇▅▄▃▃▃▃▃▃▃▄▄▄▄▄▃▃▂▂▂▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
  938 ns          Histogram: frequency by time         1.21 μs <

 Memory estimate: 624 bytes, allocs estimate: 16.

n = 5

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  104.025 μs …  2.552 ms  ┊ GC (min … max): 0.00% … 93.92%
 Time  (median):     108.570 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   110.988 μs ± 32.567 μs  ┊ GC (mean ± σ):  0.39% ±  1.33%

    ▆█
  ▁▁██▂▂▅▃▃▆▄▅▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  104 μs          Histogram: frequency by time          152 μs <

 Memory estimate: 11.41 KiB, allocs estimate: 244.

n = 7

BenchmarkTools.Trial: 174 samples with 1 evaluation per sample.
 Range (min … max):  27.823 ms …  35.485 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.619 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.794 ms ± 785.805 μs  ┊ GC (mean ± σ):  0.06% ± 0.48%

       ▃  ▅ ▂▂▅ ▄▃█▄▃▃▃ ▂
  ▅▁▅▆▆█▃▇█████▇███████▅███▇▃█▁▅▃▅▁▅▁▃▁▅▁▃▅▃▃▅▅▁▅▁▁▃▁▅▁▁▃▃▁▁▃▃ ▃
  27.8 ms         Histogram: frequency by time         30.7 ms <

 Memory estimate: 551.42 KiB, allocs estimate: 10084.

n = 8

BenchmarkTools.Trial: 9 samples with 1 evaluation per sample.
 Range (min … max):  591.219 ms … 604.105 ms  ┊ GC (min … max): 0.00% … 0.23%
 Time  (median):     597.961 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   597.176 ms ±   4.669 ms  ┊ GC (mean ± σ):  0.03% ± 0.08%

  ▁▁     ▁                 ▁     █     ▁                  ▁   ▁
  ██▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁
  591 ms           Histogram: frequency by time          604 ms <

 Memory estimate: 4.92 MiB, allocs estimate: 80644.

n = 9

BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 13.998 s (0.01% GC) to evaluate,
 with a memory estimate of 44.30 MiB, over 725764 allocations.

n = 10, t = 6

BenchmarkTools.Trial: 124 samples with 1 evaluation per sample.
 Range (min … max):  38.167 ms … 48.003 ms  ┊ GC (min … max): 0.00% … 3.20%
 Time  (median):     40.459 ms              ┊ GC (median):    3.04%
 Time  (mean ± σ):   40.607 ms ±  1.281 ms  ┊ GC (mean ± σ):  2.57% ± 1.02%

             ▃██▇▇▇██
  ▅▁▃▃▁▃▆▅▆██████████▆▁▃▇▅▅▃▃▁▁▃▁▃▁▁▁▁▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  38.2 ms         Histogram: frequency by time        47.2 ms <

 Memory estimate: 16.15 MiB, allocs estimate: 302403.

After

n = 3

BenchmarkTools.Trial: 10000 samples with 301 evaluations per sample.
 Range (min … max):  273.877 ns …   6.984 μs  ┊ GC (min … max): 0.00% … 93.93%
 Time  (median):     287.508 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   320.842 ns ± 344.896 ns  ┊ GC (mean ± σ):  8.34% ±  7.30%

   ▂██▁
  ▃████▆▅▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂ ▃
  274 ns           Histogram: frequency by time          537 ns <

 Memory estimate: 624 bytes, allocs estimate: 16.

n = 5

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  23.609 μs …  2.903 ms  ┊ GC (min … max): 0.00% … 98.30%
 Time  (median):     24.390 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.311 μs ± 36.122 μs  ┊ GC (mean ± σ):  1.97% ±  1.39%

  ▄▇█▇▆▅▂      ▁▁                                             ▂
  ███████▆▆▅▄▅████▇▇▇▅▅▄▄▁▃▃▃▁▁▁▃▃▃▄▃▁▁▃▁▁▁▁▁▁▃▁▁▁▃▁▃▃▁▁▁▁▃▄▅ █
  23.6 μs      Histogram: log(frequency) by time      40.3 μs <

 Memory estimate: 11.41 KiB, allocs estimate: 244.

n = 7

BenchmarkTools.Trial: 868 samples with 1 evaluation per sample.
 Range (min … max):  5.506 ms …   8.911 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.669 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.759 ms ± 358.240 μs  ┊ GC (mean ± σ):  0.62% ± 2.69%

  ▇█▅▃▃▇▆▂▂▃▄▆ ▁
  ███████████████▇█▆▇▆▆▆▆▆▆▇▅▆▅▄▅▅▆▅▇▄▆▅▅▆▆▆▅▄▅▄▁▄▄▁▁▄▄▄▁▄▁▁▅ █
  5.51 ms      Histogram: log(frequency) by time       7.2 ms <

 Memory estimate: 551.42 KiB, allocs estimate: 10084.

n = 8

BenchmarkTools.Trial: 48 samples with 1 evaluation per sample.
 Range (min … max):  104.423 ms … 113.496 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     105.693 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   106.403 ms ±   2.000 ms  ┊ GC (mean ± σ):  0.26% ± 0.49%

     ▁▆█▃       ▃
  ▄▇▄████▇▁▇▇▁▇▇█▇▁▁▁▄▁▁▇▄▄▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▄▁▁▄▁▁▁▁▁▁▁▄▁▁▁▁▁▁▄ ▁
  104 ms           Histogram: frequency by time          113 ms <

 Memory estimate: 4.92 MiB, allocs estimate: 80644.

n = 9

BenchmarkTools.Trial: 3 samples with 1 evaluation per sample.
 Range (min … max):  2.417 s …  2.434 s  ┊ GC (min … max): 0.15% … 0.07%
 Time  (median):     2.426 s             ┊ GC (median):    0.07%
 Time  (mean ± σ):   2.426 s ± 8.379 ms  ┊ GC (mean ± σ):  0.09% ± 0.05%

  █                             █                        █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.42 s        Histogram: frequency by time        2.43 s <

 Memory estimate: 44.30 MiB, allocs estimate: 725764.

n = 10, t = 6

BenchmarkTools.Trial: 359 samples with 1 evaluation per sample.
 Range (min … max):  12.538 ms …  19.272 ms  ┊ GC (min … max): 0.00% … 10.51%
 Time  (median):     13.739 ms               ┊ GC (median):    6.39%
 Time  (mean ± σ):   13.928 ms ± 751.616 μs  ┊ GC (mean ± σ):  7.78% ±  3.03%

           ▁▆██▇▅  ▁  ▁▃▄▁  ▁
  ▂▂▃▄▂▁▂▁▆██████▇▇█▅▆████▆▆█▆▃▂▃▃▂▂▁▂▂▂▂▁▁▃▁▁▁▂▂▁▁▁▁▂▁▁▂▁▁▁▁▂ ▃
  12.5 ms         Histogram: frequency by time         17.1 ms <

 Memory estimate: 16.15 MiB, allocs estimate: 302403.
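To read the speed-up off the two reports, the median times can be divided pairwise. The snippet below is a small sketch using only the figures quoted above; the variable names are made up for the illustration. With these numbers the ratios fall between roughly 2.9x (n = 10, t = 6) and 5.8x (n = 9), while the memory estimates and allocation counts are identical before and after.

```julia
# Median times in seconds, copied from the Before/After reports above.
# The n = 9 "Before" entry is the single sample reported, not a median.
before = [959.080e-9, 108.570e-6, 28.619e-3, 597.961e-3, 13.998, 40.459e-3]
after  = [287.508e-9, 24.390e-6, 5.669e-3, 105.693e-3, 2.426, 13.739e-3]
labels = ["n = 3", "n = 5", "n = 7", "n = 8", "n = 9", "n = 10, t = 6"]

# Element-wise ratio: how many times faster the "After" run is.
speedups = before ./ after
for (label, s) in zip(labels, speedups)
    println(rpad(label, 14), " ", round(s; digits=2), "x")
end
```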

Discussion

This optimization is low-hanging fruit, and I believe it can be merged without much thought as a short-term nicety. For a more substantial improvement, see #186, which rewrites `permutations` in terms of `multiset_permutations`. A broader discussion of the efficiency of `permutations` is in #185.


codecov · 2025-05-05T14:52:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.21%. Comparing base (ab33a23) to head (19b6040).

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #184   +/-   ##
=======================================
  Coverage   97.21%   97.21%           
=======================================
  Files           8        8           
  Lines         826      827    +1     
=======================================
+ Hits          803      804    +1     
  Misses         23       23


FedericoStra added 2 commits May 5, 2025 16:27

perf: optimize permutations (increment! loop)

370d452

- The construct previously used in the `for` loop is inefficient.
- The loop can exit early as soon as `state[i] <= max` is satisfied.
- All array indexing can be marked `@inbounds`.

style: trailing white spaces

19b6040

FedericoStra mentioned this pull request May 6, 2025

permutations is much slower than it needs to be #185

Open
