Optimize permutations with low effort (no change in algorithm) #184

Open: wants to merge 2 commits into `master`
@FedericoStra (Contributor) commented May 5, 2025

Code changes

The modifications to the code stem from the following observations.

Observations

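  1. The construct previously used in the `for` loop to traverse the indices of `state` in reverse is inefficient.
  2. The loop can exit early as soon as `state[i] <= max` is satisfied, because a position that does not overflow produces no carry into the positions before it.
  3. All array indexing can be marked `@inbounds`.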

Fixing these leads to an overall speed-up of roughly 3x to 6x (median times) when iterating through all permutations (see the benchmarks below).

Remarks

Alternatively, observation (1) can be addressed by rewriting the `for` loop header as

```julia
for i in lastindex(state):-1:firstindex(state)+1
    # ... loop body unchanged ...
end
```
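A quick check of what this alternative header iterates over, using a hypothetical four-element `state` (only its indices matter): it visits every position except the first, from right to left, and the range is a lazy `StepRange` rather than an allocated vector of indices.

```julia
# Hypothetical example state; only its indices are used.
state = [1, 2, 3, 4]

# Positions from the last one down to firstindex+1, i.e. every position
# that can propagate a carry into position i-1.
r = lastindex(state):-1:firstindex(state)+1

# `r` is a lazy range object, not a materialized vector.
@assert r isa StepRange{Int,Int}
println(collect(r))  # prints [4, 3, 2]
```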

Alternatively, observation (2) can be addressed by replacing the old `if` statement with

```julia
if state[i] > max
    state[i] = min
    state[i-1] += 1
else
    break
end
```
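To illustrate why the `break` pays off, the sketch below counts loop iterations over a full odometer cycle of length-`n` states with entries in `1:k`, with and without early exit. It assumes the odometer semantics of the snippet above (reset to `min`, carry into `state[i-1]`); the helpers `increment_count!` and `total_iterations` are hypothetical and exist only for this measurement.

```julia
# Hypothetical helper: performs one odometer-style increment of `state`
# (entries in min:max) and returns how many loop iterations it executed.
function increment_count!(state::Vector{Int}, min::Int, max::Int; early_exit::Bool)
    state[end] += 1
    iterations = 0
    for i in lastindex(state):-1:firstindex(state)+1
        iterations += 1
        if state[i] > max
            state[i] = min
            state[i-1] += 1
        elseif early_exit
            break  # no overflow at position i, so no carry can reach i-1
        end
    end
    return iterations
end

# Hypothetical helper: sums the iterations over all k^n - 1 increments that
# take a length-n state from [1, ..., 1] to [k, ..., k].
function total_iterations(n::Int, k::Int; early_exit::Bool)
    state = fill(1, n)
    total = 0
    for _ in 1:(k^n - 1)
        total += increment_count!(state, 1, k; early_exit)
    end
    return total
end

println(total_iterations(3, 3; early_exit = false))  # prints 52
println(total_iterations(3, 3; early_exit = true))   # prints 34
```

Both variants produce the same sequence of states. Without early exit every increment costs `n - 1` iterations; with it, only the positions actually reached by the carry are visited, so the gap widens as `n` grows.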

Since I want to address both (1) and (2), I find the code simpler if the `for` loop is replaced with a `while` loop.
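A minimal sketch of what such a `while`-based increment could look like, assuming the odometer semantics of the snippets above (entries in `min:max`, carry into `state[i-1]`). This is not the PR's actual diff, and the name `increment_sketch!` is hypothetical. It combines the three observations: no reversed index construct, an early exit once a position stays `<= max`, and `@inbounds` indexing, which is safe here because `i` never drops below `firstindex(state)`.

```julia
# Hypothetical sketch, not the PR's diff: odometer-style increment of `state`,
# whose entries are assumed to lie in min:max. Like the `for` loop it replaces,
# it never resets the first entry, so state[1] may end up > max.
function increment_sketch!(state::Vector{Int}, min::Int, max::Int)
    isempty(state) && return state
    i = lastindex(state)
    @inbounds state[i] += 1
    # Propagate the carry leftwards and stop at the first position that does
    # not overflow; `i > firstindex(state)` keeps `i - 1` a valid index.
    @inbounds while i > firstindex(state) && state[i] > max
        state[i] = min
        state[i-1] += 1
        i -= 1
    end
    return state
end

println(increment_sketch!([1, 1, 2], 1, 3))  # prints [1, 1, 3]
println(increment_sketch!([1, 3, 3], 1, 3))  # prints [2, 1, 1]
```

The loop condition performs the bounds guard and the overflow test together, which is what makes a single `while` simpler than a `for` loop plus `break`.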

Simple benchmark

Benchmark code

```julia
using BenchmarkTools
using Combinatorics

# Iterate through every permutation without collecting them:
# `count` visits each element, and `Returns(true)` (Julia 1.7+) counts it.
count_permutations(a) = count(Returns(true), permutations(a))
count_permutations(a, t) = count(Returns(true), permutations(a, t))

# Warm up: compile both methods before benchmarking
count_permutations(1:3)
for t in 0:3
    count_permutations(1:3, t)
end

for n in [3, 5, 7, 8, 9]
    println("\nn = $(n)\n")
    a = collect(1:n)
    display(@benchmark count_permutations($a))
end

println("\nn = 10, t = 6\n")
display(@benchmark count_permutations(1:10, 6))
```
Before
n = 3

BenchmarkTools.Trial: 10000 samples with 25 evaluations per sample.
 Range (min … max):  937.720 ns … 86.307 μs  ┊ GC (min … max): 0.00% … 97.60%
 Time  (median):     959.080 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.011 μs ±  1.448 μs  ┊ GC (mean ± σ):  2.78% ±  1.95%

    ▃█▅
  ▂▄███▇▅▄▃▃▃▃▃▃▃▄▄▄▄▄▃▃▂▂▂▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂ ▃
  938 ns          Histogram: frequency by time         1.21 μs <

 Memory estimate: 624 bytes, allocs estimate: 16.

n = 5

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  104.025 μs …  2.552 ms  ┊ GC (min … max): 0.00% … 93.92%
 Time  (median):     108.570 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   110.988 μs ± 32.567 μs  ┊ GC (mean ± σ):  0.39% ±  1.33%

    ▆█
  ▁▁██▂▂▅▃▃▆▄▅▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  104 μs          Histogram: frequency by time          152 μs <

 Memory estimate: 11.41 KiB, allocs estimate: 244.

n = 7

BenchmarkTools.Trial: 174 samples with 1 evaluation per sample.
 Range (min … max):  27.823 ms …  35.485 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.619 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.794 ms ± 785.805 μs  ┊ GC (mean ± σ):  0.06% ± 0.48%

       ▃  ▅ ▂▂▅ ▄▃█▄▃▃▃ ▂
  ▅▁▅▆▆█▃▇█████▇███████▅███▇▃█▁▅▃▅▁▅▁▃▁▅▁▃▅▃▃▅▅▁▅▁▁▃▁▅▁▁▃▃▁▁▃▃ ▃
  27.8 ms         Histogram: frequency by time         30.7 ms <

 Memory estimate: 551.42 KiB, allocs estimate: 10084.

n = 8

BenchmarkTools.Trial: 9 samples with 1 evaluation per sample.
 Range (min … max):  591.219 ms … 604.105 ms  ┊ GC (min … max): 0.00% … 0.23%
 Time  (median):     597.961 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   597.176 ms ±   4.669 ms  ┊ GC (mean ± σ):  0.03% ± 0.08%

  ▁▁     ▁                 ▁     █     ▁                  ▁   ▁
  ██▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁
  591 ms           Histogram: frequency by time          604 ms <

 Memory estimate: 4.92 MiB, allocs estimate: 80644.

n = 9

BenchmarkTools.Trial: 1 sample with 1 evaluation per sample.
 Single result which took 13.998 s (0.01% GC) to evaluate,
 with a memory estimate of 44.30 MiB, over 725764 allocations.

n = 10, t = 6

BenchmarkTools.Trial: 124 samples with 1 evaluation per sample.
 Range (min … max):  38.167 ms … 48.003 ms  ┊ GC (min … max): 0.00% … 3.20%
 Time  (median):     40.459 ms              ┊ GC (median):    3.04%
 Time  (mean ± σ):   40.607 ms ±  1.281 ms  ┊ GC (mean ± σ):  2.57% ± 1.02%

             ▃██▇▇▇██
  ▅▁▃▃▁▃▆▅▆██████████▆▁▃▇▅▅▃▃▁▁▃▁▃▁▁▁▁▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  38.2 ms         Histogram: frequency by time        47.2 ms <

 Memory estimate: 16.15 MiB, allocs estimate: 302403.
After
n = 3

BenchmarkTools.Trial: 10000 samples with 301 evaluations per sample.
 Range (min … max):  273.877 ns …   6.984 μs  ┊ GC (min … max): 0.00% … 93.93%
 Time  (median):     287.508 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   320.842 ns ± 344.896 ns  ┊ GC (mean ± σ):  8.34% ±  7.30%

   ▂██▁
  ▃████▆▅▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂ ▃
  274 ns           Histogram: frequency by time          537 ns <

 Memory estimate: 624 bytes, allocs estimate: 16.

n = 5

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  23.609 μs …  2.903 ms  ┊ GC (min … max): 0.00% … 98.30%
 Time  (median):     24.390 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.311 μs ± 36.122 μs  ┊ GC (mean ± σ):  1.97% ±  1.39%

  ▄▇█▇▆▅▂      ▁▁                                             ▂
  ███████▆▆▅▄▅████▇▇▇▅▅▄▄▁▃▃▃▁▁▁▃▃▃▄▃▁▁▃▁▁▁▁▁▁▃▁▁▁▃▁▃▃▁▁▁▁▃▄▅ █
  23.6 μs      Histogram: log(frequency) by time      40.3 μs <

 Memory estimate: 11.41 KiB, allocs estimate: 244.

n = 7

BenchmarkTools.Trial: 868 samples with 1 evaluation per sample.
 Range (min … max):  5.506 ms …   8.911 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.669 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.759 ms ± 358.240 μs  ┊ GC (mean ± σ):  0.62% ± 2.69%

  ▇█▅▃▃▇▆▂▂▃▄▆ ▁
  ███████████████▇█▆▇▆▆▆▆▆▆▇▅▆▅▄▅▅▆▅▇▄▆▅▅▆▆▆▅▄▅▄▁▄▄▁▁▄▄▄▁▄▁▁▅ █
  5.51 ms      Histogram: log(frequency) by time       7.2 ms <

 Memory estimate: 551.42 KiB, allocs estimate: 10084.

n = 8

BenchmarkTools.Trial: 48 samples with 1 evaluation per sample.
 Range (min … max):  104.423 ms … 113.496 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     105.693 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   106.403 ms ±   2.000 ms  ┊ GC (mean ± σ):  0.26% ± 0.49%

     ▁▆█▃       ▃
  ▄▇▄████▇▁▇▇▁▇▇█▇▁▁▁▄▁▁▇▄▄▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▄▁▁▄▁▁▁▁▁▁▁▄▁▁▁▁▁▁▄ ▁
  104 ms           Histogram: frequency by time          113 ms <

 Memory estimate: 4.92 MiB, allocs estimate: 80644.

n = 9

BenchmarkTools.Trial: 3 samples with 1 evaluation per sample.
 Range (min … max):  2.417 s …  2.434 s  ┊ GC (min … max): 0.15% … 0.07%
 Time  (median):     2.426 s             ┊ GC (median):    0.07%
 Time  (mean ± σ):   2.426 s ± 8.379 ms  ┊ GC (mean ± σ):  0.09% ± 0.05%

  █                             █                        █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.42 s        Histogram: frequency by time        2.43 s <

 Memory estimate: 44.30 MiB, allocs estimate: 725764.

n = 10, t = 6

BenchmarkTools.Trial: 359 samples with 1 evaluation per sample.
 Range (min … max):  12.538 ms …  19.272 ms  ┊ GC (min … max): 0.00% … 10.51%
 Time  (median):     13.739 ms               ┊ GC (median):    6.39%
 Time  (mean ± σ):   13.928 ms ± 751.616 μs  ┊ GC (mean ± σ):  7.78% ±  3.03%

           ▁▆██▇▅  ▁  ▁▃▄▁  ▁
  ▂▂▃▄▂▁▂▁▆██████▇▇█▅▆████▆▆█▆▃▂▃▃▂▂▁▂▂▂▂▁▁▃▁▁▁▂▂▁▁▁▁▂▁▁▂▁▁▁▁▂ ▃
  12.5 ms         Histogram: frequency by time         17.1 ms <

 Memory estimate: 16.15 MiB, allocs estimate: 302403.

Discussion

This optimization is very low-hanging fruit, and I believe it can be merged without much deliberation as a short-term improvement. For a more substantial improvement, see #186, which rewrites `permutations` in terms of `multiset_permutations`. A broader discussion of the efficiency of `permutations` is in #185.


codecov bot commented May 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.21%. Comparing base (ab33a23) to head (19b6040).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #184   +/-   ##
=======================================
  Coverage   97.21%   97.21%           
=======================================
  Files           8        8           
  Lines         826      827    +1     
=======================================
+ Hits          803      804    +1     
  Misses         23       23           
```
