Skip to content

ThreadsX.sort! #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jariji opened this issue Feb 17, 2024 · 3 comments
Open

ThreadsX.sort! #12

jariji opened this issue Feb 17, 2024 · 3 comments

Comments

@jariji
Copy link

jariji commented Feb 17, 2024

Idk if parallel libraries are allowed in this repo but this one seems to do a good job.

using ThreadsX, BenchmarkTools

julia> @btime sort!(xs) setup=(xs=rand(10^6));
  32.477 ms (3 allocations: 7.64 MiB)

julia> @btime ThreadsX.sort!(xs) setup=(xs=rand(10^6));
  7.693 ms (14183 allocations: 18.55 MiB)


julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 3900XT 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 17 on 24 virtual cores
Environment:
  JULIA_NUM_THREADS = 12
@LilithHafner
Copy link
Owner

Parallel libraries are okay. I'd make a note in the legend that it's parallel. However, ThreadsX.sort! seems to have a 2.5-3x overhead vs Base.sort! and my benchmarking machine only has 4 physical cores, so it would not be a particularly good portrail of the algorithm, and I try to only include non-default algorithms if they are substantially better or more popular than the defaults.

I'll leave this open in case I come across a benchmarking machine with more cores or there are enough other threaded algorithms proposed to make a separate chart.

Screenshot from 2024-02-18 12-44-13

Julia Version 1.10.1
Commit 7790d6f0641 (2024-02-13 20:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: 8 × unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, generic)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code

@jariji
Copy link
Author

jariji commented Feb 18, 2024

Sounds good.

I'm curious how overhead is measured?

@LilithHafner
Copy link
Owner

LilithHafner commented Feb 18, 2024

I calculated overhead = (runtime of ThreadsX.sort!) * threads / (runtime of base.sort!)

# Measured in ms using -t1, -t2, ..., -t8
# @btime ThreadsX.sort!(xs) setup=(xs=rand(10^6));
threadsx = [24.551
            12.859
            8.818
            6.778
            6.369
            6.107
            5.795
            5.776]

# measured once using -t1
# @btime sort!(xs) setup=(xs=rand(10^6));
default = fill(9.025, 8)
threads = 1:8
speedup = default ./ threadsx
overhead_factor = threadsx .* threads ./ default

using GLMakie

f = Figure()
ax = Axis(f[1, 1], ylabel="runtime in ms", xticks=1:8, xlabel="threads", yticks=0:5:25)
plot!(ax, threads, threadsx, label="threadsx")
lines!(ax, threads, default, label="default")
plot!(ax, [1], [0], marker=:plus)
axislegend(ax)


ax2 = Axis(f[1, 2], ylabel="ratio vs. default", xticks=1:8, xlabel="threads", yticks=0:5)
plot!(ax2, threads, speedup, label="runtime speedup")
plot!(ax2, threads, overhead_factor, label="overhead factor")
plot!(ax2, [1], [0], marker=:plus)
axislegend(ax2, position = :ct)

f

This assumes that there is 100% utilization of every thread the whole time, which is probably close but false.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants