
Commit 390aed8

Add precompile statements (#203)
* Add precompile statements: new precompilation approach
* Make precompilation configurable
* Bump v5.1.5
* Add documentation
* Use `Zero()` instead of `false`
* change to minor version
* Rework Strided wrapping
* small fixes
* Remove `@inline` for `makeblascontractable`
* Sprinkle `@constprop :none`
* Make bumper extension LTS compatible
* Remove docstring
* Include Bumper precompilation
* Clean up and simplify bumper precompilation
* Drop support for x86 tests
* Revert "Sprinkle `@constprop :none`" (reverts commit 72f9dd9)
* disable Bumper precompilation on pre-1.11 versions
* revert strided wrapping
* update StridedViews compat
* Give up on bumper precompilation for now
* Revert tensorfree
* Undo bumper changes
1 parent 54cbd9b commit 390aed8

10 files changed: +176 −11 lines

.github/workflows/ci.yml (−4)

@@ -30,10 +30,6 @@ jobs:
           - windows-latest
         arch:
           - x64
-          - x86
-        exclude:
-          - os: macOS-latest
-            arch: x86
     steps:
       - uses: actions/checkout@v4
       - uses: julia-actions/setup-julia@v2

.gitignore (+2 −1)

@@ -1,4 +1,5 @@
 *.jl.cov
 *.jl.*.cov
 *.jl.mem
-Manifest.toml
+Manifest.toml
+LocalPreferences.toml

Project.toml (+6 −2)

@@ -1,14 +1,16 @@
 name = "TensorOperations"
 uuid = "6aa20fa7-93e2-5fca-9bc0-fbd0db3c71a2"
 authors = ["Lukas Devos <[email protected]>", "Maarten Van Damme <[email protected]>", "Jutho Haegeman <[email protected]>"]
-version = "5.1.4"
+version = "5.2.0"

 [deps]
 CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
 ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
 LRUCache = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 PackageExtensionCompat = "65ce6f38-6b18-4e1d-a461-8949797d7930"
+PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
+Preferences = "21216c6a-2e73-6563-6e65-726566657250"
 PtrArrays = "43287f4e-b6f4-7ad1-bb20-aadabca52c3d"
 Strided = "5e0ebb24-38b0-5f93-81fe-25c709ecae67"
 StridedViews = "4db3bf67-4bd7-4b4e-b153-31dc3fb37143"
@@ -38,10 +40,12 @@ LRUCache = "1"
 LinearAlgebra = "1.6"
 Logging = "1.6"
 PackageExtensionCompat = "1"
+PrecompileTools = "1.1"
+Preferences = "1.4"
 PtrArrays = "1.2"
 Random = "1"
 Strided = "2.2"
-StridedViews = "0.3"
+StridedViews = "0.3, 0.4"
 Test = "1"
 TupleTools = "1.6"
 VectorInterface = "0.4.1,0.5"

docs/make.jl (+2 −1)

@@ -11,7 +11,8 @@ makedocs(; modules=[TensorOperations],
                    "man/interface.md",
                    "man/backends.md",
                    "man/autodiff.md",
-                   "man/implementation.md"],
+                   "man/implementation.md",
+                   "man/precompilation.md"],
          "Index" => "index/index.md"])

 # Documenter can also automatically deploy documentation to gh-pages.

docs/src/index.md (+1 −1)

@@ -5,7 +5,7 @@
 ## Table of contents

 ```@contents
-Pages = ["index.md", "man/indexnotation.md", "man/functions.md", "man/interface.md", "man/backends.md", "man/autodiff.md", "man/implementation.md"]
+Pages = ["index.md", "man/indexnotation.md", "man/functions.md", "man/interface.md", "man/backends.md", "man/autodiff.md", "man/implementation.md", "man/precompilation.md"]
 Depth = 4
 ```

docs/src/man/precompilation.md (+50, new file)

@@ -0,0 +1,50 @@
+# Precompilation
+
+TensorOperations.jl has some support for precompiling commonly called functions.
+The guiding philosophy is that tensor contractions are often (part of) the bottlenecks of typical workflows,
+and as such we want to maximize performance. As a result, we choose to specialize many functions, which
+may lead to a rather large time-to-first-execution (TTFX). In order to mitigate this, some of that work can
+be moved to precompile time, avoiding the need to re-compile these specializations for every fresh Julia session.
+
+Nevertheless, TensorOperations is designed to work with a large variety of input types, and simply enumerating
+all of these tends to lead to prohibitively large precompilation times, as well as large system images.
+Therefore, some customization is possible to tweak the desired level of precompilation, trading
+faster precompile times against fast TTFX for a wider range of inputs.
+
+!!! compat "TensorOperations v5.2.0"
+
+    Precompilation support requires at least TensorOperations v5.2.0.
+
+## Defaults
+
+By default, precompilation is enabled for "tensors" of type `Array{T,N}`, where `T` and `N` range over the following values:
+
+* `T` is either `Float64` or `ComplexF64`
+* `tensoradd!` is precompiled up to `N = 5`
+* `tensortrace!` is precompiled up to `4` free output indices and `2` pairs of traced indices
+* `tensorcontract!` is precompiled up to `3` free output indices on both inputs, and `2` contracted indices
+
+## Custom settings
+
+The default precompilation settings can be tweaked to allow for more or less expansive coverage. This is achieved
+through a combination of `PrecompileTools`- and `Preferences`-based functionality.
+
+To disable precompilation altogether, for example during development or when you prefer to have small binaries,
+you can *locally* change the `"precompile_workload"` key in the preferences.
+
+```julia
+using TensorOperations, Preferences
+set_preferences!(TensorOperations, "precompile_workload" => false; force=true)
+```
+
+Alternatively, you can keep precompilation enabled and change the settings above through the same machinery, via:
+
+* `"precompile_eltypes"`: a `Vector{String}` whose entries evaluate to the desired values of `T<:Number`
+* `"precompile_add_ndims"`: an `Int` to specify the maximum `N` for `tensoradd!`
+* `"precompile_trace_ndims"`: a `Vector{Int}` of length 2 to specify the maximal number of free and traced indices for `tensortrace!`
+* `"precompile_contract_ndims"`: a `Vector{Int}` of length 2 to specify the maximal number of free and contracted indices for `tensorcontract!`
+
+!!! note "Backends"
+
+    Currently, there is no support for precompiling methods that do not use the default backend. If this is a
+    feature you would find useful, feel free to contact us or open an issue.
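
As a usage sketch for the manual page above: the same `Preferences` machinery that disables the workload can also widen it. The key names are the ones documented above; the specific element types and dimension bound below are hypothetical choices, not package defaults.

```julia
using TensorOperations, Preferences

# Hypothetical custom workload: additionally cover Float32 tensors, and
# precompile tensoradd! for arrays of up to 6 dimensions (default is 5).
set_preferences!(TensorOperations,
                 "precompile_eltypes" => ["Float64", "ComplexF64", "Float32"];
                 force=true)
set_preferences!(TensorOperations, "precompile_add_ndims" => 6; force=true)

# Preferences are read at precompile time, so the wider workload only takes
# effect after restarting Julia and letting the package precompile again.
```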

src/TensorOperations.jl (+2)

@@ -77,4 +77,6 @@ function __init__()
     @require_extensions
 end

+include("precompile.jl")
+
 end # module

src/implementation/blascontract.jl (+1 −1)

@@ -81,7 +81,7 @@ function _unsafe_blas_contract!(C::StridedView{T},
     return C
 end

-@inline function makeblascontractable(A, pA, TC, backend, allocator)
+function makeblascontractable(A, pA, TC, backend, allocator)
     flagA = isblascontractable(A, pA) && eltype(A) == TC
     if !flagA
         A_ = tensoralloc_add(TC, A, pA, false, Val(true), allocator)

src/implementation/functions.jl (+1 −1)

@@ -79,7 +79,7 @@ See also [`tensorcopy`](@ref) and [`tensoradd!`](@ref)
 """
 function tensorcopy!(C, A, pA::Index2Tuple, conjA::Bool=false, α::Number=One(),
                      backend=DefaultBackend(), allocator=DefaultAllocator())
-    return tensoradd!(C, A, pA, conjA, α, false, backend, allocator)
+    return tensoradd!(C, A, pA, conjA, α, Zero(), backend, allocator)
 end

 # ------------------------------------------------------------------------------------------
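
The hunk above swaps the literal `false` for VectorInterface's `Zero()` singleton as the β argument, plausibly so that ordinary `tensorcopy!` calls hit the `Zero`-typed signatures registered in the new `src/precompile.jl` below. A minimal usage sketch of the code path shown in this hunk:

```julia
using TensorOperations

A = rand(3, 4)
C = Array{Float64}(undef, 4, 3)  # uninitialized output is fine: β = Zero() overwrites C
# Permuting copy; forwards to tensoradd!(C, A, pA, conjA, α, Zero(), backend, allocator)
tensorcopy!(C, A, ((2, 1), ()), false)
C == permutedims(A)  # true
```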

src/precompile.jl (+111, new file)

@@ -0,0 +1,111 @@
+using PrecompileTools: PrecompileTools
+using Preferences: @load_preference
+
+# Validate preferences input
+# --------------------------
+function validate_precompile_eltypes(eltypes)
+    eltypes isa Vector{String} ||
+        throw(ArgumentError("`precompile_eltypes` should be a vector of strings, got $(typeof(eltypes)) instead"))
+    return map(eltypes) do Tstr
+        T = eval(Meta.parse(Tstr))
+        (T isa DataType && T <: Number) ||
+            error("Invalid precompile_eltypes entry: `$Tstr`")
+        return T
+    end
+end
+
+function validate_add_ndims(add_ndims)
+    add_ndims isa Int ||
+        throw(ArgumentError("`precompile_add_ndims` should be an `Int`, got `$add_ndims`"))
+    add_ndims ≥ 0 || error("Invalid precompile_add_ndims: `$add_ndims`")
+    return add_ndims
+end
+
+function validate_trace_ndims(trace_ndims)
+    trace_ndims isa Vector{Int} && length(trace_ndims) == 2 ||
+        throw(ArgumentError("`precompile_trace_ndims` should be a `Vector{Int}` of length 2, got `$trace_ndims`"))
+    all(≥(0), trace_ndims) || error("Invalid precompile_trace_ndims: `$trace_ndims`")
+    return trace_ndims
+end
+
+function validate_contract_ndims(contract_ndims)
+    contract_ndims isa Vector{Int} && length(contract_ndims) == 2 ||
+        throw(ArgumentError("`precompile_contract_ndims` should be a `Vector{Int}` of length 2, got `$contract_ndims`"))
+    all(≥(0), contract_ndims) ||
+        error("Invalid precompile_contract_ndims: `$contract_ndims`")
+    return contract_ndims
+end
+
+# Static preferences
+# ------------------
+const PRECOMPILE_ELTYPES = validate_precompile_eltypes(@load_preference("precompile_eltypes",
+                                                                        ["Float64",
+                                                                         "ComplexF64"]))
+const PRECOMPILE_ADD_NDIMS = validate_add_ndims(@load_preference("precompile_add_ndims", 5))
+const PRECOMPILE_TRACE_NDIMS = validate_trace_ndims(@load_preference("precompile_trace_ndims",
+                                                                     [4, 2]))
+const PRECOMPILE_CONTRACT_NDIMS = validate_contract_ndims(@load_preference("precompile_contract_ndims",
+                                                                           [4, 2]))
+
+# Using explicit precompile statements here instead of @compile_workload:
+# actually running the precompilation through PrecompileTools leads to longer compile times.
+# Keeping the workload_enabled functionality to have the option of disabling precompilation
+# in a manner compatible with the rest of the ecosystem.
+if PrecompileTools.workload_enabled(@__MODULE__)
+    # tensoradd!
+    # ----------
+    for T in PRECOMPILE_ELTYPES
+        for N in 0:PRECOMPILE_ADD_NDIMS
+            C = Array{T,N}
+            A = Array{T,N}
+            pA = Index2Tuple{N,0}
+
+            precompile(tensoradd!, (C, A, pA, Bool, One, Zero))
+            precompile(tensoradd!, (C, A, pA, Bool, T, Zero))
+            precompile(tensoradd!, (C, A, pA, Bool, T, T))
+
+            precompile(tensoralloc_add, (T, A, pA, Bool, Val{true}))
+            precompile(tensoralloc_add, (T, A, pA, Bool, Val{false}))
+        end
+    end
+
+    # tensortrace!
+    # ------------
+    for T in PRECOMPILE_ELTYPES
+        for N1 in 0:PRECOMPILE_TRACE_NDIMS[1], N2 in 0:PRECOMPILE_TRACE_NDIMS[2]
+            C = Array{T,N1}
+            A = Array{T,N1 + 2N2}
+            p = Index2Tuple{N1,0}
+            q = Index2Tuple{N2,N2}
+
+            precompile(tensortrace!, (C, A, p, q, Bool, One, Zero))
+            precompile(tensortrace!, (C, A, p, q, Bool, T, Zero))
+            precompile(tensortrace!, (C, A, p, q, Bool, T, T))
+
+            # allocation re-uses tensoralloc_add
+        end
+    end
+
+    # tensorcontract!
+    # ---------------
+    for T in PRECOMPILE_ELTYPES
+        for N1 in 0:PRECOMPILE_CONTRACT_NDIMS[1], N2 in 0:PRECOMPILE_CONTRACT_NDIMS[2],
+            N3 in 0:PRECOMPILE_CONTRACT_NDIMS[1]
+
+            NA = N1 + N2
+            NB = N2 + N3
+            NC = N1 + N3
+            C, A, B = Array{T,NC}, Array{T,NA}, Array{T,NB}
+            pA = Index2Tuple{N1,N2}
+            pB = Index2Tuple{N2,N3}
+            pAB = Index2Tuple{NC,0}
+
+            precompile(tensorcontract!, (C, A, pA, Bool, B, pB, Bool, pAB, One, Zero))
+            precompile(tensorcontract!, (C, A, pA, Bool, B, pB, Bool, pAB, T, Zero))
+            precompile(tensorcontract!, (C, A, pA, Bool, B, pB, Bool, pAB, T, T))
+
+            precompile(tensoralloc_contract, (T, A, pA, Bool, B, pB, Bool, pAB, Val{true}))
+            precompile(tensoralloc_contract, (T, A, pA, Bool, B, pB, Bool, pAB, Val{false}))
+        end
+    end
+end
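
For reference, a minimal sketch of one concrete statement the `tensoradd!` loop above expands to, here for `T = Float64` and `N = 2`. Qualified names are used since this runs outside the module; it assumes `Index2Tuple` is accessible as `TensorOperations.Index2Tuple` and that `One`/`Zero` are the VectorInterface singletons used in the file.

```julia
using TensorOperations
using VectorInterface: One, Zero

# One iteration of the tensoradd! loop with T = Float64, N = 2:
# Index2Tuple{2,0} is Tuple{NTuple{2,Int},NTuple{0,Int}}.
precompile(TensorOperations.tensoradd!,
           (Array{Float64,2},                  # C
            Array{Float64,2},                  # A
            TensorOperations.Index2Tuple{2,0}, # pA
            Bool,                              # conjA
            One,                               # α singleton type
            Zero))                             # β singleton type
```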
