- Linq Expansion
- Loop Vectorization
- Pre-size Lists / List-Array Builder Opts
- Scalar Replacement
- Common Sub-expression Elimination
- Method Inlining
DistIL also implements a variety of minor canonicalization and simplification transformations. This list only includes the most significant and interesting ones.
Note: some of the decompiled code samples have been re-formatted for legibility and breviety. Unsafe calls are generated by ILSpy from lifted IL instructions.
Rewrites Linq queries into imperative code in bottom-up order. This transform works via pattern matching and can only recognize a predefined set of known calls, which are listed below.
// Original code
[Optimize]
public float Aggregate()
=> _sourceItems
.Select(x => x.Weight)
.Where(x => x > 0.0f && x < 1.0f)
.Aggregate(0.0f, (r, x) => r + (x < 0.5f ? -x : x));
// Decompiled output
[Optimize]
public float Aggregate() {
List<Item> sourceItems = _sourceItems;
ref Item reference = ref MemoryMarshal.GetArrayDataReference(sourceItems._items);
ref Item reference2 = ref Unsafe.Add(ref reference, (uint)sourceItems._size);
float num = 0f;
for (; Unsafe.IsAddressLessThan(ref reference, ref reference2); reference = ref Unsafe.Add(ref reference, 1)) {
float num2 = reference.Weight;
if (num2 > 0f && num2 < 1f) {
if (num2 < 0.5f) { num2 = 0f - num2; }
num += num2;
}
}
return num;
}
Method | Toolchain | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|
FilterObjects | .NET 8.0 | 11.543 us | 0.0517 us | 0.0553 us | 1.00 |
FilterObjects | .NET 8.0 + DistIL | 8.029 us | 0.0846 us | 0.0905 us | 0.70 |
FirstPredicate | .NET 8.0 | 27.718 us | 0.1309 us | 0.1400 us | 1.00 |
FirstPredicate | .NET 8.0 + DistIL | 13.285 us | 0.1015 us | 0.1042 us | 0.48 |
Aggregate | .NET 8.0 | 26.224 us | 0.3023 us | 0.3235 us | 1.00 |
Aggregate | .NET 8.0 + DistIL | 5.815 us | 0.1557 us | 0.1666 us | 0.22 |
CountLetters | .NET 8.0 | 11.560 us | 0.1345 us | 0.1439 us | 1.00 |
CountLetters | .NET 8.0 + DistIL | 3.611 us | 0.0121 us | 0.0139 us | 0.31 |
RayTracer | .NET 8.0 | 16,953.399 us | 250.0090 us | 277.8843 us | 1.00 |
RayTracer | .NET 8.0 + DistIL | 10,651.095 us | 83.0209 us | 85.2564 us | 0.63 |
Dynamic PGO in .NET 8 closes the gap considerably compared to .NET 7, where these same benchmarks can get up to 2-10x ratio:
Method | Toolchain | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|---|
FilterObjects | .NET 7.0 | 25.789 μs | 0.9277 μs | 1.0684 μs | 1.00 | 0.00 |
FilterObjects | .NET 7.0 + DistIL | 11.986 μs | 0.7977 μs | 0.9187 μs | 0.47 | 0.04 |
FirstPredicate | .NET 7.0 | 43.959 μs | 1.1397 μs | 1.3124 μs | 1.00 | 0.00 |
FirstPredicate | .NET 7.0 + DistIL | 12.397 μs | 0.3286 μs | 0.3784 μs | 0.28 | 0.01 |
Aggregate | .NET 7.0 | 74.475 μs | 3.6544 μs | 4.2084 μs | 1.00 | 0.00 |
Aggregate | .NET 7.0 + DistIL | 5.971 μs | 0.1545 μs | 0.1779 μs | 0.08 | 0.00 |
CountLetters | .NET 7.0 | 40.919 μs | 0.9485 μs | 1.0924 μs | 1.00 | 0.00 |
CountLetters | .NET 7.0 + DistIL | 3.684 μs | 0.0377 μs | 0.0434 μs | 0.09 | 0.00 |
RayTracer | .NET 7.0 | 18,766.715 μs | 493.8189 μs | 568.6825 μs | 1.00 | 0.00 |
RayTracer | .NET 7.0 + DistIL | 12,499.714 μs | 373.6395 μs | 430.2838 μs | 0.67 | 0.03 |
Supported sources
- Special cased:
T[]
,List<T>
,string
,Enumerable.Range()
- Fallback to any
IEnumerable<T>
Supported stages
Where
,Select
,OfType
,Cast
SelectMany
Skip
,Take
Supported sinks
ToList
,ToArray
,ToHashSet
,ToDictionary
Aggregate
,Count(predicate)
First
,FirstOrDefault
Any(predicate)
,All(predicate)
- Loop enumeration
Caveats
- Null argument checks are removed
- Result may throw
NullReferenceException
instead ofArgumentNullException
.
- Result may throw
List<T>
version checks are bypassed- Result will never throw
InvalidOperationException
for concurrent modifications. In the worst case where lists are mutated by different threads, this could lead to buffer over-reads and access violations.
- Result will never throw
Basic loop vectorizer which works on simple for-loops having no complex branches or instructions.
This transform is currently unstable and will only be applied to methods annotated with [Optimize(TryVectorize = true)]
.
// Original
[Optimize(TryVectorize = true)]
public static int CountLetters(string text)
=> text.Count(ch => ch is >= 'A' and <= 'Z');
// Decompiled output
[Optimize(TryVectorize = true)]
public static int CountLetters(string text) {
ref readonly char reference = ref text.GetPinnableReference();
ref char reference2 = ref Unsafe.Add(ref reference, text.Length);
int num = 0;
for (; (nint)Unsafe.ByteOffset(target: ref reference2, origin: ref reference) >= 32; reference = ref Unsafe.Add(ref reference, 16)) {
Vector256<ushort> left = Unsafe.ReadUnaligned<Vector256<ushort>>(ref Unsafe.As<char, byte>(ref reference));
Vector256<ushort> vector = Vector256.GreaterThanOrEqual(left, Vector256.Create((ushort)65));
num += BitOperations.PopCount(
(vector & Vector256.LessThanOrEqual(left, Vector256.Create((ushort)90)))
.AsByte().ExtractMostSignificantBits()
) >>> 1;
}
for (; Unsafe.IsAddressLessThan(ref reference, ref reference2); reference = ref Unsafe.Add(ref reference, 1)) {
char c = reference;
num += ((c >= 'A' && c <= 'Z') ? 1 : 0);
}
return num;
}
Basic loops bounded by array/span length are partially supported via the strength-reduction pass. Range and assertion propagation could help enable support for cases involving multiple buffers safely (TODO).
// Original
public static void GenerateInts(Span<int> dest, int x) {
for (int i = 0; i < dest.Length; i++) {
dest[i] = Math.Abs(i - 50 * x);
}
}
// Decompiled output
public static void GenerateInts(Span<int> dest, int x) {
Span<int> span = dest;
ref int reference = ref span._reference;
int length = span.Length;
int num = length - 7;
int i;
for (i = 0; i < num; i += 8) {
ref int reference2 = ref Unsafe.Add(ref reference, i);
Vector256<int> vector = Vector256.Create(x) * Vector256.Create(50);
Unsafe.WriteUnaligned(
ref Unsafe.As<int, byte>(ref reference2),
Vector256.Abs(Vector256.Create(i) + Vector256.Create(0, 1, 2, 3, 4, 5, 6, 7) - vector));
}
for (; i < length; i++) {
Unsafe.Add(ref reference, i) = Math.Abs(i - x * 50);
}
}
Method | Toolchain | Mean | Error | StdDev | Ratio | Code Size |
---|---|---|---|---|---|---|
CountLetters | .NET 8.0 | 10,702.2 ns | 148.71 ns | 171.25 ns | 1.00 | 862 B |
CountLetters | .NET 8.0 + DistIL | 355.6 ns | 1.16 ns | 1.24 ns | 0.03 | 146 B |
GenerateInts | .NET 8.0 | 1,844.7 ns | 1.90 ns | 2.03 ns | 1.00 | 87 B |
GenerateInts | .NET 8.0 + DistIL | 381.0 ns | 0.81 ns | 0.94 ns | 0.21 | 150 B |
GenerateFloats | .NET 8.0 | 3,891.0 ns | 17.74 ns | 19.72 ns | 1.00 | 415 B |
GenerateFloats | .NET 8.0 + DistIL | 858.0 ns | 2.62 ns | 2.91 ns | 0.22 | 525 B |
NormalizeFloats | .NET 8.0 | 9,392.8 ns | 5.23 ns | 6.02 ns | 1.00 | 490 B |
NormalizeFloats | .NET 8.0 + DistIL | 913.7 ns | 0.73 ns | 0.78 ns | 0.10 | 665 B |
Supported ops
- Memory accesses: pointer load/stores (where the address is either an invariant pointer offset by the loop counter
ptr[i]
, or the loop counter itself*ptr ... ptr++
). - Arithmetic:
+
,-
,*
,/
(float),&
,|
,^
,~
- Math calls:
Abs
,Min
,Max
,Sqrt
,Floor
,Ceil
- Comparisons: any relop if used by
&
,|
,^
,cond ? x : y
,r += cond
- Conversions:
float
<->int
- Types: any numeric primitive compatible with
Vector<T>
(byte, int, float, etc.) - If-conversion: transform patterns such as
cond ? x : y
, shortif..else
s,x && y
into branchless code - Reductions:
+=
,*=
,&=
,|=
,^=
,Min
,Max
,+= cond ? 1 : 0
Caveats
- No legality checks: generated code may be wrong in some circumstances (carried dependencies and aliasing)
- No support for mixed types. This is particularly problematic for small int types (byte/short), since they're implicitly widened on arithmetic ops.
- Different behavior for floats and some Math calls:
- Non-associative float reductions
- NaNs not propagated in
Min
,Max
- Int
Abs
won't throw on overflow
This pass does two things at once:
- Pre-sizes lists with Add() calls inside loops with known trip count (canonical for..i and enumerator loops based on
ICollection
s). - Elides ToArray() copies/calls for non-escaping lists with known final count
Limitations and Caveats
- All Add() calls inside the loop must be unconditional (i.e. no
if (cond) list.Add(...);
) - Loops over structs are not yet supported (Spans, ImmutableArrays, etc.)
- Generated code assumes that the source collection is reporting a correct
Count
compared to the number of elements returned by its enumerator.
// Original code
public int[] DynamicList_Enumer(IReadOnlyList<int> items) {
var list = new List<int>();
foreach (int x in _items) {
list.Add(x * 3 / 5 + 1);
}
return list.ToArray();
}
// Decompiled output
public int[] DynamicList_Enumer(IReadOnlyList<int> items) {
IReadOnlyList<int> items = _items;
using IEnumerator<int> enumerator = items.GetEnumerator();
int[] array = new int[items.Count];
int num = 0;
while (enumerator.MoveNext()) {
array[num] = enumerator.Current * 3 / 5 + 1;
num++;
}
return array;
}
Method | Toolchain | Count | Mean | Error | StdDev | Ratio | Gen0 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|
DirectArray_Gen | .NET 8.0 | 4 | 6.823 ns | 0.0560 ns | 0.0599 ns | 1.00 | 0.0096 | 40 B | 1.00 |
DirectArray_Gen | .NET 8.0 + DistIL | 4 | 6.868 ns | 0.0378 ns | 0.0435 ns | 1.01 | 0.0096 | 40 B | 1.00 |
DirectArray_Gen | .NET 8.0 | 1024 | 3,503.899 ns | 3.2749 ns | 3.5041 ns | 1.00 | 0.9842 | 4120 B | 1.00 |
DirectArray_Gen | .NET 8.0 + DistIL | 1024 | 3,505.668 ns | 4.1747 ns | 4.6401 ns | 1.00 | 0.9842 | 4120 B | 1.00 |
DynamicList_Gen | .NET 8.0 | 4 | 22.968 ns | 0.3063 ns | 0.3527 ns | 1.00 | 0.0268 | 112 B | 1.00 |
DynamicList_Gen | .NET 8.0 + DistIL | 4 | 7.051 ns | 0.1101 ns | 0.1223 ns | 0.31 | 0.0096 | 40 B | 0.36 |
DynamicList_Gen | .NET 8.0 | 1024 | 3,981.813 ns | 8.0322 ns | 8.5943 ns | 1.00 | 2.9984 | 12544 B | 1.00 |
DynamicList_Gen | .NET 8.0 + DistIL | 1024 | 3,525.591 ns | 15.4540 ns | 15.8701 ns | 0.89 | 0.9842 | 4120 B | 0.33 |
DynamicList_Enumer | .NET 8.0 | 4 | 31.736 ns | 0.3541 ns | 0.3936 ns | 1.00 | 0.0344 | 144 B | 1.00 |
DynamicList_Enumer | .NET 8.0 + DistIL | 4 | 17.866 ns | 0.2599 ns | 0.2669 ns | 0.56 | 0.0172 | 72 B | 0.50 |
DynamicList_Enumer | .NET 8.0 | 1024 | 2,621.338 ns | 28.6279 ns | 31.8198 ns | 1.00 | 3.0060 | 12576 B | 1.00 |
DynamicList_Enumer | .NET 8.0 + DistIL | 1024 | 1,665.603 ns | 14.5762 ns | 15.5963 ns | 0.63 | 0.9918 | 4152 B | 0.33 |
CSE is implemented via a simple dominator-based Global Value Numbering pass. It supports memory accesses and a limited of calls to well known methods, such as dictionary lookups.
// Original code
TODO
Removes simple non-escaping object allocations by inlining constituent fields into local variables.
An object allocation is considered to be non-escaping if all uses throughout the method are from field related instructions.
// Original code
public static int SROA_ClassEnumerator(int[] array) {
var iter = new ArrayEnumerator(array);
int sum = 0;
foreach (int value in iter) {
sum += value * 2 + 1;
}
return sum;
}
// (Note: Roslyn emits try..finally for non-sealed enumerators, and the
// current simplification passes can't eliminate the Dispose call.)
sealed class ArrayEnumerator(int[] array, int index = 0) {
public int Current { get; set; }
public bool MoveNext() {
if (index < array.Length) {
Current = array[index++];
return true;
}
return false;
}
public ArrayEnumerator GetEnumerator() => this;
}
// Decompiled output
public static int SROA_ClassEnumerator(int[] array) {
int num = 0;
for (int i = 0; i < array.Length; i++) {
num += array[i] * 2 + 1;
}
return num;
}
Aggressively inline calls for any non-virtual method defined in the same assembly and originally smaller than 32 IL instructions. Recursive inlines are not currently accounted for, and so this may significantly increase the output assembly size and possibly cause performance regressions if the optimizer is enabled globally.
Inlining of methods accessing private members is supported by disabling runtime access checks via IgnoresAccessChecksToAttribute
, which is undocumented but supported since .NET Core 1.0 and by newer versions of Mono.