Skip to content

Latest commit

 

History

History
319 lines (278 loc) · 16.5 KB

opt-list.md

File metadata and controls

319 lines (278 loc) · 16.5 KB

Optimization Passes

DistIL also implements a variety of minor canonicalization and simplification transformations. This list only includes the most significant and interesting ones.

Note: some of the decompiled code samples have been re-formatted for legibility and breviety. Unsafe calls are generated by ILSpy from lifted IL instructions.

Linq Expansion

Rewrites Linq queries into imperative code in bottom-up order. This transform works via pattern matching and can only recognize a predefined set of known calls, which are listed below.

Example

// Original code
[Optimize]
public float Aggregate()
    => _sourceItems
        .Select(x => x.Weight)
        .Where(x => x > 0.0f && x < 1.0f)
        .Aggregate(0.0f, (r, x) => r + (x < 0.5f ? -x : x));

// Decompiled output
[Optimize]
public float Aggregate() {
    List<Item> sourceItems = _sourceItems;
    ref Item reference = ref MemoryMarshal.GetArrayDataReference(sourceItems._items);
    ref Item reference2 = ref Unsafe.Add(ref reference, (uint)sourceItems._size);
    float num = 0f;
    for (; Unsafe.IsAddressLessThan(ref reference, ref reference2); reference = ref Unsafe.Add(ref reference, 1)) {
        float num2 = reference.Weight;
        if (num2 > 0f && num2 < 1f) {
            if (num2 < 0.5f) { num2 = 0f - num2; }
            num += num2;
        }
    }
    return num;
}

Benchmarks

Source

Method Toolchain Mean Error StdDev Ratio
FilterObjects .NET 8.0 11.543 us 0.0517 us 0.0553 us 1.00
FilterObjects .NET 8.0 + DistIL 8.029 us 0.0846 us 0.0905 us 0.70
FirstPredicate .NET 8.0 27.718 us 0.1309 us 0.1400 us 1.00
FirstPredicate .NET 8.0 + DistIL 13.285 us 0.1015 us 0.1042 us 0.48
Aggregate .NET 8.0 26.224 us 0.3023 us 0.3235 us 1.00
Aggregate .NET 8.0 + DistIL 5.815 us 0.1557 us 0.1666 us 0.22
CountLetters .NET 8.0 11.560 us 0.1345 us 0.1439 us 1.00
CountLetters .NET 8.0 + DistIL 3.611 us 0.0121 us 0.0139 us 0.31
RayTracer .NET 8.0 16,953.399 us 250.0090 us 277.8843 us 1.00
RayTracer .NET 8.0 + DistIL 10,651.095 us 83.0209 us 85.2564 us 0.63

Dynamic PGO in .NET 8 closes the gap considerably compared to .NET 7, where these same benchmarks can get up to 2-10x ratio:

Method Toolchain Mean Error StdDev Ratio RatioSD
FilterObjects .NET 7.0 25.789 μs 0.9277 μs 1.0684 μs 1.00 0.00
FilterObjects .NET 7.0 + DistIL 11.986 μs 0.7977 μs 0.9187 μs 0.47 0.04
FirstPredicate .NET 7.0 43.959 μs 1.1397 μs 1.3124 μs 1.00 0.00
FirstPredicate .NET 7.0 + DistIL 12.397 μs 0.3286 μs 0.3784 μs 0.28 0.01
Aggregate .NET 7.0 74.475 μs 3.6544 μs 4.2084 μs 1.00 0.00
Aggregate .NET 7.0 + DistIL 5.971 μs 0.1545 μs 0.1779 μs 0.08 0.00
CountLetters .NET 7.0 40.919 μs 0.9485 μs 1.0924 μs 1.00 0.00
CountLetters .NET 7.0 + DistIL 3.684 μs 0.0377 μs 0.0434 μs 0.09 0.00
RayTracer .NET 7.0 18,766.715 μs 493.8189 μs 568.6825 μs 1.00 0.00
RayTracer .NET 7.0 + DistIL 12,499.714 μs 373.6395 μs 430.2838 μs 0.67 0.03

Features

Supported sources

  • Special cased: T[], List<T>, string, Enumerable.Range()
  • Fallback to any IEnumerable<T>

Supported stages

  • Where, Select, OfType, Cast
  • SelectMany
  • Skip, Take

Supported sinks

  • ToList, ToArray, ToHashSet, ToDictionary
  • Aggregate, Count(predicate)
  • First, FirstOrDefault
  • Any(predicate), All(predicate)
  • Loop enumeration

Caveats

  • Null argument checks are removed
    • Result may throw NullReferenceException instead of ArgumentNullException.
  • List<T> version checks are bypassed
    • Result will never throw InvalidOperationException for concurrent modifications. In the worst case where lists are mutated by different threads, this could lead to buffer over-reads and access violations.

Loop Vectorization

Basic loop vectorizer which works on simple for-loops having no complex branches or instructions.
This transform is currently unstable and will only be applied to methods annotated with [Optimize(TryVectorize = true)].

Examples

// Original
[Optimize(TryVectorize = true)]
public static int CountLetters(string text)
    => text.Count(ch => ch is >= 'A' and <= 'Z');

// Decompiled output
[Optimize(TryVectorize = true)]
public static int CountLetters(string text) {
    ref readonly char reference = ref text.GetPinnableReference();
    ref char reference2 = ref Unsafe.Add(ref reference, text.Length);
    int num = 0;
    for (; (nint)Unsafe.ByteOffset(target: ref reference2, origin: ref reference) >= 32; reference = ref Unsafe.Add(ref reference, 16)) {
        Vector256<ushort> left = Unsafe.ReadUnaligned<Vector256<ushort>>(ref Unsafe.As<char, byte>(ref reference));
        Vector256<ushort> vector = Vector256.GreaterThanOrEqual(left, Vector256.Create((ushort)65));
        num += BitOperations.PopCount(
            (vector & Vector256.LessThanOrEqual(left, Vector256.Create((ushort)90)))
                .AsByte().ExtractMostSignificantBits()
        ) >>> 1;
    }
    for (; Unsafe.IsAddressLessThan(ref reference, ref reference2); reference = ref Unsafe.Add(ref reference, 1)) {
        char c = reference;
        num += ((c >= 'A' && c <= 'Z') ? 1 : 0);
    }
    return num;
}

Basic loops bounded by array/span length are partially supported via the strength-reduction pass. Range and assertion propagation could help enable support for cases involving multiple buffers safely (TODO).

// Original
public static void GenerateInts(Span<int> dest, int x) {
    for (int i = 0; i < dest.Length; i++) {
        dest[i] = Math.Abs(i - 50 * x);
    }
}

// Decompiled output
public static void GenerateInts(Span<int> dest, int x) {
    Span<int> span = dest;
    ref int reference = ref span._reference;
    int length = span.Length;
    int num = length - 7;
    int i;
    for (i = 0; i < num; i += 8) {
        ref int reference2 = ref Unsafe.Add(ref reference, i);
        Vector256<int> vector = Vector256.Create(x) * Vector256.Create(50);
        Unsafe.WriteUnaligned(
            ref Unsafe.As<int, byte>(ref reference2), 
            Vector256.Abs(Vector256.Create(i) + Vector256.Create(0, 1, 2, 3, 4, 5, 6, 7) - vector));
    }
    for (; i < length; i++) {
        Unsafe.Add(ref reference, i) = Math.Abs(i - x * 50);
    }
}

Benchmarks

Source

Method Toolchain Mean Error StdDev Ratio Code Size
CountLetters .NET 8.0 10,702.2 ns 148.71 ns 171.25 ns 1.00 862 B
CountLetters .NET 8.0 + DistIL 355.6 ns 1.16 ns 1.24 ns 0.03 146 B
GenerateInts .NET 8.0 1,844.7 ns 1.90 ns 2.03 ns 1.00 87 B
GenerateInts .NET 8.0 + DistIL 381.0 ns 0.81 ns 0.94 ns 0.21 150 B
GenerateFloats .NET 8.0 3,891.0 ns 17.74 ns 19.72 ns 1.00 415 B
GenerateFloats .NET 8.0 + DistIL 858.0 ns 2.62 ns 2.91 ns 0.22 525 B
NormalizeFloats .NET 8.0 9,392.8 ns 5.23 ns 6.02 ns 1.00 490 B
NormalizeFloats .NET 8.0 + DistIL 913.7 ns 0.73 ns 0.78 ns 0.10 665 B

Features

Supported ops

  • Memory accesses: pointer load/stores (where the address is either an invariant pointer offset by the loop counter ptr[i], or the loop counter itself *ptr ... ptr++).
  • Arithmetic: +, -, *, / (float), &, |, ^, ~
  • Math calls: Abs, Min, Max, Sqrt, Floor, Ceil
  • Comparisons: any relop if used by &, |, ^, cond ? x : y, r += cond
  • Conversions: float<->int
  • Types: any numeric primitive compatible with Vector<T> (byte, int, float, etc.)
  • If-conversion: transform patterns such as cond ? x : y, short if..elses, x && y into branchless code
  • Reductions: +=, *=, &=, |=, ^=, Min, Max, += cond ? 1 : 0

Caveats

  • No legality checks: generated code may be wrong in some circumstances (carried dependencies and aliasing)
  • No support for mixed types. This is particularly problematic for small int types (byte/short), since they're implicitly widened on arithmetic ops.
  • Different behavior for floats and some Math calls:
    • Non-associative float reductions
    • NaNs not propagated in Min, Max
    • Int Abs won't throw on overflow

Pre-size Lists / Array Builder Opts

This pass does two things at once:

  • Pre-sizes lists with Add() calls inside loops with known trip count (canonical for..i and enumerator loops based on ICollections).
  • Elides ToArray() copies/calls for non-escaping lists with known final count

Limitations and Caveats

  • All Add() calls inside the loop must be unconditional (i.e. no if (cond) list.Add(...);)
  • Loops over structs are not yet supported (Spans, ImmutableArrays, etc.)
  • Generated code assumes that the source collection is reporting a correct Count compared to the number of elements returned by its enumerator.

Example

// Original code
public int[] DynamicList_Enumer(IReadOnlyList<int> items) {
    var list = new List<int>();

    foreach (int x in _items) {
        list.Add(x * 3 / 5 + 1);
    }
    return list.ToArray();
}
// Decompiled output
public int[] DynamicList_Enumer(IReadOnlyList<int> items) {
    IReadOnlyList<int> items = _items;
    using IEnumerator<int> enumerator = items.GetEnumerator();
    int[] array = new int[items.Count];
    int num = 0;
    while (enumerator.MoveNext()) {
        array[num] = enumerator.Current * 3 / 5 + 1;
        num++;
    }
    return array;
}

Benchmarks

Source

Method Toolchain Count Mean Error StdDev Ratio Gen0 Allocated Alloc Ratio
DirectArray_Gen .NET 8.0 4 6.823 ns 0.0560 ns 0.0599 ns 1.00 0.0096 40 B 1.00
DirectArray_Gen .NET 8.0 + DistIL 4 6.868 ns 0.0378 ns 0.0435 ns 1.01 0.0096 40 B 1.00
DirectArray_Gen .NET 8.0 1024 3,503.899 ns 3.2749 ns 3.5041 ns 1.00 0.9842 4120 B 1.00
DirectArray_Gen .NET 8.0 + DistIL 1024 3,505.668 ns 4.1747 ns 4.6401 ns 1.00 0.9842 4120 B 1.00
DynamicList_Gen .NET 8.0 4 22.968 ns 0.3063 ns 0.3527 ns 1.00 0.0268 112 B 1.00
DynamicList_Gen .NET 8.0 + DistIL 4 7.051 ns 0.1101 ns 0.1223 ns 0.31 0.0096 40 B 0.36
DynamicList_Gen .NET 8.0 1024 3,981.813 ns 8.0322 ns 8.5943 ns 1.00 2.9984 12544 B 1.00
DynamicList_Gen .NET 8.0 + DistIL 1024 3,525.591 ns 15.4540 ns 15.8701 ns 0.89 0.9842 4120 B 0.33
DynamicList_Enumer .NET 8.0 4 31.736 ns 0.3541 ns 0.3936 ns 1.00 0.0344 144 B 1.00
DynamicList_Enumer .NET 8.0 + DistIL 4 17.866 ns 0.2599 ns 0.2669 ns 0.56 0.0172 72 B 0.50
DynamicList_Enumer .NET 8.0 1024 2,621.338 ns 28.6279 ns 31.8198 ns 1.00 3.0060 12576 B 1.00
DynamicList_Enumer .NET 8.0 + DistIL 1024 1,665.603 ns 14.5762 ns 15.5963 ns 0.63 0.9918 4152 B 0.33

Common Sub-expression Elimination

CSE is implemented via a simple dominator-based Global Value Numbering pass. It supports memory accesses and a limited of calls to well known methods, such as dictionary lookups.

Example

// Original code
TODO

Scalar Replacement

Removes simple non-escaping object allocations by inlining constituent fields into local variables.
An object allocation is considered to be non-escaping if all uses throughout the method are from field related instructions.

Example

// Original code
public static int SROA_ClassEnumerator(int[] array) {
    var iter = new ArrayEnumerator(array);
    int sum = 0;
    foreach (int value in iter) {
        sum += value * 2 + 1;
    }
    return sum;
}
// (Note: Roslyn emits try..finally for non-sealed enumerators, and the
//  current simplification passes can't eliminate the Dispose call.)
sealed class ArrayEnumerator(int[] array, int index = 0) {
    public int Current { get; set; }

    public bool MoveNext() {
        if (index < array.Length) {
            Current = array[index++];
            return true;
        }
        return false;
    }
    public ArrayEnumerator GetEnumerator() => this;
}

// Decompiled output 
public static int SROA_ClassEnumerator(int[] array) {
    int num = 0;
    for (int i = 0; i < array.Length; i++) {
        num += array[i] * 2 + 1;
    }
    return num;
}

Method Inlining

Aggressively inline calls for any non-virtual method defined in the same assembly and originally smaller than 32 IL instructions. Recursive inlines are not currently accounted for, and so this may significantly increase the output assembly size and possibly cause performance regressions if the optimizer is enabled globally.

Inlining of methods accessing private members is supported by disabling runtime access checks via IgnoresAccessChecksToAttribute, which is undocumented but supported since .NET Core 1.0 and by newer versions of Mono.