Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yield Performance Improvement suggestion #112183

Open
tjwald opened this issue Feb 5, 2025 · 6 comments
Open

Yield Performance Improvement suggestion #112183

tjwald opened this issue Feb 5, 2025 · 6 comments
Labels
needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners tenet-performance Performance related issue

Comments

@tjwald
Copy link

tjwald commented Feb 5, 2025

Is it possible to improve yield based methods using the same type of optimizations as async v2?

Methods like:

public IEnumerable<int> M() {
    for (int i = 0; i < 10; i++) {
        yield return i;
    }
}

generates a state machine that could make the compiler and JIT have a harder time to optimize the generated code (and maybe stored fields etc.).

This could make Linq based APIs automatically inline and remove the need for manually writing IEnumerables to make it more performant.

@tjwald tjwald added the tenet-performance Performance related issue label Feb 5, 2025
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 5, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Feb 5, 2025
@huoyaoyuan huoyaoyuan closed this as not planned Won't fix, can't repro, duplicate, stale Feb 5, 2025
@huoyaoyuan huoyaoyuan reopened this Feb 5, 2025
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Feb 5, 2025
@huoyaoyuan
Copy link
Member

Although yield and await are both based on CPM state machine, the major use cases are quite different. await often requires true suspension at IO operation, and the state capturing is required. When yield state machines are consumed directly, the state capturing is purely an artifact of separation of producing and consuming. The best optimization should be linking producing and consuming in a sequence.

@tjwald
Copy link
Author

tjwald commented Feb 5, 2025

@huoyaoyuan I understand, but in the previous dotnet version there was a lot of work done in linq to hand-write iterators to improve performance of the different combinations.

Maybe removing the state machine at the compiler level, and moving it to the IL / runtime could make that type of work obsolete since the Compiler / JIT could inline the logic of the chain between each yield to the caller (and maybe the loop body of the caller as well).

@tjwald
Copy link
Author

tjwald commented Feb 5, 2025

I would expect:

public IEnumerable<int> M() {
    int cumSum = 0;
    foreach (var n in Enumerable.Range(0, 100).Where(x => x % 3 != 0).Select(x => x * 2)) {
        cumSum += n;
        yield return cumSum;
    }
}

To generate the equivalent of:

public IEnumerable<int> M() {
    int cumSum = 0;
    for (int i = 0; i < 100; i++) {
          if (i %3 != 0) {
                cumSum += i * 2;
                yield return cumSum;
          }
    }
}

@jakobbotsch
Copy link
Member

For runtime-async we are optimizing based on the assumption that suspensions are infrequent. The trade off is making suspension/resumption cost higher in favor of making the cost of other cases smaller.
For C#'s yield machinery that would correspond to optimizing for the case where the enumerator is going to yield 0 times while making it more expensive to yield more than 0 times. So likely using the same mechanism as runtime-async for yield would not be a good change.

@tjwald
Copy link
Author

tjwald commented Feb 5, 2025

I get it now, thank you. @jakobbotsch .

Ignoring the specifics of the async machinery, moving the yield machinery to the IL level wouldn't allow the JIT to optimize iterator chains so effectively there would be only one Enumerator running with all the logic inlined?

The above code I provided runs between 2 times and 3 times as fast for different sizes on dotnet 9:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();

[MemoryDiagnoser]
public class Benchmarks
{
    [Params(10, 1000, 1_000_000)] public int Size { get; set; }

    [Benchmark]
    public int[] CumSumLinq()
    {
        return CumSumLinqImp(Size).ToArray();
    }
    
    [Benchmark]
    public int[] CumSumHandWritten()
    {
        return CumSumHandWrittenImp(Size).ToArray();
    }

    private static IEnumerable<int> CumSumLinqImp(int size)
    {
        int cumSum = 0;
        foreach (int n in Enumerable.Range(0, size).Where(x => x % 3 != 0).Select(x => x * 2)) {
            cumSum += n;
            yield return cumSum;
        }
    }
    
    private static IEnumerable<int> CumSumHandWrittenImp(int size)
    {
        int cumSum = 0;
        for (int i = 0; i < size; i++)
        {
            if (i % 3 == 0) continue;
            cumSum += i * 2;
            yield return cumSum;
        }
    }
}

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5371/22H2/2022Update)
13th Gen Intel Core i5-13600KF, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100
[Host] : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2
DefaultJob : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

Method Size Mean Error StdDev Gen0 Gen1 Gen2 Allocated
CumSumLinq 10 70.51 ns 1.423 ns 1.331 ns 0.0204 - - 256 B
CumSumHandWritten 10 21.96 ns 0.142 ns 0.133 ns 0.0076 - - 96 B
CumSumLinq 1000 2,840.91 ns 9.840 ns 9.205 ns 0.2289 - - 2896 B
CumSumHandWritten 1000 1,186.42 ns 7.261 ns 6.792 ns 0.2174 - - 2736 B
CumSumLinq 1000000 2,762,385.21 ns 21,074.654 ns 19,713.244 ns 328.1250 328.1250 328.1250 2667112 B
CumSumHandWritten 1000000 1,190,059.97 ns 9,161.726 ns 8,121.632 ns 330.0781 330.0781 330.0781 2666951 B

@huoyaoyuan
Copy link
Member

To generate the equivalent of:

public IEnumerable M() {
int cumSum = 0;
for (int i = 0; i < 100; i++) {
if (i %3 != 0) {
cumSum += i * 2;
yield return cumSum;
}
}
}

This falls under the topic about cost-free linq (dotnet/csharplang#2482).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

3 participants