Expose a malloca API that either stackallocs or creates an array.
#52065
Comments
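Tagging subscribers to this area: @GrabYourPitchforks, @carlossanlop
Issue Details
Background and Motivation
It is not uncommon, in performance oriented code, to want to stackalloc for small/short-lived collections. However, the exact size is not always well known, in which case you want to fall back to creating an array instead.
Proposed API
namespace System.Runtime.CompilerServices
{
public static unsafe partial class Unsafe
{
public static Span<T> Stackalloc<T>(int length);
public static Span<T> StackallocOrCreateArray<T>(int length);
public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength);
}
}
These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:
public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength)
{
return ((sizeof(T) * length) < maxStackallocLength) ? stackalloc T[length] : new T[length];
}
The variant that doesn't take maxStackallocLength would use some implementation defined default. Windows currently uses 1024.
Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).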
|
This issue came up on Twitter again (https://twitter.com/jaredpar/status/1387798562117873678?s=20) and we have valid use cases in the framework and compiler. This has been somewhat stuck in limbo as runtime/framework saying "we need language support first" and the language saying "we need the runtime/framework to commit to doing this first". We should review and approve this to unblock the language from committing to their work and can do all the appropriate implementation/prep work on the runtime side, without actually making it public until the language feature is available. |
There would not be an API which opts to use the
|
Related to #25423. That proposal is a bit light on concrete APIs, but it suggests behaviors / analyzers / other ecosystem goodness we'd likely want to have around this construct. |
This does require language changes to work correctly but the implementation is very straightforward. The compiler will just treat all of these calls as if they are not safe to escape from the calling method. Effectively it would have the same lifetime limitation as calling I think the best approach is to just have the compiler trigger on the FQN of the method. Essentially any API with this signature in any assembly would be treated this way. That would make it easier to write code that multi-targets between .NET Core and .NET Framework as the framework side of this could be implemented as The other advantage of this API is that we can once again var all the things.
var local1 = stackalloc int[42]; // int*
var local2 = Unsafe.StackAlloc<int>(42); // Span<int> |
This is one of those features that requires joint work from all runtimes/JIT/language/libraries. Our .NET 6 budget for features in this space was taken by the generic numerics. We should include this proposal next time we do planning in this area. Approving this API without the resource commitment won't achieve much. |
It gives us a surface on which this can be implemented given "free time" and can be prioritized appropriately. The library work is approving the API and exposing the surface area. The JIT work should just be implementing it as a recursive named intrinsic and then creating the relevant nodes for:
if ((sizeof(T) * length) < maxStackallocLength)
{
var x = stackalloc T[length];
return new Span<T>(x, length);
}
else
{
var x = new T[length];
return new Span<T>(x);
}
This is fairly straightforward, except for the |
I do not think we would want to do a naive implementation like this. I think we would want to do explicit lifetime tracking even when the length is over the threshold. |
What's the scenario where the JIT needs to do additional tracking that isn't already covered by the language rules and by the existing tracking for Users can and already do write the above today, just manually inlined. We are looking at doing exactly this already in one of the |
We would be leaving performance on the table. Majority of the existing stackalloc uses are using ArrayPool as the fallback. If the new API is not using pooled memory as the fallback, the majority of the existing stackalloc sites won't be able to use it. |
That requires a different level of language support. Supporting the non-arraypool case is very straightforward. It's just generalizing the existing lifetime restrictions we associate with The |
That really sounds like an additional ask and one that isn't strictly needed at the same time. Pooling has a lot of different considerations and we ourselves largely only use it with a few primitive types (namely
I think it's doable, but we could also unblock many scenarios with the above today and with minimal work. |
I do not think we would necessarily want to deal with the pooling in Roslyn, nor have it backed by the ArrayPool as it exists today. |
I do not see those scenarios. The minimal work just lets you do the same thing as what you can do with stackalloc today, just maybe saves you a few characters. |
They exist everywhere that None of the existing proposals or discussions around this, including #25423 which has been around for 3 years, have really covered pooling as that is considered a more advanced scenario. This covers the case of "I want to allocate on the stack for small data and on the heap for larger data" and where the limit for that might vary between platforms and architectures. Windows for example has a 1MB stack by default and uses 1024 bytes. Linux uses a 4MB stack and might want a different limit. Encountering large lengths is typically expected to be rare, but not impossible. It's not unreasonable to simply new up an unpooled array in that scenario. |
Pooling, for example, is likely only beneficial for types like |
Span<byte> span = Unsafe.StackallocOrCreateArray<byte>(len, 1024);
// vs
Span<byte> span = len > 1024 ? new byte[len] : stackalloc byte[1024];
Indeed just saves a few characters (but nice to have). But
byte[] arrayFromPool = null;
Span<byte> span = len > 1024 ? (arrayFromPool = ArrayPool<byte>.Shared.Rent(len)) : stackalloc byte[1024];
try
{
}
finally
{
if (arrayFromPool != null)
ArrayPool<byte>.Shared.Return(arrayFromPool);
}
// vs
Span<byte> span = Unsafe.StackallocOrPool<byte>(len, 1024); |
Has a couple of other benefits:
|
I'm now seeing conflicting advice on whether or not arrays should be returned to the pool in a |
@EgorBo your example with the pool would save even more when the Span is sliced to the desired length (as it's often needed that way when the length is given as argument). |
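The slicing point can be made concrete with the pattern as it is commonly written by hand today. This is only a sketch: the class name, method name, and the `StackLimit` constant are illustrative assumptions, and the 1024-byte figure is simply the value quoted in this thread. `ArrayPool<T>.Shared.Rent` may hand back an array longer than requested, and the stack buffer is always `StackLimit` bytes, so both paths need the slice.

```csharp
using System;
using System.Buffers;

public static class StackOrPoolDemo
{
    // Assumed threshold; the thread mentions 1024 bytes as the Windows default.
    private const int StackLimit = 1024;

    // Fills a scratch buffer of exactly `len` bytes with (i % 256) and returns the sum.
    public static int SumOfIndices(int len)
    {
        byte[]? rented = null;
        Span<byte> span = len <= StackLimit
            ? stackalloc byte[StackLimit]
            : (rented = ArrayPool<byte>.Shared.Rent(len));

        // Both branches may yield more than `len` elements, so slice to the requested length.
        span = span.Slice(0, len);
        try
        {
            int sum = 0;
            for (int i = 0; i < span.Length; i++)
            {
                span[i] = (byte)(i % 256);
                sum += span[i];
            }
            return sum;
        }
        finally
        {
            // Only the pooled path has anything to give back.
            if (rented is not null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

Whether the `finally` is worth its cost is exactly the trade-off debated in the surrounding comments; it is kept here only to mirror the example above.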
This is due to current array pool design limitations. This is fixable by treating management of explicit lifetime memory as core runtime feature.
This depends on how performance sensitive your code is and how frequently you expect exceptions to occur inside the scope. If your code is perf critical (e.g. number formatting) and you do not expect exceptions to ever occur inside the scope (e.g. the only exception you ever expect is out of memory), it is better to avoid finally as it is the common case in dotnet/runtime libraries. |
That also sounds like a feature that is potentially several releases out and which is going to require users and the compiler to review where it is good/correct to use. Something like proposed here is usable in the interim, including for cases like Having to do |
What about making the allocator not necessarily bound to
namespace System.Runtime.CompilerServices
{
public static unsafe partial class Unsafe
{
public static Span<T> Stackalloc<TAllocator, T>(int length, TAllocator allocator)
where TAllocator: ISpanAllocator<T>
// ...
}
public interface ISpanAllocator<T> {
Span<T> Allocate(int length);
}
} |
Could also allocate a series of ref fields (all null), and then allow indexing them via Span |
I think any API that isn't tracking either Otherwise, I think it falls into the general camp of what it seems @jkotas is proposing with runtime supported lifetime tracking. |
Oh, yeah true, Let's do it! 😅 |
That starts to be as painful as implementing
namespace System.Runtime.CompilerServices
{
public static unsafe partial class Unsafe
{
public static TState Stackalloc<TAllocator, TState, T>(int length, TAllocator allocator, out Span<T> span)
where TAllocator: ISpanAllocator<T, TState>
// ...
}
public interface ISpanAllocator<T, TState> {
Span<T> Allocate(int length, out TState state);
void Release(TState state);
}
}
[Edit] Removed |
What's the problem with that? Afaik, the GC already scans the stack for references.
There's an easy solution to that: the runtime enforces zero-initializing managed types, ignoring |
This code already works:
Buffer b = new();
Console.WriteLine(b[0] is null);
Console.WriteLine(b[1] is null);
Console.WriteLine(b[2] is null);
b[0] = new string("Test");
Console.WriteLine(b[0]);
GC.Collect(2, GCCollectionMode.Default, true);
GC.WaitForPendingFinalizers();
Console.WriteLine(b[0]);
[InlineArray(3)]
struct Buffer
{
private string? _element;
}
Why wouldn't it work with shorter syntax like |
You are missing the |
Given my understanding of
Run();
Run();
[SkipLocalsInit]
void Run()
{
Buffer b = new();
Console.WriteLine(b[0] is null);
Console.WriteLine(b[1] is null);
Console.WriteLine(b[2] is null);
b[0] = new string("Test");
Console.WriteLine(b[0]);
GC.Collect(2, GCCollectionMode.Default, true);
GC.WaitForPendingFinalizers();
Console.WriteLine(b[0]);
}
[InlineArray(3)]
struct Buffer
{
private string? _element;
}
Still gives correct output. Even though at the second call |
You declared it but aren't using it, calling the constructor initializes, stackalloc doesn't. You'd want to try |
|
Even more curious is why
using System;
using System.Runtime.CompilerServices;
public class Class
{
public static void Main()
{
StackAllocatedThing<string> a = default;
a[Random.Shared.Next(42)] = "test";
UseThing(a);
}
private static unsafe void UseThing(Span<string> span)
{
Console.WriteLine(span.Length);
for (int i = 0; i < span.Length; i++)
{
Console.WriteLine($"[{i,2}] = {(nuint)Unsafe.AsPointer(ref span[i]):x8} = {span[i] ?? "(null)"}");
}
}
[InlineArray(42)]
public struct StackAllocatedThing<T>
where T : class
{
private T _element0;
}
}
Ideally, |
The inability to use fixed sized buffers of types other than core primitives was a significant blocker for low level scenarios. It's a restriction that goes back to C# 1.0 and a sore point since then. This hit a tipping point a few releases ago, the C# and runtime team collaborated to solve that problem and
That is a reasonable language suggestion. Essentially, create a language feature |
Except a language-level translation won't suffice because stackallocs don't always have a compile-time size. |
It's not that common, in part because it is very expensive and often slower than simply It is then "best practice" to keep stack allocations small (all stackallocs for a single method should typically add up to not more than 1024 bytes) and to never make them "dynamic" in length (instead rounding up to the largest buffer size). This guidance is true even in native code (C, C++, assembly, etc) and not following it can in some cases interfere with or break internal CPU optimizations (such as mirroring stack spills to the register file). |
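The "round up to the largest buffer size" guidance can be illustrated as follows. This is a minimal sketch, not code from dotnet/runtime: `FixedStackBufferDemo`, `ToHex`, and the `MaxStackChars` limit are hypothetical names chosen for the example. The stack allocation always has the same constant size, and only the slice taken from it varies with the input.

```csharp
using System;

public static class FixedStackBufferDemo
{
    // Assumed limit: 128 chars = 256 bytes, well under the 1024-byte guidance above.
    private const int MaxStackChars = 128;

    // Formats bytes as lowercase hex using a constant-size stack buffer for small inputs.
    public static string ToHex(ReadOnlySpan<byte> data)
    {
        int needed = data.Length * 2;

        // The stackalloc length is a constant, never `needed`; large inputs go to the heap.
        Span<char> buffer = needed <= MaxStackChars
            ? stackalloc char[MaxStackChars]
            : new char[needed];
        buffer = buffer.Slice(0, needed);

        const string digits = "0123456789abcdef";
        for (int i = 0; i < data.Length; i++)
        {
            buffer[2 * i] = digits[data[i] >> 4];       // high nibble
            buffer[2 * i + 1] = digits[data[i] & 0xF];  // low nibble
        }

        // Span<char>.ToString() produces a string containing the span's characters.
        return buffer.ToString();
    }
}
```

For example, `FixedStackBufferDemo.ToHex(new byte[] { 0x0f, 0xa0 })` yields the four hex digits for those two bytes without any dynamically sized stack allocation.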
Dynamic length works well if you reliably know your data source. |
Dynamic lengths function as intended in many scenarios. However, they can lead to various issues including hurting performance and potentially opening yourself up to security problems (even if the data source is known). There are multiple recommendations in this space that are effectively industry standard and they allow you to achieve the same overall thing without introducing the same risks. Those industry standards and recommendations should be considered alongside any API exposed here or future work done by the runtime to enable new scenarios. |
The example from #112178 adds a motivation to introduce the API 🙂 |
@jkotas how would we feel about the following shape (possibly with a better name than
namespace System.Runtime.CompilerServices
{
public static unsafe partial class Unsafe
{
public static int DefaultStackallocThreshold { get; }
// TODO: Determine which compiler attributes or keywords need to be added for correct lifetime tracking.
[Intrinsic]
public static Span<T> StackallocOrCreateArray<T>(int length) => StackallocOrCreateArray(length, DefaultStackallocThreshold);
[Intrinsic]
public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength) => new T[length];
[Intrinsic]
public static StackallocOrRentedBuffer<T> StackallocOrRentArray<T>(int length) => StackallocOrRentArray(length, DefaultStackallocThreshold);
[Intrinsic]
public static StackallocOrRentedBuffer<T> StackallocOrRentArray<T>(int length, int maxStackallocLength)
{
T[] rentedArray = ArrayPool<T>.Shared.Rent(length);
return new StackallocOrRentedBuffer<T>(rentedArray.AsSpan(0, length), rentedArray);
}
}
public ref struct StackallocOrRentedBuffer<T>
{
private Span<T> _buffer;
private T[]? _rentedArray;
public StackallocOrRentedBuffer(Span<T> buffer, T[]? rentedArray = null)
{
_buffer = buffer;
_rentedArray = rentedArray;
}
public static implicit operator Span<T>(StackallocOrRentedBuffer<T> value) => value._buffer;
public static implicit operator ReadOnlySpan<T>(StackallocOrRentedBuffer<T> value) => value._buffer;
public void Dispose()
{
if (_rentedArray is not null)
{
ArrayPool<T>.Shared.Return(_rentedArray);
_rentedArray = null;
}
}
}
}
This:
The biggest complexity here is getting the JIT to "call" |
Such a shape would simplify the example @EgorBo just shared down to:
StackallocOrRentedBuffer<uint> powersOf1e9Buffer = Unsafe.StackallocOrRentArray<uint>(powersOf1e9BufferLength);
// ...
powersOf1e9Buffer.Dispose(); |
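Without the JIT intrinsic, a similar calling shape can be approximated today if the caller supplies the stack memory. The sketch below is an assumption-laden stand-in, not the proposed API: `ScratchBuffer<T>`, `IsRented`, and `ScratchBufferDemo` are hypothetical names, and the 64-element stack size is arbitrary. It follows the same "stack span or rented array, return on Dispose" idea as the `StackallocOrRentedBuffer<T>` shape above.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for the proposed buffer type; the caller provides the stack space.
public ref struct ScratchBuffer<T>
{
    private T[]? _rented;

    public Span<T> Span { get; }

    public ScratchBuffer(Span<T> stackSpace, int length)
    {
        if (length <= stackSpace.Length)
        {
            // Small request: use the caller's stack memory.
            _rented = null;
            Span = stackSpace.Slice(0, length);
        }
        else
        {
            // Large request: fall back to the shared pool.
            _rented = ArrayPool<T>.Shared.Rent(length);
            Span = _rented.AsSpan(0, length);
        }
    }

    public bool IsRented => _rented is not null;

    public void Dispose()
    {
        T[]? toReturn = _rented;
        _rented = null;
        if (toReturn is not null)
            ArrayPool<T>.Shared.Return(toReturn);
    }
}

public static class ScratchBufferDemo
{
    // Reports whether a request of `length` ints had to fall back to the pool.
    public static bool UsesPool(int length)
    {
        Span<int> stackSpace = stackalloc int[64];
        using var buffer = new ScratchBuffer<int>(stackSpace, length);
        buffer.Span.Fill(7);
        return buffer.IsRented;
    }
}
```

The cost of this approximation is that the stack space is reserved even when the pooled path is taken, which is one of the things an intrinsic could avoid.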
Another benefit of such APIs would be that the Roslyn compiler could use them instead of us exposing As the JIT could ensure that the GC tracking is correct for a stackallocated reference type |
Non-critical dispose would additionally allow it to be simplified down to this :)
using StackallocOrRentedBuffer<uint> powersOf1e9Buffer = Unsafe.StackallocOrRentArray<uint>(powersOf1e9BufferLength);
I generally agree with the idea that they shouldn't be returned in the case of an exception (as you might have it stored somewhere - although with this API design, you could write the lifetimes such that it's not possible I think, it may already be that way, but I haven't checked for sure), and there's the benefit wrt no try/finally for what most developers are likely to type (the Note: this is just a nice-to-have for the most part imo, as opposed to something that I think should block this proposal.
Are we sure we don't want to clear ever? I think at least an overload of dispose with clear as an option would be good, and perhaps the default should be true for managed types also. (e.g., with the current design as written, you might store a reference into it, have an exception thrown before you got around to clearing it, return the array without clearing it in the finally, and then potentially leak the reference as a result)
I don't agree with this usage personally, for 2 reasons: 1. (the trivially solvable one) if we do this, we should also expose an If we were to do something like this, imo it should be reserved for only high item count / total size scenarios, or dynamic scenarios (which would be questionable to do implicitly regardless). Beyond those comments, I personally think the proposal is good :) |
I like the proposed API shape just to simplify what we already have. The question is - can we make it safer? Can it avoid using the thread pool under the hood if it sees that the pooled array escapes outside of the scope (using inter-procedure analysis)? Or at least help developers diagnose issues where it does |
Clearing would require tracking extra state. It could be an option, but it's also not something the BCL normally does and it's not the default for If you need to clear, it's not difficult to do
The JIT can optimize constant sizes to not emit the fallback path, because it statically knows it's under the threshold. This makes it zero cost for anything that isn't dynamic. If it is dynamic, then you want the branch to avoid the potential dangers of stack overflow. Thus, there is no need for just a |
It could simply pass
And this doesn't run into issues with causing pessimisation over the existing solution in other dynamic scenarios, like calling a function with |
It should not. For static lengths and/or for the intrinsic API in particular there should be nothing preventing us from hoisting such a case since we know the scoping of it due to the lifetime. There's also nothing preventing Roslyn from doing said hoisting itself since it would already be something implicitly done behind the scenes. There are tradeoffs; it's just a suggestion for how this could also benefit that scenario.
Yes, but it's tradeoffs and largely irrelevant to the basic shape getting a nod of approval so we can finally take it to API review. We can discuss additional concepts like clearing in API review and decide if a third overload or parameter is warranted. |
It is still an unsafe API with similar problems as ValueStringBuilder. I think we should be shooting for a construct that makes it impossible to introduce a memory safety bug. |
Perhaps something along the lines of ref struct destructor? |
@jkotas could you elaborate a bit on what you'd be expecting the long term solution to be, possibly with some pseudo-code? Notably I don't see the potential issue with With the
scoped StackallocOrRentedBuffer<T> buffer = Unsafe.StackallocOrRentArray<T>(...);
scoped Span<T> span = buffer;
buffer.Dispose();
span[0] = ...; // mutating a buffer that's been returned to the array pool
This issue exists because of how While this danger does exist on the latter, the same danger notably already exists in the paths that would use it just hidden around much more convoluted API surface. So it seems like exposing such an API is still improving safety and would be worth it for power users given the other feature may still be years out. Short of a language feature, I could see the signature being |
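The use-after-return hazard described above already exists with plain `ArrayPool<T>` today and can be reproduced without any new API. The sketch below uses only existing calls; the class and method names are hypothetical. It deliberately makes no claim about whether the pool hands the same array back, since that is an implementation detail, and instead reports both facts so the caller can see that a clobbered value occurs exactly when the two rentals alias.

```csharp
using System;
using System.Buffers;

public static class UseAfterReturnDemo
{
    // Writes through a span that outlived its rental and reports what happened.
    public static (bool Aliased, bool Clobbered) WriteThroughStaleSpan()
    {
        byte[] first = ArrayPool<byte>.Shared.Rent(2048);
        Span<byte> stale = first;                 // span over the rented array
        ArrayPool<byte>.Shared.Return(first);     // rental ends, span is now stale

        byte[] second = ArrayPool<byte>.Shared.Rent(2048);
        second[0] = 1;

        // Compiles without complaint: nothing ties the span's lifetime to the rental.
        stale[0] = 42;

        bool aliased = ReferenceEquals(first, second);
        bool clobbered = second[0] == 42;

        ArrayPool<byte>.Shared.Return(second);
        return (aliased, clobbered);
    }
}
```

If the pool happens to return the same array for the second rental, the second renter's data is silently overwritten, which is the memory safety concern raised about this API shape and about ValueStringBuilder.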
Rather than an analyzer I think we'd want to encode this into the compiler. This has come up a few times in the past. Essentially how can the runtime mark an API that is
I'm somewhat wary of taking this entire idea and encoding it as a language feature. Basically I'm hesitant about having |
Yeah I'd love to have something like this but it would probably need even more compiler support
using StackAllocOrCustomAllocBuffer<T> buffer = Unsafe.StackAllocOrCustomAlloc<T>(length, static len => MyCustomWin32HeapAllocator(len)); |
Background and Motivation
It is not uncommon, in performance oriented code, to want to stackalloc for small/short-lived collections. However, the exact size is not always well known, in which case you want to fall back to creating an array instead.
Proposed API
These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:
The variant that doesn't take maxStackallocLength would use some implementation defined default. Windows currently uses 1024.
Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).