Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: DefaultInterpolatedStringHandler - expose buffer and reset #110505

Open
mgravell opened this issue Dec 7, 2024 · 30 comments · May be fixed by #112171
Open

[API Proposal]: DefaultInterpolatedStringHandler - expose buffer and reset #110505

mgravell opened this issue Dec 7, 2024 · 30 comments · May be fixed by #112171
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@mgravell
Copy link
Member

mgravell commented Dec 7, 2024

Background and motivation

Cross-reference #110504

In APIs where it is desirable to efficiently pass in interpolated string values, for example as keys, there is not currently a mechanism to access a span-based implementation. DefaultInterpolatedStringHandler does everything required for such scenarios, but after append: only ToStringAndClear is exposed.

API Proposal

source

  public ref struct DefaultInterpolatedStringHandler
  {
-     internal void Clear() {...}
+     public void Clear() {...}

-     internal ReadOnlySpan<char> Text => ...
+     public ReadOnlySpan<char> Text => ...

+     // alternative suggestion for Text, to avoid locking implementation
+     public bool TryGetSpan(out ReadOnlySpan<char> value)
+     {
+        value = Text;
+        return true;
+     }
  }

API Usage

var key = (id: 42, region: "abc");
obj.SomeMethodTakingKey($"abc {key.id:000} in {key.region}");

    public void SomeMethodTakingKey(ref DefaultInterpolatedStringHandler key)
    {
       DoSomething(key.Text); // or key.TryGetSpan
       key.Clear();
    }

Alternative Designs

  • Manually implement your own interpolated string handler, duplicating all the logic from DefaultInterpolatedStringHandler
  • Wrap a DefaultInterpolatedStringHandler, forwarding all the append methods, and use unsafe-accessor etc to invoke the internal APIs

Risks

If the simple Text approach is used, it locks the implementation to a single buffer (not ReadOnlySequence<char> or similar); TryGetSpan leaves that open, with ToStringAndClear being a viable fallback.

Failure to call Clear could leak arrays (not a hard leak; the GC will still catch them, but: not ideal); that sounds like a "use the API correctly" problem; alternatively, perhaps the C# compiler itself could invoke the Clear() method (once it is public, and possibly by looking for an attribute to be sure) after the invoke, as a fallback, so that this is mitigated.

@mgravell mgravell added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 7, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 7, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 7, 2024
@mgravell
Copy link
Member Author

mgravell commented Dec 7, 2024

/cc @stephentoub from discussion on #110504

@colejohnson66
Copy link

Related: #110427

@mgravell
Copy link
Member Author

mgravell commented Dec 7, 2024

lol, my team are troublemakers; good to see that both me and @BrennanConroy saw the problem re Clear(), and had similar thoughts

also cross-ref #55390 which was previously closed; one of the objections there was the "exposing Text constrains the implementation" - I think the TryGetSpan approach helps make that more acceptable.

@mgravell
Copy link
Member Author

mgravell commented Dec 7, 2024

Radical additional step, which would still require this API change; consider:

using System;

var key = (id: 42, region: "abc");
var cache = new SomeType();
cache.DoThing($"abc {key.id:000} in {key.region}");

class SomeType
{
    public void DoThing(string key) => throw new NotSupportedException();

    public void DoThing(ReadOnlySpan<char> key)
    {
        // this is just to show it is working; obvs we don't actually ToString usually
        Console.WriteLine($"Key: {key.ToString()}");
    }
}

At the moment this becomes:

        // ...
        defaultInterpolatedStringHandler.AppendFormatted(valueTuple.Item2);
        someType.DoThing(defaultInterpolatedStringHandler.ToStringAndClear());

Wouldn't it be nice if the ReadOnlySpan<char> API took precedence, and the implementation became:

        // ...
        defaultInterpolatedStringHandler.AppendFormatted(valueTuple.Item2);
        someType.DoThing(defaultInterpolatedStringHandler.Text);
        defaultInterpolatedStringHandler.Clear();

This is simpler for consumers (not requiring a ref DefaultInterpolatedStringHandler parameter), and could kick in immediately for a number of ReadOnlySpan<char> use-cases - but... it is a precedence change. That gets awkward.

Maybe it could apply only to:

  • when there is no competing string overload, or
  • when the span usage is explicit, i.e. cache.DoThing($"abc {key.id:000} in {key.region}".AsSpan());
  • and scoped, if needed

@Sergio0694
Copy link
Contributor

Sergio0694 commented Dec 8, 2024

Isn't this the same as #55390? I tried making the same proposal years ago but it got rejected, there's the API noted at the end of the comments in that issue 🥲

EDIT: nvm I missed the fact you called that out already. Not entirely sure I am fully convinced of how usable this would be in practice though if you were not actually guaranteed to be able to get a span from the handler 🤔

@colejohnson66
Copy link

API review did agree to possibly adding a “try get text” method in the future. This proposal would allow things like this:

try
{
    if (!dish.TryGetSpan(out ReadOnlySpan<char> span))
        span = dish.ToString();
    // use span
}
finally
{
    dish.Clear();
}

@mgravell
Copy link
Member Author

mgravell commented Dec 8, 2024

@colejohnson66 yes, that's exactly what I meant by

TryGetSpan leaves that open, with ToStringAndClear being a viable fallback.

@mgravell
Copy link
Member Author

mgravell commented Dec 8, 2024

An alternative would be to expose the Length and a CopyTo method, allowing the caller to copy the data out, but that seems an additional extra copy

@mgravell
Copy link
Member Author

mgravell commented Dec 8, 2024

@Sergio0694 yes exactly the same - I didn't see yours at the time. I guess this is just "more data points" in terms of "this API is really desirable, and here's concrete examples why", with the "if not" being: we manually copy DefaultInterpolatedStringHandler, or use unsafe-accessor.

@stephentoub
Copy link
Member

Radical additional step, which would still require this API change

This is interesting. @jaredpar, @333fred, timelines aside, is that feasible from a language perspective? Basically allowing an interpolated string to target type to a span, preferred over string, and using an exposed API on the default builder to get the span?

@Tornhoof
Copy link
Contributor

Tornhoof commented Dec 8, 2024

If you modify the string one, can you do the same for the utf8 one?

public ref struct TryWriteInterpolatedStringHandler

The _pos field and _success field are internal, thus

Wrap a DefaultInterpolatedStringHandler, forwarding all the append methods, and use unsafe-accessor etc to invoke the internal APIs

is required there too if you don't want to rewrite your own.

@mgravell
Copy link
Member Author

mgravell commented Dec 9, 2024

@stephentoub assuming it is restricted to "scoped" scenarios (so the span can't be snagged) the biggest problem I can see there is lifetime management - it would effectively need to work like using. Maybe it would be better to handle this in an explicit library way, for example:

using var content = SomeType.Create($"some {interpolated} string {here} with");
return obj.SomeMethodTakingSpan(content);

Where Create takes a ref DefaultInterpolatedStringHandler, and SomeType just wraps that and adds Dispose() => value.Clear(); and an implicit static conversion operator to ReadOnlySpan<char>. Heck, maybe SomeType is DefaultInterpolatedStringHandler?

@mgravell
Copy link
Member Author

mgravell commented Dec 9, 2024

actually, assignment to a string handler type already works; with the addition of .Clear() and .Span, this would work:

using System;
using System.Runtime.CompilerServices;

int id = 42;
string name = "def";
DefaultInterpolatedStringHandler value = $"abc {id} def {name}";
DoSomething(value.Text); // this .Text does not currently work
value.Clear(); // this .Clear() does not currently work

static void DoSomething(ReadOnlySpan<char> value){}

The only question there is: should the public-facing Clear() actually be Dispose(), which would allow:

int id = 42;
string name = "def";
using DefaultInterpolatedStringHandler value = $"abc {id} def {name}";
DoSomething(value.Text);

Demonstration, although notes that it creates a shadow copy of value for the using via sharplab.io

A small part of me wonders whether the above should/could actually compile more like:

        MyStringHandler myStringHandler = new MyStringHandler(9, 2);
+       try
+       {
            myStringHandler.AppendLiteral("abc ");
            myStringHandler.AppendFormatted(value);
            myStringHandler.AppendLiteral(" def ");
            myStringHandler.AppendFormatted(value2);
-           MyStringHandler myStringHandler2 = myStringHandler;
-       try
-       {
-           <<Main>$>g__DoSomething|0_0(myStringHandler2.Text);
+           <<Main>$>g__DoSomething|0_0(myStringHandler.Text);
        }
        finally
        {
-           myStringHandler2.Dispose();
+           myStringHandler.Dispose();
        }

That's probably a question for @jaredpar ;p (although to be fair, IDisposable usually works only after full construction, so: maybe that's fine and I'm over-thinking - simpler example)

@stephentoub
Copy link
Member

assuming it is restricted to "scoped" scenarios (so the span can't be snagged)

Yes. And scoped, whether explicit or much more commonly implicit, is the dominant form with ReadOnlySpan<T> arguments. It'd be fine IMO for call sites that target span arguments to fall back to using string-as-span when the argument might be captured due to the parameter being unscoped.

the biggest problem I can see there is lifetime management - it would effectively need to work like using

The compiler would just emit a call to Clear after the method call. It wouldn't even need to be in a finally block. DefaultInterpolatedStringHandler is called "default" and in CompilerServices because it's there to serve the needs of the C# compiler and C# language specification; it embodies whatever semantics the language needs. All Clear does is return any rented memory to the pool, and in general, we're fine if exceptional paths don't return to the pool (some folks even argue for not doing so).

should the public-facing Clear() actually be Dispose()

Per the above, I don't think so.

@mgravell
Copy link
Member Author

mgravell commented Dec 9, 2024

some folks even argue for not doing so

I've started using that approach almost exclusively; the reason being: a lot of code is async, and if an exception happens in async: we don't know whether the buffer is still being touched by a competing code path, and we're seeing a TCS that has been cancelled by a CT.

@stephentoub
Copy link
Member

and if an exception happens in async: we don't know whether the buffer is still being touched by a competing code path

You don't know that for sync either, e.g. if code does:

public int Compute(byte[] buffer)
{
    Task t1 = Task.Run(() => Compute(buffer.AsSpan(0, buffer.Length / 2));
    Task t2 = Task.Run(() => Compute(buffer.AsSpan(buffer.Length / 2));
    return t1.Result + t2.Result;
}

The point being, it's not really about the signature being sync vs async but rather about whether the implementation uses fork/join patterns. If an async method doesn't fork (the vast vast majority don't), or if it guarantees that all forked work will have quiesced (e.g. Task.WhenAll), then there's no difference in this regard.

But we digress...

@Sergio0694
Copy link
Contributor

"Demonstration, although notes that it creates a shadow copy of value for the using"

This makes me think back of the moveonly proposal we (Jared, Jan, Stephen, Tanner, and I) talked about a while back. Didn't go anywhere because there was not too much interest for it at the time. I wonder if perhaps this could be a good occasion to try to get more traction for it (also considering the recent push for security, and copies being an easy way to introduce security problems when array pooling is involved). More of a side point I suppose, but I'd be happy to ping that thread again and work with folks to write an actual spec to bring to language review, if there's interest for that too now 🙂

@jaredpar
Copy link
Member

jaredpar commented Dec 9, 2024

One item to think about is generally the language doesn't treat parameters special in this way. Instead we'd say that it's a supported conversion in the language that works on locals too. But it already works on locals today and produces locals that are safe to escape where as this proposal is aiming to create values that are not:

// works today
ReadOnlySpan<char> M(string str) => $"hello {str}";

Need to think about how we'd differentiate these cases. Essentially when is it okay to convert an interpolated string that cannot escape. Earlier there was a mention that we could leverage scoped for this. Basically if we're targeting a scoped ReadOnlySpan<char> then use the efficient approach. That might be a viable path forward.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 10, 2024
@mgravell
Copy link
Member Author

Here's a fully worked example of how two proposed tweaks combine to allow complex cache scenarios to use alloc-free code paths, including retroactively when a string API already exists: https://gist.github.com/mgravell/750bed134f36f417f97c217297552a88

@mgravell mgravell added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Jan 29, 2025
@stephentoub stephentoub self-assigned this Feb 4, 2025
@stephentoub stephentoub added this to the 10.0.0 milestone Feb 4, 2025
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Feb 4, 2025
@stephentoub stephentoub removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 4, 2025
@bartonjs
Copy link
Member

bartonjs commented Feb 4, 2025

Video

  • We discussed what might happen if we ever changed the implementation to use a paging strategy, and feel that we can make this work.
  • Looks good as proposed.
namespace System.Runtime.CompilerServices;

public ref struct DefaultInterpolatedStringHandler
{
    public void Clear();
    public ReadOnlySpan<char> Text { get; }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 4, 2025
@stephentoub stephentoub linked a pull request Feb 5, 2025 that will close this issue
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Feb 5, 2025
@TrayanZapryanov
Copy link
Contributor

@mgravell Is it an alternative also to build a custom InterpolatedStringHandler with StringBuilderCache as a storage?
Then There is no need of exposing new API and you will receive almost same performance characteristics, but it will be much safer than somebody forget to call Clear() and did not return baking array to the pool? I didn't measure how much slower is the usage of StringBuilderCache against something like ValueStringBuilder, but it is safer in my eyes.

@stephentoub FYI

@mgravell
Copy link
Member Author

mgravell commented Feb 5, 2025

@TrayanZapryanov not really no - if you're going to end up with a string, then $"/foos/{region}/{id}" is already more efficient than StringBuilderCache (via DefaultInterpolatedStringHandler), so adding adding a custom handler would be redundant. The underlying aim here is to allow zero-alloc "hit" fetches from some key cache scenarios, which is failed the moment you have a string (see #110504). You're right in that we need to be careful to use the API correctly to avoid problems, but that's not unusual: compare just ArrayPool.

@TrayanZapryanov
Copy link
Contributor

@mgravell The idea is to get span from StringBuilder - I think this can be used. And then it will be zero alloc too.

@mgravell
Copy link
Member Author

mgravell commented Feb 5, 2025

My gut says "that won't be as efficient", but Eric tells me I should race my horses, so:

| Method                             | Mean     | Error    | StdDev   | Gen0   | Allocated |
|----------------------------------- |---------:|---------:|---------:|-------:|----------:|
| Tuple                              | 11.18 ns | 0.047 ns | 0.041 ns |      - |         - |
| String                             | 35.17 ns | 0.176 ns | 0.156 ns | 0.0033 |      56 B |
| DISH_Span                          | 34.90 ns | 0.203 ns | 0.190 ns |      - |         - |
| StackAlloc_DISH_Span               | 32.61 ns | 0.155 ns | 0.145 ns |      - |         - |
| StackAllocSkipLocalsInit_DISH_Span | 32.31 ns | 0.101 ns | 0.090 ns |      - |         - |
| SBC_Span                           | 34.60 ns | 0.251 ns | 0.235 ns |      - |         - |

where

  • Tuple is using TKey === (string region, int id)
  • String is using TKey === string with $"/foos/{region}/{id}"
  • DISH_Span is using the same data as String via alt-lookup with the API proposed this ticket
  • StackAlloc_DISH_Span is like DISH_Span but using stackalloc to provide initialBuffer instead of leaning on the array-pool
  • StackAllocSkipLocalsInit_DISH_Span is like StackAlloc_DISH_Span but plus [SkipLocalsInit] for the stackalloc
  • SBC_Span is using the suggestion from @TrayanZapryanov

Conclusion: the zero-alloc string-like approaches are all within rounding-error distance of each-other; the alloc-version is also in the same field, but: we'd love to avoid that unnecessary alloc. It feels like this is a natural and useful API for DefaultInterpolatedStringHandler, where-as I've had to hack it in for StringBuilderCache. Additionally:

  • StringBuilderCache isn't a public API; it would be per-consumer
  • you'd need to add and maintain a lot of AppendFormatted overloads on your custom type to retain feature parity with DefaultInterpolatedStringHandler

code for the above benchmark

To pre-empt "but that data says we should just use tuples, why are we discussing this?" - see comment in the bench:

        // note: tuples are great, but aren't as useful as you'd hope for
        // caching; it requires a TKey cache (which doesn't work with IMemoryCache etc),
        // or we pay for boxing - and we'd need a stringify API for IDistributedCache,
        // plus it doesn't benefit from the hash-balancing/re-seeding features of string
        // (hash-poisoning protection)

(and fundamentally: the topic here is where the existing API design is around string-like keys, and we just want to make it more efficient)

@stephentoub
Copy link
Member

Then There is no need of exposing new API

There is no public StringBuilderCache.

but it will be much safer than somebody forget to call Clear() and did not return baking array to the pool

It is not much safer. Depending on the implementation it has either the same concerns or different ones. Not returning an array to the pool is the least of the concerns.

@TrayanZapryanov
Copy link
Contributor

@mgravell Thank you very much for spending time to test my proposal.
I agree that it is not a perfect, just eliminates the need of changing existing interface.
According to the problems that you point in using it :
StringBuilderCache isn't a public API; it would be per-consumer
It is quite simple class and you should be able to mimic it in the handler or just clone and tune it for your use case

you'd need to add and maintain a lot of AppendFormatted overloads on your custom type to retain feature parity with DefaultInterpolatedStringHandler
This is not exactly true as StringBuilder already had a lot of overrides

Image
I guess compiler will find the correct one if you pass formattable value.

Again - I am not against your approach, just an idea that popup in my mind when I saw pending PR.

@stephentoub
Copy link
Member

I guess compiler will find the correct one if you pass formattable value.

No. The compiler requires very specific constructors and named methods.

It is quite simple class

It's not, though. When do you remove a builder from the cache, and if you don't remove it, how do you avoid corruption if the same builder is used by two different consumers at the same time? How do you avoid allocation for larger strings without the cache growing too large, and if you allow it to grow large, how do you avoid that being a process-wide leak? Etc

@mgravell
Copy link
Member Author

mgravell commented Feb 5, 2025

This is not exactly true as StringBuilder already had a lot of overrides

My point is that you'd still need to offer them up to the compiler, though, via AppendFormatted (and AppendLiteral) methods on your wrapper that is marked [InterpolatedStringHandler]. To do that efficiently, with each item optionally taking alignment and format in addition to value: quickly becomes non-trivial.

@TrayanZapryanov
Copy link
Contributor

Thank you both for the information.
As always very helpful.
And sorry for the noise that I made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants