Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Expose a way to get either directly to the backing array of an ImmutableArray.Builder, or get it as a span. #111813

Closed
CyrusNajmabadi opened this issue Jan 24, 2025 · 10 comments · Fixed by #112177
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@CyrusNajmabadi
Copy link
Member

Background and motivation

Roslyn has several internal algorithms (including as very hot spots in our core analyzer driver) where we may have data flowing in as either an ImmutableArray<T> (when the data is fixed and known ahead of time), or as an ImmutableArray<T>.Builder (when it's more the result of some scratch computation that we still want to operate on).

Our choices when we run into this have not been attractive. We either need to make duplicate copies of the code (increasing redundancy and possibility of increases bugs), or we need to update the core functions to take an IEnumerable<T> (or some other list-like interface). This is not great as this both boxes the ImmutableArray<T> case, and then has costs later on when we iterate the values.

Ideally, we could write the core logic to operate uniformly. Either on an array+length (or just a span). With ImmutableArray it is possible to get it as a span with .AsSpan, or get to the backing array with ImmutableCollectionsMarshal. However, no such facility exists for IA.Builder. This blocks us from having uniform processing logic that operates on a contiguous sequence of values, which might be coming from either source.

Our request is to provide some mechanism to expose thsi data from an IA.B. There are a few concerns here:

  1. If the array (or span) is exposed, then it might no longer be valid if the IA.B is mutated. As such, this probably makes to sense to add as an API on ImmutableCollectionsMarshap, not on IA.B itself.

  2. This would lock IA.B into being definitely backed by an array. That said, my belief is that that is already a requirement of the IA.B api because it already has a MoveToImmutable method which is guaranteed to work as long as the current capacity matches the count of the builder:

            /// <summary>
            /// Extracts the internal array as an <see cref="ImmutableArray{T}"/> and replaces it
            /// with a zero length array.
            /// </summary>
            /// <exception cref="InvalidOperationException">When <see cref="ImmutableArray{T}.Builder.Count"/> doesn't
            /// equal <see cref="ImmutableArray{T}.Builder.Capacity"/>.</exception>
            public ImmutableArray<T> MoveToImmutable()

This could not be changed (say, to have multiple internal arrays) as then MoveToImmutable would no longer be an O(1) move as promised, and which lots of consumption code depends on.

API Proposal

namespace System.Runtime.InteropServices;

public static class ImmutableCollectionsMarshal
{
        /// <summary>
        /// Gets the underlying <typeparamref name="T"/> array for an input <see cref="ImmutableArray{T}.Builder"/> value.
        /// </summary>
        /// <typeparam name="T">The type of elements in the input <see cref="ImmutableArray{T}.Builder"/> value.</typeparam>
        /// <param name="builder">The input <see cref="ImmutableArray{T}.Builder"/> value to get the underlying <typeparamref name="T"/> array from.</param>
        /// <returns>The underlying <typeparamref name="T"/> array for <paramref name="builder"/>, if present.</returns>
        /// <remarks>
        /// <para>
        /// The contents of the returned array may change as the builder is used.  If the builder decides to move to a new array,
        /// (for example, if a value is added and the builder is currently at max capacity), this array will then point at data no longer
        /// backing the builder.
        /// </para>
        /// </remarks>
        public static T[] AsArray<T>(ImmutableArray<T>.Builder builder)
        {
            return builder._elements;
        }
}

API Usage

void Analyze(ImmutableArray<SyntaxNode> nodes)
    => AnalyzeCore(nodes.AsSpan());

void Analyze(ImmutableArray<SyntaxNode>.Builder nodes)
    => AnalyzeCore(ImmutableCollectionsMarshal.AsArray(nodes).AsSpan(0, nodes.Count));

void AnalyzeCore(ReadOnlySpan<SyntaxNode> nodes);
{
    // ...
}

Alternative Designs

No response

Risks

This forces the internal backing storage of a builder to always be a contiguous array. However, as stated in the proposal above, it seems like this is already a requirement that cannot be violated to begin with. So having one more dependency on this is likely fine.

@CyrusNajmabadi CyrusNajmabadi added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 24, 2025
@CyrusNajmabadi
Copy link
Member Author

Tagging @ToddGrun @stephentoub

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 24, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@colejohnson66
Copy link

colejohnson66 commented Jan 25, 2025

Is there a benefit to getting the array and slicing over a hypothetical AsSpan? If the idea is to avoid “building” and just getting the span directly, doing the slicing yourself could be error-prone.

@CyrusNajmabadi
Copy link
Member Author

Is there a benefit to getting the array

Yes. Being able to use in scenarios where a span isn't viable. like in async code. My example just showed that if we were using a span (as opposed to some other abstraction over a contiguous sequence of elements) it would work with this new api.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jan 27, 2025

Yes. Being able to use in scenarios where a span isn't viable. like in async code.

Would returning Memory<T> or ArraySegment<T> suit your use case? My concern is that the proposed method's similarity and proximity to the existing ImmutableCollectionsMarshal.AsArray(IA<T>) might reinforce the incorrect assumption that the length of the returned array matches the current count of the builder.

This method looks closer to CollectionsMarshal.AsSpan(List<T>) so I think the naming and return type should reflect that.

@CyrusNajmabadi
Copy link
Member Author

If it can return something without allocation that allows uniform processing with the array that an IA returns and is not a ref struct, it's likely ok.

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels Jan 27, 2025
@eiriktsarpalis eiriktsarpalis removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 27, 2025
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Jan 27, 2025
@CyrusNajmabadi
Copy link
Member Author

Looking at the defs of these, i think ArraySegment most aligns with how i think of things. it's definitely an array, and just represents a chunk within it. So if you wanted to expose thi as ArraySegment, that seems fine to me :)

@eiriktsarpalis
Copy link
Member

Just to clarify, you only need to access the segment corresponding to the current Count and not the entire array, right? Either approach should be equivalent since it's possible to set Count property to a desired value.

@CyrusNajmabadi
Copy link
Member Author

Just to clarify, you only need to access the segment corresponding to the current Count

Correct.

Either approach should be equivalent since it's possible to set Count property to a desired value.

Right. hence why, morally, i'm just fine with this exposing the array and just leaving it to the caller to do the right thing. But i'm ok with any route :)

@bartonjs
Copy link
Member

bartonjs commented Feb 4, 2025

Video

  • The closest equivalents always return bounded data, such as getting a Span from List, or the correctly sized array from ImmutableArray. So the return here shouldn't be the array itself, but Memory, so that the offset and length are conveyed with it.
    • Callers who really want the array can then use the TryGetArray method on Memory.
    • Since it's not returning an array, renamed AsArray to AsMemory
namespace System.Runtime.InteropServices;

public static class ImmutableCollectionsMarshal
{
    public static Memory<T> AsMemory<T>(ImmutableArray<T>.Builder builder);
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 4, 2025
@stephentoub stephentoub self-assigned this Feb 5, 2025
@stephentoub stephentoub modified the milestones: Future, 10.0.0 Feb 5, 2025
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants