
[Feature Request] Add more options to load models at InferenceSession constructor #23940

Open
vpenades opened this issue Mar 7, 2025 · 0 comments
Labels
api:CSharp (issues related to the C# API) · feature request (request for unsupported feature or enhancement)

Comments


vpenades commented Mar 7, 2025

Describe the feature request

Right now, the InferenceSession constructor is able to load models from two sources:

  • string (path to model)
  • byte[] (the actual model)

When using the byte[] constructors, the preceding step typically involves loading the model from a Stream, and in many cases that means a MemoryStream whose .ToArray() returns the byte[] of the loaded file.

The problem is that MemoryStream.ToArray() creates a copy of the loaded file, because the stream's internal buffer is usually larger than the data actually written to it.

Certainly it is possible to load the model straight into a byte[] array provided you know the file length beforehand, but that's not always possible or reliable with certain Streams. So the safe approach is to copy the stream into a MemoryStream and then extract the bytes from it.

MemoryStream has a way to avoid creating that copy: TryGetBuffer(), which returns the internal buffer as an ArraySegment<byte>. That is what I think the constructors should accept instead of byte[].
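
For reference, the BCL signature is public virtual bool TryGetBuffer(out ArraySegment<byte> buffer). A minimal standalone sketch (mine, not from ONNX Runtime) contrasting the two calls:

    using System;
    using System.IO;

    class CopyDemo
    {
        static void Main()
        {
            using var ms = new MemoryStream();
            ms.Write(new byte[100], 0, 100);

            // ToArray() allocates a fresh array sized to Length and copies into it.
            byte[] copy = ms.ToArray();

            // TryGetBuffer() exposes the existing internal buffer instead;
            // the capacity exceeds the data written, which is why ToArray() must copy.
            ms.TryGetBuffer(out ArraySegment<byte> segment);

            Console.WriteLine($"copy={copy.Length}, segment={segment.Count}, buffer={segment.Array.Length}");
            // e.g. copy=100, segment=100, buffer=256
        }
    }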

So my request is to add additional constructors to InferenceSession:

 public InferenceSession(ArraySegment<byte> model);
 public InferenceSession(Stream model);
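
Until such overloads exist, the closest workaround I can see is to special-case a segment that happens to cover its whole backing array. A hypothetical helper (LoadFromSegment is my own name, not an existing API) sketching that against the current byte[] constructor:

    using System;
    using Microsoft.ML.OnnxRuntime;

    static class InferenceSessionHelper
    {
        // Hypothetical workaround: zero-copy is only possible today when the
        // segment spans its entire backing array; otherwise we must copy,
        // which is exactly the allocation the proposed overloads would remove.
        public static InferenceSession LoadFromSegment(ArraySegment<byte> model)
        {
            if (model.Offset == 0 && model.Count == model.Array.Length)
                return new InferenceSession(model.Array); // no copy needed

            var copy = new byte[model.Count];
            Buffer.BlockCopy(model.Array, model.Offset, copy, 0, model.Count);
            return new InferenceSession(copy); // unavoidable copy today
        }
    }

In practice the no-copy branch rarely fires for a MemoryStream filled via CopyTo, because the buffer's capacity usually exceeds the data length; that is why a first-class ArraySegment<byte> (or Stream) overload is needed.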

Describe scenario use case

To be able to load models from sources other than a file system path or a plain byte[] array, avoiding unnecessary memory copies.

ArraySegment<byte> modelBytes;

using (var m = new MemoryStream())
{
    using (var s = await httpClient.GetStreamAsync("model url"))
    {
        await s.CopyToAsync(m);
    }

    m.TryGetBuffer(out modelBytes);  // get the internal buffer without creating a copy
}

var session = new InferenceSession(modelBytes);  // proposed ArraySegment<byte> constructor
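
Note that reading modelBytes after the using block is safe here: the ArraySegment<byte> keeps the underlying array reachable, and disposing a MemoryStream does not reclaim a buffer already handed out. Be aware that TryGetBuffer returns false for a MemoryStream constructed over a non-exposable byte[]; it always succeeds for the parameterless constructor used above.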

Indirectly, this will help lower memory pressure when loading large models on devices with little memory.
