Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to support Dependency Injection with ITextEmbeddingGenerator #444

Closed

Conversation

marcominerva
Copy link
Contributor

Motivation and Context (Why the change? What's the scenario?)

See #334 .

@dluc
Copy link
Collaborator

dluc commented May 7, 2024

would like not having to introduce global constants, could you replace constants with nameof(class)?

@marcominerva
Copy link
Contributor Author

Unfortunately, we can't use nameof(class) because we have also generic methods like this:

public IKernelMemoryBuilder AddKeyedSingleton<TService, TImplementation>(object serviceKey)
    where TService : class
    where TImplementation : class, TService
{
    this.Services.AddKeyedSingleton<TService, TImplementation>(serviceKey);
    return this;
}

So in this case, we don't have a class to be used with nameof. Moreover, for example in SimpleVectorDb.cs we have this:

public SimpleVectorDb(
    SimpleVectorDbConfig config,
    [FromKeyedServices(Constants.TextEmbeddingGeneratorKey)] ITextEmbeddingGenerator embeddingGenerator,
    ILogger<SimpleVectorDb>? log = null)
{
    this._embeddingGenerator = embeddingGenerator;

So, again we don't have a class for nameof.

We could use nameof(ITextEmbeddingGenerator) if we change the following code:

public static IKernelMemoryBuilder WithCustomEmbeddingGenerator(
    this IKernelMemoryBuilder builder,
    ITextEmbeddingGenerator service,
    bool useForIngestion = true,
    bool useForRetrieval = true)
{
    service = service ?? throw new ConfigurationException("Memory Builder: the embedding generator instance is NULL");

    if (useForRetrieval)
    {
        builder.AddKeyedSingleton<ITextEmbeddingGenerator>(Constants.TextEmbeddingGeneratorKey, service);
    }

    if (useForIngestion)
    {
        builder.AddIngestionEmbeddingGenerator(service);
    }

    return builder;
}

public static IKernelMemoryBuilder WithCustomEmbeddingGenerator<T>(
    this IKernelMemoryBuilder builder,
    bool useForIngestion = true,
    bool useForRetrieval = true) where T : class, ITextEmbeddingGenerator
{
    if (useForRetrieval)
    {
        builder.AddKeyedSingleton<ITextEmbeddingGenerator, T>(Constants.TextEmbeddingGeneratorKey);
    }

    if (useForIngestion)
    {
        builder.AddIngestionEmbeddingGenerator<T>();
    }

    return builder;
}

So that it directly uses the ServiceCollection from the builder:

public static IKernelMemoryBuilder WithCustomEmbeddingGenerator(
    this IKernelMemoryBuilder builder,
    ITextEmbeddingGenerator service,
    bool useForIngestion = true,
    bool useForRetrieval = true)
{
    service = service ?? throw new ConfigurationException("Memory Builder: the embedding generator instance is NULL");

    if (useForRetrieval)
    {
        builder.Services.AddKeyedSingleton<ITextEmbeddingGenerator>(nameof(ITextEmbeddingGenerator), service);
    }

    if (useForIngestion)
    {
        builder.AddIngestionEmbeddingGenerator(service);
    }

    return builder;
}

public static IKernelMemoryBuilder WithCustomEmbeddingGenerator<T>(
    this IKernelMemoryBuilder builder,
    bool useForIngestion = true,
    bool useForRetrieval = true) where T : class, ITextEmbeddingGenerator
{
    if (useForRetrieval)
    {
        builder.Services.AddKeyedSingleton<ITextEmbeddingGenerator, T>(nameof(ITextEmbeddingGenerator));
    }

    if (useForIngestion)
    {
        builder.AddIngestionEmbeddingGenerator<T>();
    }

    return builder;
}

@dluc
Copy link
Collaborator

dluc commented May 7, 2024

Using the same Constants.TextEmbeddingGeneratorKey in all IMemoryDb classes, doesn't it defeat the purpose of allowing different dependencies?

For instance, when using both SimpleVectordb and QdrantMemory, the user should be allowed to inject different embedding generators. Shouldn't the two classes should use different constants, such as nameof(SimpleVectordb) and nameof(QdrantMemoryDb)?

Example:

public SimpleVectorDb(
    SimpleVectorDbConfig config,
    [FromKeyedServices(nameof(SimpleVectorDb))] ITextEmbeddingGenerator embeddingGenerator,
    ILogger<SimpleVectorDb>? log = null)
{
    this._embeddingGenerator = embeddingGenerator;
public QdrantMemory(
    QdrantConfig config,
    [FromKeyedServices(nameof(QdrantMemory))] ITextEmbeddingGenerator embeddingGenerator,
    ILogger<QdrantMemory>? log = null)
builder.WithKeyedSingleton<ITextEmbeddingGenerator, OpenAITextEmbeddingGenerator>(nameof(SimpleVectordb));

builder.WithKeyedSingleton<ITextEmbeddingGenerator, AzureOpenAITextEmbeddingGenerator>(nameof(QdrantMemory));

If this works, then we could introduce some specific extension methods, allowing this syntax, similarly to IHttpClientFactory:

builder.WithKeyedEmbeddingGenerator<SimpleVectordb, OpenAITextEmbeddingGenerator>();

builder.WithKeyedEmbeddingGenerator<QdrantMemory, AzureOpenAITextEmbeddingGenerator>();

If all of this works, we might as well revisit WithCustomEmbeddingGenerator<T>, and consider:

// replaces current WithCustomEmbeddingGenerator<T>
WithEmbeddingGenerator<TEmbeddingGenerator>(...)

// keyed injection
WithEmbeddingGenerator<TConsumer, TEmbeddingGenerator>(...)

@marcominerva
Copy link
Contributor Author

Using the same Constants.TextEmbeddingGeneratorKey in all IMemoryDb classes, doesn't it defeat the purpose of allowing different dependencies?

In current implementation, for example here:

public static IKernelMemoryBuilder WithCustomEmbeddingGenerator(
this IKernelMemoryBuilder builder,
ITextEmbeddingGenerator service,
bool useForIngestion = true,
bool useForRetrieval = true)
{
service = service ?? throw new ConfigurationException("Memory Builder: the embedding generator instance is NULL");
if (useForRetrieval)
{
builder.AddSingleton<ITextEmbeddingGenerator>(service);
}
if (useForIngestion)
{
builder.AddIngestionEmbeddingGenerator(service);
}
return builder;
}

We can register multiple generators for ingestion (builder.AddIngestionEmbeddingGenerator(service)), but only an ITextEmbeddingGenerator (builder.AddSingleton<ITextEmbeddingGenerator>(service)) for retrieval. If we call this method multiple times, then Memory Dbs like this:

public SimpleVectorDb(
SimpleVectorDbConfig config,
ITextEmbeddingGenerator embeddingGenerator,
ILogger<SimpleVectorDb>? log = null)
{
this._embeddingGenerator = embeddingGenerator;

Will get the last ITextEmbeddingGenerator that has been registered for retrieval (this is the default behavior of Dependency Injection).

In my draft, I use the single Constants.TextEmbeddingGeneratorKey constant to identify the same last instance of ITextEmbeddingGenerator:

public static IKernelMemoryBuilder WithCustomEmbeddingGenerator(
        this IKernelMemoryBuilder builder,
        ITextEmbeddingGenerator service,
        bool useForIngestion = true,
        bool useForRetrieval = true)
    {
        service = service ?? throw new ConfigurationException("Memory Builder: the embedding generator instance is NULL");

        if (useForRetrieval)
        {
            builder.AddKeyedSingleton<ITextEmbeddingGenerator>(Constants.TextEmbeddingGeneratorKey, service);
        }

        if (useForIngestion)
        {
            builder.AddIngestionEmbeddingGenerator(service);
        }

        return builder;
    }

In your example:

public SimpleVectorDb(
    SimpleVectorDbConfig config,
    [FromKeyedServices(nameof(SimpleVectorDb))] ITextEmbeddingGenerator embeddingGenerator,
    ILogger<SimpleVectorDb>? log = null)
{
    this._embeddingGenerator = embeddingGenerator;
public QdrantMemory(
    QdrantConfig config,
    [FromKeyedServices(nameof(QdrantMemory))] ITextEmbeddingGenerator embeddingGenerator,
    ILogger<QdrantMemory>? log = null)

SimpleVectorDb and QdrantMemory are tied to a ITextEmbeddingGenerator has been registered with the corresponding class name.

But what if I want to register a custom embedding generator, so that I can't use any nameof operator, because I'm working with a generic type? In that case, what value should I use in the FromKeyedServices attribute?

@dluc
Copy link
Collaborator

dluc commented May 10, 2024

SimpleVectorDb and QdrantMemory are tied to a ITextEmbeddingGenerator has been registered with the corresponding class name.

They are tied only if there's a keyed service, e.g. this

builder.WithSingleton<ITextEmbeddingGenerator, FooBarTextEmbeddingGenerator>();
builder.WithKeyedSingleton<ITextEmbeddingGenerator, OpenAITextEmbeddingGenerator>(nameof(SimpleVectordb));
builder.WithKeyedSingleton<ITextEmbeddingGenerator, AzureOpenAITextEmbeddingGenerator>(nameof(QdrantMemory));

should translate to:

  • SimpleVectordb will use OpenAITextEmbeddingGenerator
  • QdrantMemory will use AzureOpenAITextEmbeddingGenerator
  • PostgresMemory will use FooBarTextEmbeddingGenerator
  • AzureAISearchMemory will use FooBarTextEmbeddingGenerator

But what if I want to register a custom embedding generator, so that I can't use any nameof operator, because I'm working with a generic type? In that case, what value should I use in the FromKeyedServices attribute?

Using nameof(), it's possible to key on one specific T, but not on all T's

class Foo<T> { 
    Foo([FromKeyedServices(nameof(Foo<T>))] IDependency dependency`) {}
}

builder.WithKeyedSingleton<IDependency, ImplOne>(nameof(Foo<Case1>));
builder.WithKeyedSingleton<IDependency, ImplTwo>(nameof(Foo<Case2>));

Using a constant, it's possible to key on all T's, but not one a specific T.

class Foo<T> { 
    Foo([FromKeyedServices(CONSTANT)] IDependency dependency`) {}
}

builder.WithKeyedSingleton<IDependency, Impl>(CONSTANT);

Using nameof should not exclude the use of constants, allowing to use constants if there's a scenario where nameof doesn't work. I don't see a reason for always using constants.

@marcominerva
Copy link
Contributor Author

Excuse me, but I don't get the point. nameof(Foo<T>) is always resolved as Foo, no matter the value of T, as you can verify with the following example:

var fooOfInt = new Foo<int>();
fooOfInt.Print();

var fooOfString = new Foo<string>();
fooOfString.Print();

public class Foo<T>
{
    public void Print()
    {
        var name = nameof(Foo<T>);
        Console.WriteLine(name);
    }
}

Suppose I have the following custom embedding generators:

public class CustomEmbeddingGenerator1 : ITextEmbeddingGenerator
{
    // ...
}

public class CustomEmbeddingGenerator2 : ITextEmbeddingGenerator
{
    // ...
}

And I want to use CustomEmbeddingGenerator1 for QdrantMemory and CustomEmbeddingGenerator2 for AzureAISearchMemory . How should services registration and memory constructors be defined to support this scenario?

@dluc
Copy link
Collaborator

dluc commented May 11, 2024

Let's leave generics out for a second. A couple observations from my tests (console app) and the PR:

  1. the problem raised in [Feature Request] Make WithCustomEmbeddingGenerator accepts type instead of concrete class #334 doesn't require keyed services, it can be addressed using ActivatorUtilities, I posted a comment there.
  2. adding [FromKeyedServices(constant)] to ctors forces to register dependencies with AddKeyed..., increasing code complexity. Fallback to a dependency registered with Add... (without "Keyed") doesn't work: System.InvalidOperationException: Unable to resolve service for type .... Without an automatic fallback, a custom factory with fallback resolution logic would be better, something like IHttpFactory typed clients.
  3. using [FromKeyedServices(Constants.TextEmbeddingGeneratorKey) on all memory classes, means that it's not possible to inject different embedding generators, e.g. GeneratorOne for Qdrant and GeneratorTwo for Azure AI Search. Without this, I don't see much value in introducing keyed services.

Considering that #334 has a solution as far as I can tell, unless we figure out how to use keyed services with a fallback, I'd leave the code as it is.

@marcominerva
Copy link
Contributor Author

Your idea about using ActivatorUtilities.CreateInstance<T> is quite interesting. What do you think about adding your proposal to KernelMemoryBuilderExtensions.cs?

public static IKernelMemoryBuilder WithCustomEmbeddingGenerator<T>(
           this IKernelMemoryBuilder builder) where T : class, ITextEmbeddingGenerator
{
    builder.Services.AddSingleton<ITextEmbeddingGenerator>(serviceProvider =>
    {
        return ActivatorUtilities.CreateInstance<T>(serviceProvider);
    });

    return builder;
}

@dluc
Copy link
Collaborator

dluc commented May 17, 2024

Should we park this, given that the original problem can be addressed without code changes?

@marcominerva
Copy link
Contributor Author

If you agree, I can modify this PR to include just the extension method I suggested in my previous comment. It would greatly simplify the use case. What do you think?

@marcominerva
Copy link
Contributor Author

@dluc, I have closed this PR and opened a new one (#503) so that it includes only the minimal required changes.

dluc pushed a commit that referenced this pull request May 21, 2024
## Motivation and Context (Why the change? What's the scenario?)

Add support for Dependency Injection with ITextEmbeddingGenerator

See #334. This PR
replaces #444 so that it
excludes unnecessary changes.
sangar-1028 added a commit to sangar-1028/kernel-memory that referenced this pull request Oct 21, 2024
## Motivation and Context (Why the change? What's the scenario?)

Add support for Dependency Injection with ITextEmbeddingGenerator

See microsoft/kernel-memory#334. This PR
replaces microsoft/kernel-memory#444 so that it
excludes unnecessary changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for author Waiting for author to reply or address comments work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants