Skip to content

Custom mapping transformer #1406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Oct 31, 2018
Merged

Custom mapping transformer #1406

merged 15 commits into from
Oct 31, 2018

Conversation

Zruty0
Copy link
Contributor

@Zruty0 Zruty0 commented Oct 26, 2018

First version of custom estimator, with MEF injection
Also some minor fixes

@Zruty0 Zruty0 requested a review from TomFinley October 26, 2018 21:44
using Stopwatch = System.Diagnostics.Stopwatch;

/// <summary>
/// An ML.NET environment for local execution.
/// </summary>
public sealed class LocalEnvironment : HostEnvironmentBase<LocalEnvironment>
{
private readonly Func<CompositionContainer> _compositionContainerFactory;
Copy link
Contributor Author

@Zruty0 Zruty0 Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_compositionContainerFactory [](start = 52, length = 28)

yet another reason to make LocalEnvironment and MLContext one thing #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was I thought the plan, but I could be wrong, did we change our minds on this?


In reply to: 228673857 [](ancestors = 228673857)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does this mean that HostEnvironmentBase is to implement IHostEnvironment explicitly?
In any case, this might have been your plan originally, but I didn't think of it back then, and I would rather not do it right now in this particular PR.


In reply to: 229093529 [](ancestors = 229093529,228673857)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe that is the implication. I think I even brought it up as a natural consequence, but maybe I forgot. Certainly this is not the PR, I agree. Just clarifying the status of our long range plans.


In reply to: 229394800 [](ancestors = 229394800,229093529,228673857)

@@ -6,13 +6,16 @@

namespace Microsoft.ML.Runtime.Data
{
using System.ComponentModel.Composition.Hosting;
Copy link
Contributor Author

@Zruty0 Zruty0 Oct 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System [](start = 10, length = 6)

oops #Closed


/// <summary>
/// Get the MEF composition container.
/// </summary>
Copy link
Contributor

@TomFinley TomFinley Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of feel like this is not really a sufficient description for what it is for, or what it is used for? #Resolved

: base(RandomUtils.Create(seed), verbose: false, conc)
{
_compositionContainerFactory = compositionContainerFactory;
Copy link
Contributor

@TomFinley TomFinley Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_compositionContainerFactory = compositionContainerFactory [](start = 12, length = 58)

So, why a factory? Is there something important about deferred execution of the method, or some reason I have it invoked multiple times? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It is invoked multiple times: the user may have added more parts to the PartCatalog, so the old composition containers are no longer valid.
  2. My first approach used to expose the catalog itself: this didn't work because I need to explicitly register a particular instance of MLContext in the container.

In reply to: 229093844 [](ancestors = 229093844)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh gosh. Well I guess that is invisible to the user, I suppose


In reply to: 229396906 [](ancestors = 229396906,229093844)

@@ -687,5 +689,7 @@ public virtual void PrintMessageNormalized(TextWriter writer, string message, bo
else if (!removeLastNewLine)
writer.WriteLine();
}

public virtual CompositionContainer GetCompositionContainer() => throw this.ExceptNotSupp("Composition not supported by this environment");
Copy link
Contributor

@TomFinley TomFinley Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public virtual CompositionContainer GetCompositionContainer() => throw this.ExceptNotSupp("Composition not supported by this environment"); [](start = 8, length = 139)

In the event that someone hasn't set up a composition via the local environment, my expectation would have been that a call to this method -- after all part of the interface we claim to be implementing -- would not fail, but would instead return some value (an empty container, or even null), to indicate that no components had been registered. That we add a method to an interface and then in the base class we throw with this exception is I think a bad sign. #Resolved

var data = loader.Read(source);

var customEst = new CustomMappingEstimator<MyInput, MyOutput>(ML, MyLambda.MyAction, "MyLambda");
TestEstimatorCore(customEst, data);
Copy link
Contributor

@TomFinley TomFinley Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestEstimatorCore [](start = 12, length = 17)

This function is I think the place where the serialization is tested, yes? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


In reply to: 229096709 [](ancestors = 229096709)

@@ -12,6 +12,7 @@
</ItemGroup>

<ItemGroup>
<PackageReference Include="System.ComponentModel.Composition" Version="4.5.0" />
Copy link
Contributor

@TomFinley TomFinley Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

="4.5.0" [](start = 73, length = 8)

Probably want to make this version in the more global file (same place SystemMemoryVersion we see below is set). #Resolved

@@ -12,6 +12,7 @@
</ItemGroup>

<ItemGroup>
<PackageReference Include="System.ComponentModel.Composition" Version="4.5.0" />
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to add this to the nupkg dependencies.

<PackageReference Include="Newtonsoft.Json" Version="$(NewtonsoftJsonPackageVersion)" />
<PackageReference Include="System.Reflection.Emit.Lightweight" Version="$(SystemReflectionEmitLightweightPackageVersion)" />
<PackageReference Include="System.Threading.Tasks.Dataflow" Version="$(SystemThreadingTasksDataflowPackageVersion)" />
<PackageReference Include="System.CodeDom" Version="$(SystemCodeDomPackageVersion)" />
<PackageReference Include="System.Memory" Version="$(SystemMemoryVersion)" />
<PackageReference Include="System.Collections.Immutable" Version="$(SystemCollectionsImmutableVersion)" />
#Resolved

/// </summary>
/// <typeparam name="TSrc">The type that describes what 'source' columns are consumed from the input <see cref="IDataView"/>.</typeparam>
/// <typeparam name="TDst">The type that describes what new columns are added by this transform.</typeparam>
internal sealed class MapTransform<TSrc, TDst> : LambdaTransformBase, ITransformTemplate, IRowToRowMapper
public sealed class CustomMappingTransformer<TSrc, TDst> : ITransformer, ICanSaveModel
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the file name be renamed too? That way people can find this class when looking through the file system. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do after all other comments are resolved


In reply to: 229413212 [](ancestors = 229413212)

}

[Fact]
public void TestCustomTransformer()
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also have a test that saves a model file with a CustomMappingTransformer, and then reads it back in, and executes it correctly. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what TestEstimatorCore is doing underneath.


In reply to: 229415366 [](ancestors = 229415366)

Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestEstimatorCore doesn't execute the transform and verify that the result is correct, does it?

Also, I think it should be read back in using a separate MLContext instance, to ensure everything gets hooked up correctly.


In reply to: 229416855 [](ancestors = 229416855,229415366)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It surely executes the transformer, that's how I found bugs in it. See CheckSame* calls inside.

As for separate contexts, sure.


In reply to: 229419089 [](ancestors = 229419089,229416855,229415366)


public static void MyAction(MyInput input, MyOutput output)
{
output.Together = $"{input.Float1} + {string.Join(", ", input.Float4)}";
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that asserts this gets executed correctly? I don't see a test that asserts the output string is the expected value. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified to assert it.


In reply to: 229416231 [](ancestors = 229416231)


namespace Microsoft.ML.Tests.Transformers
{
public sealed class CustomTransformerTests : TestDataPipeBase
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I would align the test class name with the name of the class it is testing.

CustomMappingEstimator and CustomMappingTransformer. I would call this class CustomMappingTransformerTests. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used CustomMappingTests since we are not always using Estimator or Transformer suffix in test names, and I don't think we should.


In reply to: 229417149 [](ancestors = 229417149)

public override CompositionContainer GetCompositionContainer()
{
if (_compositionContainerFactory != null)
return _compositionContainerFactory();
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to make a new CompositionContainer every time it is asked for? If so, I think it would make more sense to call this CreateCompositionContainer(), to inform the caller that this is a new one every time. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any other way, since the user may be adding more components to PartCatalog over time


In reply to: 229417995 [](ancestors = 229417995)

@@ -61,14 +63,16 @@ public sealed class MLContext : IHostEnvironment
/// </summary>
public Action<string> Log { get; set; }

public AggregateCatalog PartCatalog { get; }
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want xml summary doc on this public property? #Resolved

public class MyLambda
{
[Export("MyLambda")]
public ITransformer MyTransformer => new CustomMappingTransformer<MyInput, MyOutput>(ML, MyAction, "MyLambda");
Copy link
Contributor

@TomFinley TomFinley Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"MyLambda" [](start = 111, length = 10)

So the user has to specify "MyLambda" a couple places. Once in the attribute, once in the transformer constructor (here where I put the comment), and once again when performing this mapping. It seems like this could be discoverable by reflection, so as to avoid the redundancy? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attribute is not the only way to inject components (it's just the shortest and the most documented on the web. So, you cannot rely on the attribute as the source of the contract name.

Also note that the codebase that does the training and codebase that does the scoring don't have to be the same. Only the scoring one needs to inject the transformer. It's easier to pass just string contractName across this potential code boundary than any other information.


In reply to: 229428417 [](ancestors = 229428417)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we could put null here instead of "MyLambda". The test would continue to work, but loaded model will stop being save-able.


In reply to: 229462099 [](ancestors = 229462099,229428417)

@@ -61,14 +63,16 @@ public sealed class MLContext : IHostEnvironment
/// </summary>
public Action<string> Log { get; set; }

public AggregateCatalog PartCatalog { get; }
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how this design would play in a larger application that already uses MEF. If I was writing the application, would I want my "ML.NET MEF stuff" separate from the rest of my application's "MEF stuff"? Or would I want a unified CompositionContainer that contained all my components.

Say my CustomMappingTransformer Imports a bunch of other stuff in my application. I would need to duplicate all those "parts" into this catalog.

What if we changed the design such that a consumer supplies the CompositionContainer to the MLContext instead? They can configure it any way they want. #Closed

Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes:

I haven't ever done this myself, so I'm not sure on the best practices in this area. The above idea was just me brainstorming - not really a "I believe this is the best way to accomplish this design".

Maybe @weshaggard might have some ideas on what would be the best way to expose this to users? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I like this idea better. Note that I mutate the container sometimes though...


In reply to: 229440195 [](ancestors = 229440195)

Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You (i.e. ML.NET) shouldn't have to mutate the container, should it? If the user needs/wants to [Import] MLContext, then they can add that to the container themselves, can't they? (or am I missing something?) #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are free to do this if they want. I thought it's nice not to have to do it though. Essentially, they will ALWAYS need an instance of MLContext to instantiate any ITransformer, and if we were to do it for them optionally, I think it should be fine?


In reply to: 229446216 [](ancestors = 229446216)

Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me. #Closed

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good for now. I think we will need to improve the story a little bit though, since right now it is somewhat awkward to declare these things, esp. with multiple declarations of the signature the user could easily get wrong. (But these of course could be discovered via reflection I am pretty sure.)

@eerhardt
Copy link
Member

@Zruty0 and I had an offline discussion regarding "Is it OK for ML.NET to take a direct dependency on MEF v1? Or should ML.NET provide its own abstraction so users can bring any DI technology they want?"

For background, these are the currently available DI systems in the .NET ecosystem that I'm aware of (note - not exhaustive):

The conclusion we came to was that we will want to continue using Dependency Injection for more component types in the future (e.g. a custom loss function), and creating our own abstraction might be "yet-another-DI-system". So for now, we will pick MEF v1 to go with, and get customer feedback on whether this was the right decision or not. With the current design, it exposes enough for any user to be able to shim their DI system into the CompositionContainer, since the user provides the container themselves. So no matter what DI system an application uses, it should be able to interop with this design.

@Zruty0 Zruty0 changed the title [WIP] Custom transformer Custom mapping transformer Oct 30, 2018
using (var env = new LocalEnvironment()
.AddStandardComponents())
var env = new MLContext().AddStandardComponents();
env.Log += System.Console.WriteLine;
Copy link
Member

@eerhardt eerhardt Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? #Resolved

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

}
catch (Exception)
{
// REVIEW: we should have a common mechanism that will make sure this is 'our' exception thrown.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does Assert.Throws work here?

@Zruty0 Zruty0 merged commit a039462 into dotnet:master Oct 31, 2018
@Zruty0 Zruty0 deleted the feature/lambda branch October 31, 2018 17:15
return new MapTransform<TSrc, TDst>(env, source, mapAction, saveAction, loadFunc,
inputSchemaDefinition, outputSchemaDefinition);
var composition = env.GetCompositionContainer();
ITransformer transformer = composition.GetExportedValue<ITransformer>(contractName);
Copy link
Member

@eerhardt eerhardt Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env.GetCompositionContainer(); can return null (MLContext does by default). This will throw a NRE. We should check for a null container before trying to reference it.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants