-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Custom mapping transformer #1406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
using Stopwatch = System.Diagnostics.Stopwatch; | ||
|
||
/// <summary> | ||
/// An ML.NET environment for local execution. | ||
/// </summary> | ||
public sealed class LocalEnvironment : HostEnvironmentBase<LocalEnvironment> | ||
{ | ||
private readonly Func<CompositionContainer> _compositionContainerFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_compositionContainerFactory [](start = 52, length = 28)
yet another reason to make LocalEnvironment
and MLContext
one thing #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was I thought the plan, but I could be wrong, did we change our minds on this?
In reply to: 228673857 [](ancestors = 228673857)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, does this mean that HostEnvironmentBase
is to implement IHostEnvironment
explicitly?
In any case, this might have been your plan originally, but I didn't think of it back then, and I would rather not do it right now in this particular PR.
In reply to: 229093529 [](ancestors = 229093529,228673857)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe that is the implication. I think I even brought it up as a natural consequence, but maybe I forgot. Certainly this is not the PR, I agree. Just clarifying the status of our long range plans.
In reply to: 229394800 [](ancestors = 229394800,229093529,228673857)
@@ -6,13 +6,16 @@ | |||
|
|||
namespace Microsoft.ML.Runtime.Data | |||
{ | |||
using System.ComponentModel.Composition.Hosting; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
System [](start = 10, length = 6)
oops #Closed
|
||
/// <summary> | ||
/// Get the MEF composition container. | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of feel like this is not really a sufficient description for what it is for, or what it is used for? #Resolved
: base(RandomUtils.Create(seed), verbose: false, conc) | ||
{ | ||
_compositionContainerFactory = compositionContainerFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_compositionContainerFactory = compositionContainerFactory [](start = 12, length = 58)
So, why a factory? Is there something important about deferred execution of the method, or some reason I have it invoked multiple times? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It is invoked multiple times: the user may have added more parts to the
PartCatalog
, so the old composition containers are no longer valid. - My first approach used to expose the catalog itself: this didn't work because I need to explicitly register a particular instance of
MLContext
in the container.
In reply to: 229093844 [](ancestors = 229093844)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh gosh. Well I guess that is invisible to the user, I suppose
In reply to: 229396906 [](ancestors = 229396906,229093844)
@@ -687,5 +689,7 @@ public virtual void PrintMessageNormalized(TextWriter writer, string message, bo | |||
else if (!removeLastNewLine) | |||
writer.WriteLine(); | |||
} | |||
|
|||
public virtual CompositionContainer GetCompositionContainer() => throw this.ExceptNotSupp("Composition not supported by this environment"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public virtual CompositionContainer GetCompositionContainer() => throw this.ExceptNotSupp("Composition not supported by this environment"); [](start = 8, length = 139)
In the event that someone hasn't set up a composition via the local environment, my expectation would have been that a call to this method -- after all part of the interface we claim to be implementing -- would not fail, but would instead return some value (an empty container, or even null
), to indicate that no components had been registered. That we add a method to an interface and then in the base class we throw with this exception is I think a bad sign. #Resolved
var data = loader.Read(source); | ||
|
||
var customEst = new CustomMappingEstimator<MyInput, MyOutput>(ML, MyLambda.MyAction, "MyLambda"); | ||
TestEstimatorCore(customEst, data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestEstimatorCore [](start = 12, length = 17)
This function is I think the place where the serialization is tested, yes? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -12,6 +12,7 @@ | |||
</ItemGroup> | |||
|
|||
<ItemGroup> | |||
<PackageReference Include="System.ComponentModel.Composition" Version="4.5.0" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
="4.5.0" [](start = 73, length = 8)
Probably want to make this version in the more global file (same place SystemMemoryVersion
we see below is set). #Resolved
@@ -12,6 +12,7 @@ | |||
</ItemGroup> | |||
|
|||
<ItemGroup> | |||
<PackageReference Include="System.ComponentModel.Composition" Version="4.5.0" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to add this to the nupkg dependencies.
machinelearning/pkg/Microsoft.ML/Microsoft.ML.nupkgproj
Lines 11 to 16 in a55b2a6
<PackageReference Include="Newtonsoft.Json" Version="$(NewtonsoftJsonPackageVersion)" /> | |
<PackageReference Include="System.Reflection.Emit.Lightweight" Version="$(SystemReflectionEmitLightweightPackageVersion)" /> | |
<PackageReference Include="System.Threading.Tasks.Dataflow" Version="$(SystemThreadingTasksDataflowPackageVersion)" /> | |
<PackageReference Include="System.CodeDom" Version="$(SystemCodeDomPackageVersion)" /> | |
<PackageReference Include="System.Memory" Version="$(SystemMemoryVersion)" /> | |
<PackageReference Include="System.Collections.Immutable" Version="$(SystemCollectionsImmutableVersion)" /> |
src/Microsoft.ML.Api/MapTransform.cs
Outdated
/// </summary> | ||
/// <typeparam name="TSrc">The type that describes what 'source' columns are consumed from the input <see cref="IDataView"/>.</typeparam> | ||
/// <typeparam name="TDst">The type that describes what new columns are added by this transform.</typeparam> | ||
internal sealed class MapTransform<TSrc, TDst> : LambdaTransformBase, ITransformTemplate, IRowToRowMapper | ||
public sealed class CustomMappingTransformer<TSrc, TDst> : ITransformer, ICanSaveModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the file name be renamed too? That way people can find this class when looking through the file system. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
[Fact] | ||
public void TestCustomTransformer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also have a test that saves a model file with a CustomMappingTransformer, and then reads it back in, and executes it correctly. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what TestEstimatorCore
is doing underneath.
In reply to: 229415366 [](ancestors = 229415366)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestEstimatorCore
doesn't execute the transform and verify that the result is correct, does it?
Also, I think it should be read back in using a separate MLContext instance, to ensure everything gets hooked up correctly.
In reply to: 229416855 [](ancestors = 229416855,229415366)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It surely executes the transformer, that's how I found bugs in it. See CheckSame*
calls inside.
As for separate contexts, sure.
In reply to: 229419089 [](ancestors = 229419089,229416855,229415366)
|
||
public static void MyAction(MyInput input, MyOutput output) | ||
{ | ||
output.Together = $"{input.Float1} + {string.Join(", ", input.Float4)}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test that asserts this gets executed correctly? I don't see a test that asserts the output string is the expected value. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
namespace Microsoft.ML.Tests.Transformers | ||
{ | ||
public sealed class CustomTransformerTests : TestDataPipeBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) I would align the test class name with the name of the class it is testing.
CustomMappingEstimator
and CustomMappingTransformer
. I would call this class CustomMappingTransformerTests
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used CustomMappingTests
since we are not always using Estimator
or Transformer
suffix in test names, and I don't think we should.
In reply to: 229417149 [](ancestors = 229417149)
public override CompositionContainer GetCompositionContainer() | ||
{ | ||
if (_compositionContainerFactory != null) | ||
return _compositionContainerFactory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to make a new CompositionContainer
every time it is asked for? If so, I think it would make more sense to call this CreateCompositionContainer()
, to inform the caller that this is a new one every time. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any other way, since the user may be adding more components to PartCatalog
over time
In reply to: 229417995 [](ancestors = 229417995)
src/Microsoft.ML.Data/MLContext.cs
Outdated
@@ -61,14 +63,16 @@ public sealed class MLContext : IHostEnvironment | |||
/// </summary> | |||
public Action<string> Log { get; set; } | |||
|
|||
public AggregateCatalog PartCatalog { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want xml summary doc on this public property? #Resolved
public class MyLambda | ||
{ | ||
[Export("MyLambda")] | ||
public ITransformer MyTransformer => new CustomMappingTransformer<MyInput, MyOutput>(ML, MyAction, "MyLambda"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"MyLambda" [](start = 111, length = 10)
So the user has to specify "MyLambda"
a couple places. Once in the attribute, once in the transformer constructor (here where I put the comment), and once again when performing this mapping. It seems like this could be discoverable by reflection, so as to avoid the redundancy? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attribute is not the only way to inject components (it's just the shortest and the most documented on the web. So, you cannot rely on the attribute as the source of the contract name.
Also note that the codebase that does the training and codebase that does the scoring don't have to be the same. Only the scoring one needs to inject the transformer. It's easier to pass just string contractName
across this potential code boundary than any other information.
In reply to: 229428417 [](ancestors = 229428417)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we could put null
here instead of "MyLambda"
. The test would continue to work, but loaded model will stop being save-able.
In reply to: 229462099 [](ancestors = 229462099,229428417)
src/Microsoft.ML.Data/MLContext.cs
Outdated
@@ -61,14 +63,16 @@ public sealed class MLContext : IHostEnvironment | |||
/// </summary> | |||
public Action<string> Log { get; set; } | |||
|
|||
public AggregateCatalog PartCatalog { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering how this design would play in a larger application that already uses MEF. If I was writing the application, would I want my "ML.NET MEF stuff" separate from the rest of my application's "MEF stuff"? Or would I want a unified CompositionContainer
that contained all my components.
Say my CustomMappingTransformer
Imports
a bunch of other stuff in my application. I would need to duplicate all those "parts" into this catalog.
What if we changed the design such that a consumer supplies the CompositionContainer
to the MLContext
instead? They can configure it any way they want. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes:
I haven't ever done this myself, so I'm not sure on the best practices in this area. The above idea was just me brainstorming - not really a "I believe this is the best way to accomplish this design".
Maybe @weshaggard might have some ideas on what would be the best way to expose this to users? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I like this idea better. Note that I mutate the container sometimes though...
In reply to: 229440195 [](ancestors = 229440195)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You (i.e. ML.NET) shouldn't have to mutate the container, should it? If the user needs/wants to [Import] MLContext
, then they can add that to the container themselves, can't they? (or am I missing something?) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are free to do this if they want. I thought it's nice not to have to do it though. Essentially, they will ALWAYS need an instance of MLContext
to instantiate any ITransformer
, and if we were to do it for them optionally, I think it should be fine?
In reply to: 229446216 [](ancestors = 229446216)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good for now. I think we will need to improve the story a little bit though, since right now it is somewhat awkward to declare these things, esp. with multiple declarations of the signature the user could easily get wrong. (But these of course could be discovered via reflection I am pretty sure.)
@Zruty0 and I had an offline discussion regarding "Is it OK for ML.NET to take a direct dependency on For background, these are the currently available DI systems in the .NET ecosystem that I'm aware of (note - not exhaustive): The conclusion we came to was that we will want to continue using Dependency Injection for more component types in the future (e.g. a custom loss function), and creating our own abstraction might be "yet-another-DI-system". So for now, we will pick MEF v1 to go with, and get customer feedback on whether this was the right decision or not. With the current design, it exposes enough for any user to be able to shim their DI system into the CompositionContainer, since the user provides the container themselves. So no matter what DI system an application uses, it should be able to interop with this design. |
using (var env = new LocalEnvironment() | ||
.AddStandardComponents()) | ||
var env = new MLContext().AddStandardComponents(); | ||
env.Log += System.Console.WriteLine; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
catch (Exception) | ||
{ | ||
// REVIEW: we should have a common mechanism that will make sure this is 'our' exception thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does Assert.Throws
work here?
return new MapTransform<TSrc, TDst>(env, source, mapAction, saveAction, loadFunc, | ||
inputSchemaDefinition, outputSchemaDefinition); | ||
var composition = env.GetCompositionContainer(); | ||
ITransformer transformer = composition.GetExportedValue<ITransformer>(contractName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
env.GetCompositionContainer();
can return null
(MLContext
does by default). This will throw a NRE. We should check for a null container before trying to reference it.
First version of custom estimator, with MEF injection
Also some minor fixes