MLContext to create them all #1098

Closed
Zruty0 opened this issue Sep 28, 2018 · 5 comments

Labels: API (Issues pertaining the friendly API), enhancement (New feature or request)

@Zruty0
Contributor

Zruty0 commented Sep 28, 2018

During one of the in-person API reviews we agreed that it would be a good idea to have a single object MLContext that would serve as a 'factory of everything' (similar to the HTTP context / DB context in the .NET world).

  • MLContext will explicitly implement IHostEnvironment, so you can create all the existing estimators by giving the context as the first argument.
  • MLContext will have properties BinaryClassification, Regression, Clustering etc. for canonical ML tasks (the ones that are currently classes in themselves), complete with Evaluate and all corresponding trainers.
  • It will have extension methods for non-canonical tasks such as recommendation or anomaly detection.
  • It will have properties Transformation, Filtering, Loading to instantiate all known transform estimators, filters and data readers (again via extension methods).
  • It will have a pair of methods SaveModel and LoadModel that handle model serialization.
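
To make the shape of the proposal concrete, here is a minimal, self-contained sketch of the first two bullets. Everything in it is an assumption made for illustration: `IHostEnvironment` is a one-member stand-in for the real interface, `BinaryClassificationContext` is a hypothetical name for the per-task property type, and `Evaluate` computes plain accuracy as a placeholder rather than the real evaluator's metrics.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ML.NET's IHostEnvironment; the real interface has more members.
public interface IHostEnvironment
{
    string Name { get; }
}

// Hypothetical per-task object: groups the trainers and Evaluate for one canonical task.
public sealed class BinaryClassificationContext
{
    // Trainers created from this object would receive the environment from here.
    internal IHostEnvironment Environment { get; }

    internal BinaryClassificationContext(IHostEnvironment env)
    {
        Environment = env;
    }

    // Placeholder metric: the fraction of predictions that match the labels.
    public double Evaluate(IReadOnlyList<bool> labels, IReadOnlyList<bool> predictions)
    {
        if (labels.Count != predictions.Count || labels.Count == 0)
            throw new ArgumentException("Inputs must be non-empty and of equal length.");

        int correct = 0;
        for (int i = 0; i < labels.Count; i++)
        {
            if (labels[i] == predictions[i])
                correct++;
        }
        return (double)correct / labels.Count;
    }
}

public sealed class MLContext : IHostEnvironment
{
    // One property per canonical task; Regression, Clustering etc. would follow the same pattern.
    public BinaryClassificationContext BinaryClassification { get; }

    public MLContext()
    {
        BinaryClassification = new BinaryClassificationContext(this);
    }

    // Explicit implementation: reachable only through an IHostEnvironment reference,
    // so existing estimators can still take the context as their first argument.
    string IHostEnvironment.Name => "MLContext";
}
```

With this sketch, `new MLContext().BinaryClassification.Evaluate(labels, predictions)` is the user-facing entry point, while a component that takes an `IHostEnvironment` parameter accepts the same object unchanged. `SaveModel` and `LoadModel` are left out because the issue does not specify their signatures.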

/cc @KrzysztofCwalina @TomFinley @eerhardt @markusweimer @asthana86

@Ivanidzo4ka
Contributor

Ivanidzo4ka commented Sep 29, 2018

> MLContext will explicitly implement IHostEnvironment, so you can create all the existing estimators by giving the context as the first argument.

We don't like passing IHostEnvironment objects because they are bloated, so let's create an even more bloated object.

Can we have something like this instead:

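    public sealed class MLContext
    {
        // The wrapped environment: MLContext exposes it instead of implementing the interface.
        public readonly IHostEnvironment Env;

        public MLContext(IHostEnvironment env = null)
        {
            // Fall back to a console environment when the caller supplies none.
            Env = env ?? new ConsoleEnvironment();
        }
    }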

and just modify constructors in Estimators to accept MLContext?

@TomFinley
Contributor

> We don't like passing IHostEnvironment objects because they are bloated, so let's create an even more bloated object.

It's not quite so bad as all that. One detail you will see in what @Zruty0 has said (originally suggested by @eerhardt) is that this hypothetical object will implement the interface explicitly. Further, the numerous extensions to IHostEnvironment that are useful only to component authors (most significantly, those in Contracts.cs) would be in a namespace generally used only by component authors, or otherwise rendered invisible to users. On the other hand, the user-facing properties, being on the class and not part of the interface, would not be visible to component authors.

So despite being a single object, kinda, it has a "dual nature" that reflects the dual usage of ML.NET: on the one hand a tool for exploiting ML by a user, and on the other a tool into which one can plug one's own components. All without having to have parallel "user context" vs. "component context" objects, which seems like an elegant solution.
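
The "dual nature" can be illustrated with a small sketch. The namespaces `Sketch.Components` and `Sketch.User`, the `Check` member, and the `CheckNonEmpty` extension are all hypothetical; they only stand in for the real IHostEnvironment members and the Contracts.cs helpers mentioned above.

```csharp
using System;

namespace Sketch.Components
{
    // Hypothetical stand-in for IHostEnvironment, as seen by component authors.
    public interface IHostEnvironment
    {
        void Check(bool condition, string message);
    }

    // Helper in the spirit of Contracts.cs; usable as an extension method only
    // by code that imports Sketch.Components.
    public static class HostExtensions
    {
        public static void CheckNonEmpty(this IHostEnvironment env, string value, string name)
        {
            env.Check(!string.IsNullOrEmpty(value), name + " must not be empty");
        }
    }
}

namespace Sketch.User
{
    public sealed class MLContext : Sketch.Components.IHostEnvironment
    {
        // User-facing member: on the class, not on the interface, so a component
        // that holds an IHostEnvironment reference does not see it.
        public string Regression => "regression catalog placeholder";

        // Explicit implementation: not offered on a variable typed as MLContext,
        // only on one typed as IHostEnvironment.
        void Sketch.Components.IHostEnvironment.Check(bool condition, string message)
        {
            if (!condition)
                throw new InvalidOperationException(message);
        }
    }
}
```

Under these assumptions, `ctx.Check(...)` does not compile for `MLContext ctx`, and `env.Regression` does not compile for `IHostEnvironment env`, so each audience sees only its own half of the same object.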

Of course, the idea of these property objects raises the specter that we're creating a "factory of everything" object which concerns me somewhat, but I think the pattern works here.

@TomFinley
Contributor

"MLContext to create them all,"
One ML.Context to find them.
One ML.Context to tool them all,
and in intellisense bind them.

@asthana86
Contributor

asthana86 commented Sep 30, 2018

We touched on this a little during our conversation. Is the logger a type of property as well, in addition to BinaryClassification, Regression etc.? In EF, the DbContext.Database.Log property can be set to a delegate for any method that takes a string, e.g.:

    using (DbContext ctx = new DbContext(myconnectionstring))
    {
        // Pick one of the following; each assignment replaces the previous delegate.

        // Regular console app
        ctx.Database.Log = Console.Write;

        // ASP.NET environment
        ctx.Database.Log = message => Trace.WriteLine(message);

        // Write to a log file (AppendAllText opens, writes and closes the file on each call)
        ctx.Database.Log = message => File.AppendAllText("C:\\mylog.txt", message);
    }

This then allows the user to choose whether to write to the console, to Trace (in the case of ASP.NET), or to a file logger. It would also avoid any confusion about what LocalEnvironment and ConsoleEnvironment mean for our users.

Being consistent with EF and ASP.NET, which users already know, together with the consistency in API names that we now have with MLContext, will make this a bit more .NETTY.

Just my thoughts.

@Zruty0
Contributor Author

Zruty0 commented Oct 1, 2018

It's a good point, @asthana86. We have AddListener/RemoveListener for this, so it might be a matter of minor massaging of the API.
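
One way the "minor massaging" could look is a settable `Log` property layered over the listener pair. This is only a sketch under stated assumptions: `ListenerHost` is a hypothetical, simplified stand-in that takes plain `Action<string>` listeners (not the actual AddListener/RemoveListener signatures), and `Emit` is an invented helper standing in for components that publish messages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for an environment with AddListener/RemoveListener.
public sealed class ListenerHost
{
    private readonly List<Action<string>> _listeners = new List<Action<string>>();

    public void AddListener(Action<string> listener) => _listeners.Add(listener);

    public void RemoveListener(Action<string> listener) => _listeners.Remove(listener);

    public void Publish(string message)
    {
        // Iterate over a copy so a listener may unregister itself while being called.
        foreach (var listener in _listeners.ToArray())
            listener(message);
    }
}

public sealed class MLContext
{
    private readonly ListenerHost _host = new ListenerHost();
    private Action<string> _log;

    // Setting Log swaps the registered listener, in the style of DbContext.Database.Log.
    public Action<string> Log
    {
        get => _log;
        set
        {
            if (_log != null)
                _host.RemoveListener(_log);
            _log = value;
            if (_log != null)
                _host.AddListener(_log);
        }
    }

    // Invented helper: stands in for internal components emitting a message.
    public void Emit(string message) => _host.Publish(message);
}
```

A caller would then write `ctx.Log = Console.Write;` or `ctx.Log = message => Trace.WriteLine(message);` and set `ctx.Log = null` to stop receiving messages.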

@shauheen shauheen added this to the 1018 milestone Oct 5, 2018
@shauheen shauheen added the enhancement New feature or request label Oct 5, 2018
@shauheen shauheen added the API Issues pertaining the friendly API label Oct 5, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
5 participants