Skip to content

Movement and Internalization Phase 2 #1626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 16, 2018
Merged

Conversation

TomFinley
Copy link
Contributor

@TomFinley TomFinley commented Nov 15, 2018

Continuation of the work of #1587, which itself is part of the ongoing work of #1519.

The changes are certainly best digested commit by commit, rather than just reviewing the last commit. The commmit descriptions are intended to be informative.

Please do not be afraid of the seemingly large line count. Discounting whitespace changes (mostly on account of not using ConsoleEnvironment in many places), the number of actually changed lines is really more like only 600 additions/900 deletions, as opposed to what git is reporting on my screen as on the order of ~4000 of these.

Incidentally fixes #1284.

var env = new ConsoleEnvironment();
return data.AsCursorable<TRow>(env, ignoreMissingColumns, schemaDefinition);
// REVIEW: Take this as a parameter, or else make it a method on he context itself.
var ml = new MLContext(42);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

42 [](start = 35, length = 2)

is the number intentional?

Copy link
Contributor Author

@TomFinley TomFinley Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, some copy-pasta, but surely just making the cursors seed deterministic isn't going to hurt anything I hope. 😄 Still I can get rid of it since it was unintentional, but I wouldn't necessarily prioritize this sort of issue.

If however there had been a seed and I removed it by mistake, that would be something definitely to comment on since I'd be making the random state nondeterministic.

@@ -72,6 +72,8 @@ public interface IHostEnvironment : IChannelProvider, IProgressChannelProvider
/// The suffix and prefix are optional. A common use for suffix is to specify an extension, eg, ".txt".
/// The use of suffix and prefix, including whether they have any affect, is up to the host environment.
/// </summary>
[Obsolete("The host environment is not disposable, so it is inappropriate to use this method. " +
"Handle your own temporary files. If you cannot, reconsider your life choices.")]
Copy link
Member

@sfilipi sfilipi Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you cannot, reconsider your life choices [](start = 46, length = 43)

If you cannot, reconsider your life choices [](start = 46, length = 43)

you sure you want to show this message? People might run into this at 5pm on a Friday, as they are trying to wrap up their week and go home! #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, only component authors will run into it, and AFAIK there is only one component still that actually uses this (as seen in #1287). We are also definitely removing this, so an obsolete method to anyone that might make the mistake of using it in the future is I think appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you mean was I a bit too flippant... yeah a bit. It was meant to be funny but maybe it wasn't. I'll tone it down.

@@ -19,6 +19,7 @@ namespace Microsoft.ML.Runtime
/// delegates will be published in some fashion, with the target scenario being
/// that the library will publish some sort of restful API.
/// </summary>
[BestFriend]
public sealed class ServerChannel : ServerChannel.IPendingBundleNotification, IDisposable
Copy link
Member

@sfilipi sfilipi Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 4, length = 6)

public [](start = 4, length = 6)

did you mean to make this internal? #Resolved

Copy link
Contributor Author

@TomFinley TomFinley Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh, absolutely. #Resolved

using (var env = new ConsoleEnvironment())
{
var experiment = env.CreateExperiment();
var env = new MLContext();
Copy link
Member

@sfilipi sfilipi Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env [](start = 16, length = 3)

i think there is an environment on the BaseTestClass. #Resolved

Copy link
Contributor Author

@TomFinley TomFinley Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not. Perhaps there ought to be one, but even if we upon reflecting decided it was a good idea, in any case not really part of this PR, really, since I'm not really trying to restructure the tests so much as not use ConsoleEnvironment everywhere anymore? #Resolved

@@ -58,7 +58,7 @@ private void CheckSchemaHasColumn(ISchema schema, string name, out int idx)
[Fact]
public void SimpleTextLoaderCopyColumnsTest()
{
var env = new ConsoleEnvironment(0, verbose: true);
var env = new MLContext(0);
Copy link
Member

@sfilipi sfilipi Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env [](start = 16, length = 3)

same, about re-using the env if there. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is again not there, but even if it was, my goal is not to restructure the tests. Now then, it may be I'd agree that restructuring some tests might be a great idea, but this is assuredly completely the wrong PR to do it in, wouldn't you say?

@@ -12,6 +12,7 @@
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.InferenceTesting" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.StaticPipelineTesting" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.OnnxTransformTest" + PublicKey.TestValue)]
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.BenchMarks" + PublicKey.TestValue)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely this isn't right, is it? Our Benchmarks should be written from an end-user's perspective, and not depend on internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it since I think it helps improve the benchmark code to start using the more modern APIs, but I'm not sure you're right about this? Why could we not use the benchmark project to test individual components?

I'll give a specific example. In some other work currently in progress, I have written benchmarks against FastTree's IntArray, as I'm exploring the possibility that the sumup routine can be purely managed. Currently that is a public class, but assuredly it must be internal in the future. Will I be disallowed from being able to use benchmarks for this? If we want to have two separate benchmark projects -- one for user facing stuff, one for our internal components -- that might be fine, but having no story is not one I consider an acceptable outcome.


In reply to: 234246392 [](ancestors = 234246392)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, my opinion is that both functional and performance tests should work that way.

  • Unit Tests
    • they can have IVT the product, because they test units in isolation. and the best way to do that is to get at internal things.
  • End-to-End Customer Tests
    • these should not have IVT the product, and should be more like what a customer would do/see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But a benchmark is not exclusively and end-to-end customer test. And benchmarking must not be only end-to-end stuff. We don't make a product with 1000 moving parts efficient by having end-to-end tests, because the reality is that any improvement we measure on one piece will be dwarfed by the other 999 pieces that were not improved. It is nonetheless crucial that we be aware of the perf of these small pieces, because only by doing this multiple times will we begin to see overall improvement in our strategy. Saying benchmarks may only be run on the highest level and not on internal components runs counter to this strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might not have been clear in what I was saying above. I believe there are 4 types of tests:

  • Perf end-to-end
  • Perf unit
  • Functional end-to-end
  • Functional unit
Test Type Unit End to End
Functional Tests IVT Allowed No IVT
Performance Tests IVT Allowed No IVT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's fine, but what are you suggesting we do with the internal perf tests? Separate project?

Copy link
Member

@eerhardt eerhardt Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Same for "end to end" functional tests. They should be in a separate project from our unit tests. That way we can be assured that we are only calling what a customer can call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Obviously this isn't something we can do in this PR. But it would be a nice "medium-term goal".)

@eerhardt
Copy link
Member

eerhardt commented Nov 16, 2018

public abstract class BaseStacking<TOutput> : IStackingTrainer<TOutput>

Does BaseStacking still need to be public? It appears all its derived types are now internal. #Resolved


Refers to: src/Microsoft.ML.Ensemble/OutputCombiners/BaseStacking.cs:19 in 78deaa1. [](commit_id = 78deaa1, deletion_comment = False)

@@ -43,6 +43,7 @@ public sealed class Arguments : ArgumentsBase
[TGUI(Label = "Output combiner", Description = "Output combiner type")]
public ISupportRegressionOutputCombinerFactory OutputCombiner = new MedianFactory();

// TODO: If we make this public again it should be an *estimator* of this type of predictor, rather than the (deprecated) ITrainer.
Copy link
Member

@eerhardt eerhardt Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) what's the difference between TODO and REVIEW ? #Resolved

Copy link
Contributor Author

@TomFinley TomFinley Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REVIEW was something favored by someone once upon a time, whereas TODO is a more standard thing. (Indeed VS has tooling around TODO.) I'm dabbling in starting to use todo instead? #Resolved

Copy link
Member

@eerhardt eerhardt Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it would be good to have a single way of doing this in the code base.... Maybe an issue to move all REVIEW comments to TODO if we are going down this route? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Not doing that as part of this PR. I'd rather just not shift to using todo as I had if you insist I do it everywhere if I do it once.


In reply to: 234257534 [](ancestors = 234257534)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion above was to log an issue to make our code base consistent on 1 standard way of logging a "someone should do this" comment. I never insisted you do it everywhere in this PR.

@eerhardt
Copy link
Member

    [Obsolete(NeedEnvObsoleteMessage)]

Can we just delete these methods instead? They are marked Obsolete. We've been making plenty of other breaking changes. This one would be so minor that no one would care.


Refers to: src/Microsoft.ML.Api/TypedCursor.cs:561 in 0a40d88. [](commit_id = 0a40d88, deletion_comment = False)

/// console functionality.
/// </summary>
[BestFriend]
internal sealed class ConsoleEnvironment : HostEnvironmentBase<ConsoleEnvironment>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can close #1284 with this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Added to the description.


In reply to: 234249735 [](ancestors = 234249735)

@@ -114,7 +114,7 @@ private static bool TryParseFile(IChannel ch, TextLoader.Arguments args, IMultiS
try
{
// No need to provide information from unsuccessful loader, so we create temporary environment and get information from it in case of success
using (var loaderEnv = new ConsoleEnvironment(0, true))
using (var loaderEnv = new ConsoleEnvironment(0, verbose: true))
Copy link
Member

@eerhardt eerhardt Nov 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we still using ConsoleEnvironment here, but removing it from other places? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .AddListener below. While I've tried to limit the usage of console environment, things that act directly on the host environment base and its own mechanisms will still continue to need to use either this, local environment, or some other host environment base derived class, which MLContext (deliberately) is not.


In reply to: 234251385 [](ancestors = 234251385)

@eerhardt
Copy link
Member

CodeFlow for GitHub does not download executable files. If you trust this file and would like to run it, please download it yourself through GitHub.

Why are we deleting this file?


Refers to: test/BaselineOutput/Common/Onnx/Cluster/BreastCancer/Kmeans.onnx:1 in 0a40d88. [](commit_id = 0a40d88, deletion_comment = True)

using System.Reflection;

namespace Microsoft.ML.Runtime
{
[BestFriend]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is the right change for AssemblyLoadingUtils. I purposefully made it as "shared source" between Maml and the ML.Legacy assemblies, with the hopes that ML.Legacy will just go away, so this will only be used in Maml.

I don't think it should be in the Core assembly (and especially not with a [BestFriend] attribute). I don't want people calling this code at all. Maml and ML.Legacy need it to keep things working, but I don't want any other places calling it. I've instructed other internal projects, if they need this code, they can copy it and modify it to meet their needs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, people have been ignoring your instructions. 😄 A couple more now use it.

I made ICommand and all ICommand implementors internal. Things that directly call some of the commands now had to reference them. In the case of Maml this was Legacy and ResultProcessor, both of which actually use this AssemblyLoadingUtils. So I made Maml consider Legacy and ResultProcessor friends of Maml, so that they could keep using it. But of course this results in an error, because they're

However, because you made this a source reference, I then had assemblies referencing each other, which led to name collisions resulting in compiler errors. This was my attempt to untangle the mess. I can't just go back to the way things were because that doesn't compile. I didn't perform this move in a fit of whimsy.

It may be that eventually we can winnow it down, but for right now, I'm not sure what else is better to do.

I'm not sure what you mean about "especially not." Just because something is a best-friend internal doesn't mean we're going to keep it now and forever. Indeed the point of introducing this infrastructure in the first place is so we would be free to later whip the internal infrastructure code into a better form, without compromising v1.0. This I think falls pretty squarely into its intended use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean about "especially not." Just because something is a best-friend internal doesn't mean we're going to keep it now and forever.

It means more places will feel empowered to call this method, because they won't get a "Don't use this" error. And as more places use it, the harder it will be to remove in the future. (Especially places that aren't in this repo, that are planning on getting IVT the Core assembly)

Can we instead do some other "tricks" to make it not part of the Core assembly? Maybe using extern alias? https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/extern-alias

Or maybe we have compiler directives that change the namespace of this file/class in the different assemblies it is used in?

I'd really like this type not to be in the Core assembly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'll mark it obsolete as I did with IHostEnvironment.GetTempFile. That will give people a nice little message discouraging them, without requiring that I introduce special build infrastructure just for this one little file, which frankly I don't want to do.


In reply to: 234269562 [](ancestors = 234269562)

/// </summary>
public sealed class GenerateSweepCandidatesCommand : ICommand
internal sealed class GenerateSweepCandidatesCommand : ICommand
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We moved this from test to src. Is that expected? This command was only needed in tests before....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The movement was intentional. Certainly this command was not written to reside in tests. In point of fact we would say that strictly speaking ML.NET from an API perspective does not rely on any command. Nonetheless I'd very their movement into tests as a strong negative, unless it were a command that is not actually intended to be user-facing in our internal command line tools. (Which, this one and the others I moved are intended to be used there.)

I have absolutely no idea why these and the other two commands I moved were in tests. They certainly don't belong there, they are part of these projects.


/// <summary>
/// A simplified assembly attribute for marking EntryPoint modules.
/// </summary>
[AttributeUsage(AttributeTargets.Assembly, AllowMultiple = true)]
public sealed class EntryPointModuleAttribute : LoadableClassAttributeBase
[BestFriend]
internal sealed class EntryPointModuleAttribute : LoadableClassAttributeBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the EntryPoint related methods on ComponentCatalog internal as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, assuredly. Do you feel this PR needs to be longer though? I'm trying to break up the internalization work into multiple phases (hence the numbering, phase 1, phase 2, and so forth). While surely we can just keep going forever, I think it would be more helpful if we tried to iterate quickly, rather than insisting we internalize everything in one gigantic PR. So just pointing out "you could have internalized more," while I'm sure well intentioned, is something I already kind of know and understand?


In reply to: 234256509 [](ancestors = 234256509)

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my main issue at this moment is the AssemblyLoadingUtils. I don't believe putting that code (even internally) into the Core assembly is the right thing to do.

@TomFinley
Copy link
Contributor Author

    [Obsolete(NeedEnvObsoleteMessage)]

Yeah, it's been a few years, why not.


In reply to: 439432814 [](ancestors = 439432814)


Refers to: src/Microsoft.ML.Api/TypedCursor.cs:561 in 0a40d88. [](commit_id = 0a40d88, deletion_comment = False)

@TomFinley
Copy link
Contributor Author

CodeFlow for GitHub does not download executable files. If you trust this file and would like to run it, please download it yourself through GitHub.

Because its inclusion was accidental. We don't include in BaselineOutput something unless it's being (at the risk of being tautological) baselined, and binary formats definitely aren't being baselined. I just happened to see it and opportunistically deleted it, while I was updating the other file in this directory.


In reply to: 439435465 [](ancestors = 439435465)


Refers to: test/BaselineOutput/Common/Onnx/Cluster/BreastCancer/Kmeans.onnx:1 in 0a40d88. [](commit_id = 0a40d88, deletion_comment = True)

@TomFinley
Copy link
Contributor Author

public abstract class BaseStacking<TOutput> : IStackingTrainer<TOutput>

I certainly don't see the harm in making this internal, sure. However, I feel like the strategy that I've been employing -- starting at the core and working my way out and making what must be internal or otherwise hidden in leaf assemblies as a result of that change -- is the best strategy, so starting to focus on a leaf assembly like this would be, I think, not the best strategy. If it's all the same to you, maybe I'll continue with that strategy?


In reply to: 439430752 [](ancestors = 439430752)


Refers to: src/Microsoft.ML.Ensemble/OutputCombiners/BaseStacking.cs:19 in 78deaa1. [](commit_id = 78deaa1, deletion_comment = False)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

s.Separator = ",";
});

var estimatorPipeline = ml.Transforms.Categorical.OneHotEncoding("CatFeatures")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that we are using the "real" API now. Thank you for doing it.

@TomFinley TomFinley merged commit d885ebc into dotnet:master Nov 16, 2018
@TomFinley TomFinley deleted the tfinley/Trainer branch November 16, 2018 21:46
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make ConsoleEnvironment internal
3 participants