-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Predict expects the Label as input #3063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Presumably only if you have transformers that are consuming In fact @Zruty0 even devised a convenience for this case. If you have a transformer, you would when building the pipeline the This however is not documented. The two key methods here, have absolutely no XML documentation. Our samples have usage of
Of course, now that we're approaching stabilization of the API, we will I suppose work on improving the documentation. This is obviously something that must be done in general, but this is an important enough situation that it definitely warrants special consideration, its own section in documentation no doubt. Assuming I understand the issue correctly, maybe we can change this from being enhancement to documentation. But I'll leave that to you @singlis; possibly I've misunderstood the point. |
Thanks @TomFinley - I agree with you that documentation would help in addition to having its own section. I created this issue #3098 to help track documentation efforts, but left this issue to discuss further. While that solution works, the user would need to have additional knowledge of our API to know what they are doing -- it is something that would not come naturally. We are setting a mask (scope filter) on each transformer that is appended in the chain. This mask defaults to Documentation would definitely help - but I do wonder how often this feature would be used rather than using the same class with a default value for Label. Could we determine what are the transforms that generate the Label column and filter them out for the user? Or are there other scenarios that would be broken from a feature like this? |
I look at it very differently. We have a very clear, explicit chain of being told what to do: we have deliberately added this thing to a chain when we didn't have to, then we've stated that we wanted a prediction engine over this chain. I have provided what I think are two very reasonable ways of resolving this "problem."
You're proposing a third option, but I'm less excited about that one.
I've never actually seen an attempt in programming languages or APIs to "do what I mean, not what I said" work -- you can kind of get away with it in tools where the user can provide immediate feedback, but, when doing actual software development, I haven't seen that work. Well intentioned attempts to do so have consistently failed in ML.NET to the point where we eventually just threw up our hands and said, never again. Making the code comprehensible and readable becomes really, really hard when there's a lot of "magic" happening to user's data they can't see. Code that by first basic principles shouldn't work suddenly starting to work doesn't mean you've made things easier for a user; far from it. I could potentially see a suggestion from the Roslyn code analyzer under appropriate circumstances, but that's about the extent of it. The message this hypothetical Roslyn analyzer might make is: "It looks like you're transforming a label column. Would you like help?" (Yes, I stole your joke Scott. :) ) But I can see many ways in which such a system would be imperfect at best. |
@colbylwilliams and myself also ran into this issue since it is confusing that a prediction would require a label. For example, I am using the below code to do a batch prediction - if the SearchResultData does not contain a label, you get an exception complaining that the Label is missing : System.ArgumentOutOfRangeException: 'Could not find input column 'Label' I wouldn't expect the Label column to be required since this should only be needed for training\evaluating the data in supervised learning scenarios. Requiring it is confusing because the Label value is what is being predicted. Example code: // Load test data and use the model to perform predictions on it.
@CESARDELATORRE - interested to know your thoughts on this. |
Issue
When calling Predict, our Predict method will take in the same input as what is used for the training pipeline. This is a bit "odd" as we force the user to define a "Label" variable that does nothing nor is it needed for the output.
Using the example from #3037, we have something like this:
Where Area is our "Label", because this is required by the pipeline, we have to add this in as part of the input.
Could our pipeline change to only consume the data that is needed to do the prediction? And ideally have something like this:
cc @glebuk for any additional comments.
The text was updated successfully, but these errors were encountered: