Automatic conversion of data in the pipeline #3060
Comments
I don't think this is correct. Brevity and simplicity are not the same thing. A predictable API is an API that has the "right" simplicity, and the best way to be predictable is to do what the user tells us to do. An API with loads of implicit "helpful" behavior is ultimately impossible to predict in any meaningful fashion, and so impossible to understand. We have a long history of people trying to add helpful, "harmless" operations for the user, whether it's type conversion, auto-caching, auto-normalization, or auto-calibration. I even had to write a section about it. It of course ultimately blows up in our faces, again and again. I even wrote a section on type conversion specifically, as being especially insidious. Close?
You can solve this by having the trainer itself take in data directly rather than a data view. Then you can type the method signature to make clear what the inputs for label and features need to be.
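A minimal sketch of what a trainer that takes typed data directly could look like. Everything here is hypothetical and not part of ML.NET: `MeanLabelTrainer`, its `Fit` method, and the trivial "model" it returns exist only to show how the method signature can state the required label and feature types, so that a mismatch surfaces at compile time instead of as a runtime exception.

```csharp
using System;
using System.Linq;

// Hypothetical trainer, NOT an ML.NET type: the signature documents the required types.
public static class MeanLabelTrainer
{
    // Labels must be float; each example's features must be a float vector.
    public static Func<float[], float> Fit(float[] labels, float[][] features)
    {
        if (labels.Length != features.Length)
            throw new ArgumentException("labels and features must have the same length");

        // Placeholder "model": always predicts the mean label, ignoring the features.
        float mean = labels.Average();
        return _ => mean;
    }
}

public static class Demo
{
    public static void Main()
    {
        int[] rawLabels = { 1, 2, 3 };
        int[][] rawFeatures = { new[] { 1, 0 }, new[] { 0, 1 }, new[] { 1, 1 } };

        // MeanLabelTrainer.Fit(rawLabels, rawFeatures) does not compile:
        // int[] is not convertible to float[], so the caller converts explicitly.
        float[] labels = rawLabels.Select(v => (float)v).ToArray();
        float[][] features = rawFeatures
            .Select(row => row.Select(v => (float)v).ToArray())
            .ToArray();

        Func<float[], float> model = MeanLabelTrainer.Fit(labels, features);
        Console.WriteLine(model(new[] { 1f, 0f }));
    }
}
```

The conversion is still written by the user, but the compiler reports the mismatch at the call site, which keeps the behavior explicit.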
Closing as per @TomFinley's explanation
Issue
When providing data to a pipeline, we expect the user to know the data type the trainer requires. This is a painful experience for end users: they not only need to know which data types to convert to, but also have to add more steps to their pipeline to accommodate the conversion.
The example from #3037 demonstrates this issue: the pipeline takes in integer values for the Label and Features and passes them to the SDCA trainer. Because the data is integer-based, the pipeline uses ConvertType to convert from int to float, followed by a Concatenate to generate a vector type (note this is in F# but still applies to C#).
Without conversions, the user will hit an exception saying that the expected type for Label is float, followed by one saying that Features should be a vector of floats.
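For illustration, here is a C# sketch of a pipeline with the explicit conversion steps. It is not the code from #3037. It assumes ML.NET 1.x API names (which may differ from the version under discussion), a regression task, and a hypothetical `Row` schema with made-up column names `X1` and `X2`.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input schema: integer label and integer feature columns.
public class Row
{
    public int Label { get; set; }
    public int X1 { get; set; }
    public int X2 { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Made-up data, only to give the pipeline something to fit.
        var rows = new[]
        {
            new Row { Label = 3, X1 = 1, X2 = 2 },
            new Row { Label = 5, X1 = 2, X2 = 3 },
            new Row { Label = 7, X1 = 3, X2 = 4 },
            new Row { Label = 9, X1 = 4, X2 = 5 },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        var pipeline = mlContext.Transforms.Conversion
            // Convert each int column to float (DataKind.Single) in place.
            .ConvertType("Label", outputKind: DataKind.Single)
            .Append(mlContext.Transforms.Conversion.ConvertType("X1", outputKind: DataKind.Single))
            .Append(mlContext.Transforms.Conversion.ConvertType("X2", outputKind: DataKind.Single))
            // Combine the float columns into the vector column the trainer expects.
            .Append(mlContext.Transforms.Concatenate("Features", "X1", "X2"))
            .Append(mlContext.Regression.Trainers.Sdca(
                labelColumnName: "Label", featureColumnName: "Features"));

        var model = pipeline.Fit(data);
    }
}
```

The ConvertType and Concatenate steps are the ones this issue proposes hiding from the user.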
Suggestion
We should hide these details from the user, as this would make data easier to load and simplify the user's pipeline. Taking the example above, if you were to remove the conversion steps, it would look something like this:
cc @glebuk for any additional input