Automatic conversion of data in the pipeline #3060

Closed
@singlis

Issue

When providing data to a pipeline, we expect the user to know the data type the trainer requires. This is painful for end users: they must know which data types to convert to, and they must add extra steps to their pipeline to perform the conversions.

The example from #3037 demonstrates this issue. The pipeline takes integer values for the Label and Features and passes them to the SDCA trainer. Because the data is integer-based, the pipeline uses ConvertType to convert from int to float, followed by a Concatenate to produce a vector type (this is F#, but the same applies to C#):

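        let mlContext = MLContext()
        EstimatorChain()
            // Convert the integer "Price" column to float and name it "Features".
            .Append(mlContext.Transforms.Conversion.ConvertType("Features", "Price", DataKind.Single))
            // Convert the integer "Area" column to float and name it "Label".
            .Append(mlContext.Transforms.Conversion.ConvertType("Label", "Area", DataKind.Single))
            // Turn the scalar "Features" column into a vector, as the trainer requires.
            .Append(mlContext.Transforms.Concatenate("Features", "Features"))
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.Regression.Trainers.StochasticDualCoordinateAscent("Label", "Features"))
            , mlContext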

Without the conversions, the user hits an exception saying that the Label is expected to be of type float, and then another saying that Features is expected to be a vector of floats.

Suggestion

We should hide these details from the user, as this would make data easier to load and simplify the user's pipeline. Taking the example above, with the conversion steps removed it would look something like this:

        let trainer = mlContext.Regression.Trainers.StochasticDualCoordinateAscent("Area", "Price")
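
To make the idea concrete, here is a rough sketch of the decision the pipeline could make on the user's behalf: look at the type of each input column and work out which conversion steps are missing. It is plain F# and does not touch ML.NET. `ColumnType`, `Step`, `planLabel`, and `planFeatures` are hypothetical names made up for this illustration, not existing or proposed APIs.

```fsharp
// Hypothetical model of the column types involved; these are not ML.NET types.
type ColumnType =
    | Int32Scalar
    | SingleScalar
    | SingleVector

// Hypothetical descriptions of the steps the pipeline would insert for the user.
type Step =
    | ConvertToSingle of column: string
    | ConcatenateToVector of column: string

// The regression trainer expects the Label to be a scalar float.
let planLabel (column: string) (columnType: ColumnType) : Step list =
    match columnType with
    | Int32Scalar -> [ ConvertToSingle column ]
    | SingleScalar -> []
    | SingleVector -> failwithf "Label column '%s' must be a scalar" column

// The regression trainer expects Features to be a vector of floats.
let planFeatures (column: string) (columnType: ColumnType) : Step list =
    match columnType with
    | Int32Scalar -> [ ConvertToSingle column; ConcatenateToVector column ]
    | SingleScalar -> [ ConcatenateToVector column ]
    | SingleVector -> []

// The example from this issue: integer "Area" as Label, integer "Price" as Features.
let steps = planLabel "Area" Int32Scalar @ planFeatures "Price" Int32Scalar
printfn "%A" steps
```

For the example above this plans three steps: one conversion for the Label, then a conversion and a concatenate for the Features. These are the same steps the user currently has to write by hand. Columns that already have the expected type produce no steps, so existing pipelines would not change.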

cc @glebuk for any additional input

Metadata

Labels: enhancement (New feature or request)
