Issue
Discovered from #3037: a user can call `Concatenate` with only a single column name. When this happens, a `NullReferenceException` is thrown. Here is the code sample:
```fsharp
EstimatorChain()
    .Append(mlContext.Transforms.Conversion.ConvertType("Features", "Price", DataKind.Double))
    .Append(mlContext.Transforms.Conversion.ConvertType("Label", "Area", DataKind.Double))
    .Append(mlContext.Transforms.Concatenate("Features")) // This causes the error, should be ("Features", "Features")
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.Regression.Trainers.Sdca("Label", "Features"))
, mlContext
```
Here is the callstack:
```
> Microsoft.ML.Core.dll!Microsoft.ML.SchemaShape.Column.GetTypeString() Line 111 C#
Microsoft.ML.Data.dll!Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>, Microsoft.ML.Trainers.LinearRegressionModelParameters>.CheckInputSchema(Microsoft.ML.SchemaShape inputSchema) Line 111 C#
Microsoft.ML.Data.dll!Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>, Microsoft.ML.Trainers.LinearRegressionModelParameters>.GetOutputSchema(Microsoft.ML.SchemaShape inputSchema) Line 83 C#
Microsoft.ML.Data.dll!Microsoft.ML.Data.EstimatorChain<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>>.GetOutputSchema(Microsoft.ML.SchemaShape inputSchema) Line 83 C#
Microsoft.ML.Data.dll!Microsoft.ML.Data.EstimatorChain<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>>.Fit(Microsoft.ML.IDataView input) Line 60 C#
ConsoleApp32.dll!Program.main(string[] argv) Line 33 F#
```
The problem is that a `NullReferenceException` looks like a bug, and it is not obvious to the user what caused it.
Expected
We should instead notify the user that:
- a bad argument was passed in, and
- the bad argument was passed to the `Concatenate` transform.
Solution A
We simply check the length of the array of input column names passed to `Concatenate` and, if it is empty, throw an exception that identifies both the bad argument and the `Concatenate` transform.
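A minimal sketch of Solution A, written in Python rather than C# so it runs without ML.NET. `concatenate` is a hypothetical stand-in for `mlContext.Transforms.Concatenate`; its parameters are assumed to mirror the output-column-name-plus-input-column-names shape of the real API, and the error text and return value are invented for illustration. The point is only where the check happens: at the call site, before any trainer inspects the schema.

```python
# Hypothetical Python stand-in for ML.NET's Transforms.Concatenate (Solution A).
# Not ML.NET code: it only illustrates validating arguments eagerly.

def concatenate(output_column_name, *input_column_names):
    """Return an (output, inputs) description of a concatenation step.

    Raises ValueError immediately if no input columns are given, so the
    error points at Concatenate instead of surfacing later in a trainer.
    """
    if not output_column_name:
        raise ValueError("Concatenate: the output column name must be non-empty.")
    if len(input_column_names) == 0:
        raise ValueError(
            f"Concatenate: no input columns were given for output column "
            f"'{output_column_name}'. Pass at least one input column name, "
            f"e.g. Concatenate('{output_column_name}', '{output_column_name}')."
        )
    # Building the real transform is out of scope for this sketch.
    return (output_column_name, list(input_column_names))


# Usage: the single-name call from the issue now fails with a descriptive message.
try:
    concatenate("Features")
except ValueError as err:
    print(err)
```

With a check like this, the pipeline above would fail at the `Concatenate` call with a message naming the transform, rather than with a `NullReferenceException` inside the trainer's schema check.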
Solution B
Another possible solution is to change the behavior so that, when only one column name is passed to `Concatenate`, that name is treated as both the source and the destination, so this:
```fsharp
.Append(mlContext.Transforms.Concatenate("Features"))
```
would be the same as this:
```fsharp
.Append(mlContext.Transforms.Concatenate("Features", "Features"))
```
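A minimal sketch of Solution B, again in Python so it runs without ML.NET. `concatenate` is the same kind of hypothetical stand-in for `mlContext.Transforms.Concatenate`; the only behavior it shows is the proposed normalization of a single-name call into a source-and-destination pair.

```python
# Hypothetical Python stand-in for ML.NET's Transforms.Concatenate (Solution B).
# Not ML.NET code: it only illustrates the proposed single-name behavior.

def concatenate(output_column_name, *input_column_names):
    """Return an (output, inputs) description of a concatenation step."""
    if not output_column_name:
        raise ValueError("Concatenate: the output column name must be non-empty.")
    if len(input_column_names) == 0:
        # Single-name form: treat the name as both source and destination,
        # so concatenate("Features") means concatenate("Features", "Features").
        input_column_names = (output_column_name,)
    return (output_column_name, list(input_column_names))


print(concatenate("Features"))  # ('Features', ['Features'])
```

One trade-off: this form accepts the call silently, so a user who simply forgot to list the input columns gets no error, whereas Solution A would report that case.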
cc @glebuk for additional feedback