Null Reference Exception when Concatenating with a single value #3061

Closed
@singlis

Issue

Discovered in #3037: a user can call Concatenate with a single string, so that no source columns are specified. When this happens, a NullReferenceException is thrown. Here is the code sample:

        EstimatorChain()
            .Append(mlContext.Transforms.Conversion.ConvertType("Features", "Price", DataKind.Double))
            .Append(mlContext.Transforms.Conversion.ConvertType("Label", "Area", DataKind.Double))
            .Append(mlContext.Transforms.Concatenate("Features"))  // This causes the error, should be ("Features", "Features")
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.Regression.Trainers.Sdca("Label", "Features"))
            , mlContext

Here is the callstack:

>	Microsoft.ML.Core.dll!Microsoft.ML.SchemaShape.Column.GetTypeString() Line 111	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>, Microsoft.ML.Trainers.LinearRegressionModelParameters>.CheckInputSchema(Microsoft.ML.SchemaShape inputSchema) Line 111	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>, Microsoft.ML.Trainers.LinearRegressionModelParameters>.GetOutputSchema(Microsoft.ML.SchemaShape inputSchema) Line 83	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Data.EstimatorChain<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>>.GetOutputSchema(Microsoft.ML.SchemaShape inputSchema) Line 83	C#
 	Microsoft.ML.Data.dll!Microsoft.ML.Data.EstimatorChain<Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>>.Fit(Microsoft.ML.IDataView input) Line 60	C#
 	ConsoleApp32.dll!Program.main(string[] argv) Line 33	F#

The problem is that a NullReferenceException looks like a bug in the library, and it's not obvious to the user what caused it.

Expected

We should instead notify the user that:

  1. A bad argument was passed in
  2. It's the Concatenate transform that received the bad argument

Solution A

We simply check the length of the name array that is passed to Concatenate and throw the correct exception.
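
As a rough sketch of what such a check could look like (this is not the actual ML.NET code): `ConcatenateArgs.Validate` below is a hypothetical standalone helper, and its parameter names `outputColumnName` and `inputColumnNames` are assumptions chosen for illustration. It fails early with an `ArgumentException` that names the transform and the offending parameter.

```csharp
using System;

// Hypothetical helper, not part of ML.NET: it only illustrates the argument check.
public static class ConcatenateArgs
{
    // Throws ArgumentException when no input columns are supplied.
    public static void Validate(string outputColumnName, params string[] inputColumnNames)
    {
        if (string.IsNullOrWhiteSpace(outputColumnName))
            throw new ArgumentException(
                "Concatenate: the output column name must be non-empty.",
                nameof(outputColumnName));

        // A call such as Validate("Features") binds an empty array to the params parameter.
        if (inputColumnNames == null || inputColumnNames.Length == 0)
            throw new ArgumentException(
                $"Concatenate: at least one input column is required to produce '{outputColumnName}'.",
                nameof(inputColumnNames));
    }
}
```

With this in place, `ConcatenateArgs.Validate("Features")` would throw an `ArgumentException` whose `ParamName` is `inputColumnNames`, while `ConcatenateArgs.Validate("Features", "Price")` would return normally.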

Solution B

Another possible solution is to change the behavior so that when one column is specified for Concatenate, the name is treated as the source and destination -- so this:

            .Append(mlContext.Transforms.Concatenate("Features"))

would be the same as this:

            .Append(mlContext.Transforms.Concatenate("Features", "Features"))
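
A minimal sketch of that fallback, assuming a hypothetical helper `ConcatenateDefaults.ResolveInputs` (not an ML.NET API; the parameter names are also assumptions): when no source columns are given, the output name is reused as the single source.

```csharp
using System;

// Hypothetical helper, not part of ML.NET: it only illustrates the defaulting rule.
public static class ConcatenateDefaults
{
    // Returns the source columns to concatenate into outputColumnName.
    public static string[] ResolveInputs(string outputColumnName, params string[] inputColumnNames)
    {
        if (string.IsNullOrWhiteSpace(outputColumnName))
            throw new ArgumentException(
                "Concatenate: the output column name must be non-empty.",
                nameof(outputColumnName));

        // No sources given: treat the single name as both source and destination.
        if (inputColumnNames == null || inputColumnNames.Length == 0)
            return new[] { outputColumnName };

        return inputColumnNames;
    }
}
```

One trade-off of this design is that a forgotten source list silently succeeds when a column with the output name already exists, whereas the argument check in Solution A surfaces the mistake immediately.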

cc @glebuk for additional feedback

Labels

P0, bug
