KeyToVectorMappingTransformer: Index was outside the bounds of the array. #3757

Closed

Description

System information

  • OS version/distro: 1.0.0
  • .NET Version (eg., dotnet --info): 4.6.2

Issue

  • What did you do? Applied NormalizeSupervisedBinning followed by OneHotEncoding to the same column.
  • What happened? Got "Index was outside the bounds of the array."
  • What did you expect? No error

Source code / logs


            MLContext mlContext = new MLContext(seed: 0);
            var dataPath = "test1.csv";
            var featureName = "Features";
            var loader = mlContext.Data.CreateTextLoader(new[] 
            {
                new TextLoader.Column("int1", DataKind.Int64, 0),
                new TextLoader.Column("int2", DataKind.Int64, 1),
                new TextLoader.Column("Label", DataKind.Boolean, 2),
            }, hasHeader: true, separatorChar: ',');

            var data = loader.Load(dataPath);
            var learningPipeline = mlContext.Transforms.Conversion.ConvertType("int1", outputKind: DataKind.Single)
                    .Append(mlContext.Transforms.Conversion.ConvertType("int2", outputKind: DataKind.Single))
                    .Append(mlContext.Transforms.Concatenate(featureName, new string[] { "int1", "int2" }))
                    .Append(mlContext.Transforms.NormalizeSupervisedBinning(featureName, fixZero: false, maximumBinCount: 5, labelColumnName: "Label"))
                    .Append(mlContext.Transforms.Categorical.OneHotEncoding(featureName, outputKind: OneHotEncodingEstimator.OutputKind.Indicator));
            learningPipeline.Fit(data).Transform(data).Preview();

test1.csv:

int1, int2, label
301, 2000, true
450, 3000, true
-300, 4000, true
300, 2000, false
115, 2000, false
115, 2000, false

I think this is related to issue #1751. Based on the discussion in that issue:

  • I tried adding MapKeyToValue() (second snippet below), but it did not help.
  • OneHotEncoding cannot easily be removed, because we want to treat the bins as a categorical feature.

            MLContext mlContext = new MLContext(seed: 0);
            var dataPath = "test1.csv";
            var featureName = "Features";
            var loader = mlContext.Data.CreateTextLoader(new[] 
            {
                new TextLoader.Column("int1", DataKind.Int64, 0),
                new TextLoader.Column("int2", DataKind.Int64, 1),
                new TextLoader.Column("Label", DataKind.Boolean, 2),
            }, hasHeader: true, separatorChar: ',');

            var data = loader.Load(dataPath);
            var learningPipeline = mlContext.Transforms.Conversion.ConvertType("int1", outputKind: DataKind.Single)
                    .Append(mlContext.Transforms.Conversion.ConvertType("int2", outputKind: DataKind.Single))
                    .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
                    .Append(mlContext.Transforms.Concatenate(featureName, new string[] { "int1", "int2" }))
                    .Append(mlContext.Transforms.NormalizeSupervisedBinning(featureName, fixZero: false, maximumBinCount: 5, labelColumnName: "Label"))
                    .Append(mlContext.Transforms.Categorical.OneHotEncoding(featureName, outputKind: OneHotEncodingEstimator.OutputKind.Indicator))
                    .Append(mlContext.Transforms.Conversion.MapKeyToValue("Label"));
            learningPipeline.Fit(data).Transform(data).Preview();

Labels: bug