Issue opened on May 21, 2019 (status: closed)
System information
- OS version/distro: 1.0.0
- .NET Version (e.g., dotnet --info): 4.6.2
Issue
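- What did you do? Applied supervised-binning normalization (NormalizeSupervisedBinning) and then one-hot encoding (OneHotEncoding) to the same feature column.
- What happened? Got the error "Index was outside the bounds of the array."
- What did you expect? The pipeline to run without error.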
Source code / logs
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Transforms;

    MLContext mlContext = new MLContext(seed: 0);
    var dataPath = "test1.csv";
    var featureName = "Features";

    // Load two Int64 columns and a Boolean label from the CSV (header row skipped).
    var loader = mlContext.Data.CreateTextLoader(new[]
    {
        new TextLoader.Column("int1", DataKind.Int64, 0),
        new TextLoader.Column("int2", DataKind.Int64, 1),
        new TextLoader.Column("Label", DataKind.Boolean, 2),
    }, hasHeader: true, separatorChar: ',');
    var data = loader.Load(dataPath);

    // Convert both columns to Single, concatenate them into Features,
    // bin Features using Label as supervision, then one-hot encode the result.
    var learningPipeline = mlContext.Transforms.Conversion.ConvertType("int1", outputKind: DataKind.Single)
        .Append(mlContext.Transforms.Conversion.ConvertType("int2", outputKind: DataKind.Single))
        .Append(mlContext.Transforms.Concatenate(featureName, new string[] { "int1", "int2" }))
        .Append(mlContext.Transforms.NormalizeSupervisedBinning(featureName, fixZero: false, maximumBinCount: 5, labelColumnName: "Label"))
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(featureName, outputKind: OneHotEncodingEstimator.OutputKind.Indicator));

    // Fails with "Index was outside the bounds of the array."
    learningPipeline.Fit(data).Transform(data).Preview();
Contents of test1.csv:

    int1, int2, label
    301, 2000, true
    450, 3000, true
    -300, 4000, true
    300, 2000, false
    115, 2000, false
    115, 2000, false
I think this is related to issue 1751. Based on the discussion in that issue, I tried the following:
- Adding MapKeyToValue(), but it did not help.
- Removing OneHotEncoding is not an easy option, because we want to treat the bins as a categorical feature.
The pipeline with MapValueToKey/MapKeyToValue added around the label, which still fails:

    MLContext mlContext = new MLContext(seed: 0);
    var dataPath = "test1.csv";
    var featureName = "Features";
    var loader = mlContext.Data.CreateTextLoader(new[]
    {
        new TextLoader.Column("int1", DataKind.Int64, 0),
        new TextLoader.Column("int2", DataKind.Int64, 1),
        new TextLoader.Column("Label", DataKind.Boolean, 2),
    }, hasHeader: true, separatorChar: ',');
    var data = loader.Load(dataPath);

    var learningPipeline = mlContext.Transforms.Conversion.ConvertType("int1", outputKind: DataKind.Single)
        .Append(mlContext.Transforms.Conversion.ConvertType("int2", outputKind: DataKind.Single))
        // Attempted workaround: map Label to a key type before binning...
        .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
        .Append(mlContext.Transforms.Concatenate(featureName, new string[] { "int1", "int2" }))
        .Append(mlContext.Transforms.NormalizeSupervisedBinning(featureName, fixZero: false, maximumBinCount: 5, labelColumnName: "Label"))
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(featureName, outputKind: OneHotEncodingEstimator.OutputKind.Indicator))
        // ...and map it back to its original value afterwards.
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("Label"));

    learningPipeline.Fit(data).Transform(data).Preview();