SymSGD IndexOutOfRangeException #3887

Closed
@justinormont

I get an error when using OVA-SymSGD on an internal dataset. Other learners, like SDCA and OVA-AveragedPerceptron, work successfully (though LightGBM dies due to #1625).

Error:

Exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Microsoft.ML.Trainers.SymbolicSgdLogisticRegressionBinaryTrainer.Native.LearnAll(InputDataManager inputDataManager, Boolean tuneLR, Single& lr, Single l2Const, Single piw, Span`1 weightVector, Single& bias, Int32 numFeatres, Int32 numPasses, Int32 numThreads, Boolean tuneNumLocIter, Int32& numLocIter, Single tolerance, Boolean needShuffle, Boolean shouldInitialize, GCHandle stateGCHandle, ChannelCallBack info)
   at Microsoft.ML.Trainers.SymbolicSgdLogisticRegressionBinaryTrainer.TrainCore(IChannel ch, RoleMappedData data, LinearModelParameters predictor, Int32 weightSetCount)
   at Microsoft.ML.Trainers.SymbolicSgdLogisticRegressionBinaryTrainer.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Trainers.OneVersusAllTrainer.TrainOne(IChannel ch, ITrainerEstimator`2 trainer, RoleMappedData data, Int32 cls)
   at Microsoft.ML.Trainers.OneVersusAllTrainer.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)

Pipeline

Below is the same pipeline but using SDCA, which runs successfully.

var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("label_col", "label_col")
      .Append(mlContext.Transforms.Categorical.OneHotEncoding(new[] { new InputOutputColumnPair("col1", "col1"), new InputOutputColumnPair("col2", "col2"), new InputOutputColumnPair("col3", "col3"), new InputOutputColumnPair("col4", "col4"), new InputOutputColumnPair("col5", "col5") }))
      .Append(mlContext.Transforms.Text.FeaturizeText("col6_tf", "col6"))
      .Append(mlContext.Transforms.Text.FeaturizeText("col7_tf", "col7"))
      .Append(mlContext.Transforms.Text.FeaturizeText("col8_tf", "col8"))
      .Append(mlContext.Transforms.Text.FeaturizeText("col9_tf", "col9"))
      .Append(mlContext.Transforms.Text.FeaturizeText("col10_tf", "col10"))
      .Append(mlContext.Transforms.Text.FeaturizeText("col11_tf", "col11"))
      .Append(mlContext.Transforms.Text.FeaturizeText("col12_tf", "col12"))
      .Append(mlContext.Transforms.Concatenate("Features", new[] { "col1", "col2", "col3", "col4", "col5", "col6_tf", "col7_tf", "col8_tf", "col9_tf", "col10_tf", "col11_tf", "col12_tf", "col13", "col14", "col15", "col16", "col17", "col18", "col19", "col20", "col21" }))
      .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
      .AppendCacheCheckpoint(mlContext);

// Set the training algorithm
var trainer = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: "label_col", featureColumnName: "Features")
      .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
var trainingPipeline = dataProcessPipeline.Append(trainer);
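
For comparison, the failing configuration presumably differs only in the trainer. The issue does not include that code, so the following is only a sketch under assumptions: it uses the public OneVersusAll and SymbolicSgdLogisticRegression catalog methods (the latter ships in the Microsoft.ML.Mkl.Components package) with default hyperparameters, whereas the AutoML run in the stack trace would have picked its own settings. SymSgdSketch and BuildOvaSymSgdTrainer are hypothetical names introduced for illustration.

```csharp
using Microsoft.ML;

public static class SymSgdSketch
{
    // Hypothetical helper: builds an OVA-SymSGD trainer that could be
    // swapped in for the SdcaMaximumEntropy trainer in the pipeline above.
    public static IEstimator<ITransformer> BuildOvaSymSgdTrainer(MLContext mlContext)
    {
        // Binary SymSGD learner (assumes the Microsoft.ML.Mkl.Components
        // package is referenced); default hyperparameters are an assumption.
        var binaryTrainer = mlContext.BinaryClassification.Trainers.SymbolicSgdLogisticRegression(
            labelColumnName: "label_col", featureColumnName: "Features");

        // Wrap the binary learner in one-versus-all, then map the predicted
        // key back to the original label value, as the SDCA pipeline does.
        return mlContext.MulticlassClassification.Trainers
            .OneVersusAll(binaryTrainer, labelColumnName: "label_col")
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
    }
}
```

Under these assumptions, the training pipeline would be built with dataProcessPipeline.Append(SymSgdSketch.BuildOvaSymSgdTrainer(mlContext)); whether this reproduces the exception depends on the internal dataset, which is not available here.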

Labels: P0, bug
