OnlineGradientDescent throws exception #2407

Closed
@PeterPann23

System information

  • Microsoft Windows 10 Pro version 10.0.17763, 64 GB RAM, Intel Core i7-7700K (4 physical cores, 4.2 GHz), 2x 250 GB M.2 drives, AMD FirePro W5100 with 4096 MB / 930 MHz
  • .NET Framework 4.7.2, Microsoft.ML 0.9.0 (Wednesday, January 9, 2019)
  • Dataset: 3,378,393 rows

Issue

What did I do?

  • Compared the prediction accuracy of different trainers, using
  1. the same data source
  2. the same normalisation
  3. a different trainer for each run

I configured the estimator chain like so:

var dataProcessPipeline = mlContext.Transforms.CopyColumns("predictField", "Label")
    .Append(mlContext.Transforms.Normalize(inputName: "SH1", mode: NormalizingEstimator.NormalizerMode.MeanVariance))
    .Append(mlContext.Transforms.Normalize(inputName: "SL1", mode: NormalizingEstimator.NormalizerMode.MeanVariance))
    // … 665 more
    .Append(mlContext.Transforms.Normalize(inputName: "SH9", mode: NormalizingEstimator.NormalizerMode.MeanVariance))
    .Append(mlContext.Transforms.Concatenate("Features","SH1",..."SH9"));
dataProcessPipeline.AppendCacheCheckpoint(mlContext);

Previously I had 119 data points in the model and had no error.

I choose which trainer to test based on a parameter. The branch that causes the error is this one:

else if (Definition.MachineLearningMethod == AI.ML.Factory.MachineLearningMethods.OnlineGradientDescent)
{
    var trainer = mlContext.Regression.Trainers.OnlineGradientDescent(
        labelColumn: "Label",
        featureColumn: "Features",
        advancedSettings: a =>
        {
            a.DecreaseLearningRate = true;
            a.DoLazyUpdates = true;
            a.NormalizeFeatures = NormalizeOption.Yes;
            a.Caching = Microsoft.ML.EntryPoints.CachingOptions.Memory;
        });
    var trainingPipeline = dataProcessPipeline.Append(trainer);
    return trainingPipeline.Fit(trainingDataView);
}
  • What happened?
    After I call Fit on my training data view, I see the following errors:
    Exception thrown: 'System.InvalidOperationException' in Microsoft.ML.StandardLearners.dll
    then
    Exception OnlineGradientDescent:The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc

Afterwards, I think, the .NET Framework raises an error in my running test (no debugger attached):

Managed Debugging Assistant 'ContextSwitchDeadlock'
The CLR has been unable to transition from COM context 0x248b5058 to COM context 0x248b5180 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.

  • What did you expect?
    I was able to run the trainer without any of the advanced settings on a smaller dataset. After receiving the error on the larger dataset, I added the advanced settings hoping they would solve the issue. However, the error still occurs.
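One thing that may be worth ruling out before tuning trainer settings is whether the raw input already contains non-finite values. The following is a minimal sketch of such a check in plain C#, independent of ML.NET. The class name `FeatureSanityCheck`, the method `FindNonFiniteColumns`, and the assumption that rows are available as `float[]` arrays are all hypothetical and not taken from the project above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ML.NET or the project in this issue.
public static class FeatureSanityCheck
{
    // Returns the sorted indices of columns that contain at least one
    // NaN or infinite value in any of the given rows.
    public static List<int> FindNonFiniteColumns(IEnumerable<float[]> rows)
    {
        var bad = new SortedSet<int>();
        foreach (var row in rows)
        {
            for (int i = 0; i < row.Length; i++)
            {
                if (float.IsNaN(row[i]) || float.IsInfinity(row[i]))
                {
                    bad.Add(i);
                }
            }
        }
        return new List<int>(bad);
    }
}
```

An empty result only says that the raw values are finite. It does not rule out the other causes named in the exception message, such as a learning rate that is too high.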

Source code / logs

[Source=NormalizingEstimator; RowToRowMapperTransform; Cursor, Kind=Trace] Channel finished. Elapsed 00:04:53.5139276.
[Source=NormalizingEstimator; RowToRowMapperTransform; Cursor, Kind=Trace] Channel disposed
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel finished. Elapsed 00:04:53.4765765.
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel disposed
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel finished. Elapsed 00:04:53.4197884.
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel disposed
[Source=Stochastic Gradient Descent (Regression); Training, Kind=Trace] 2/4/2019 2:59:47 PM Finished training iteration 1; iterated over 3412517 examples.
[Source=Stochastic Gradient Descent (Regression); Training, Kind=Trace] Channel finished. Elapsed 00:04:56.6368673.
[Source=Stochastic Gradient Descent (Regression); Training, Kind=Trace] Channel disposed

Exception OnlineGradientDescent:The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
Exception:The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
testhost.exe Error: 0 : The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.

The full log is attached:
Learning exception.zip
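The exception message names high learning rates and missing normalization as potential causes. The toy sketch below illustrates that mechanism with a single-weight online gradient descent in plain C#. It is not the ML.NET implementation, and the class name `OgdDivergenceSketch`, the fixed learning rate, and the sample values are assumptions chosen only for illustration.

```csharp
using System;

// Toy model for illustration only; not how ML.NET implements its trainer.
public static class OgdDivergenceSketch
{
    // Fits y ≈ w * x with squared loss, updating w after every example
    // using a fixed learning rate. Returns the final weight.
    public static float Train(float[] xs, float[] ys, float learningRate, int epochs)
    {
        float w = 0f;
        for (int e = 0; e < epochs; e++)
        {
            for (int i = 0; i < xs.Length; i++)
            {
                float error = w * xs[i] - ys[i];     // prediction minus label
                w -= learningRate * error * xs[i];   // gradient step for squared loss
            }
        }
        return w;
    }

    // True when the weight is NaN or infinite, the condition the exception reports.
    public static bool IsInvalid(float w)
    {
        return float.IsNaN(w) || float.IsInfinity(w);
    }
}
```

Each update multiplies the distance to the solution by `1 - learningRate * x * x`, so the weight only settles when `learningRate * x * x` stays below 2. With feature values in the thousands and a learning rate of 0.1, the weight grows on every step until it overflows. With the same relationship scaled to values around 1, it converges.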
