OnlineGradientDescent throws exception

### System information
- Microsoft Windows Pro version 10.0.17763, 64 GB RAM, i7-7700K (4 physical cores, 4.2 GHz), 2x 250 GB M.2 drives, AMD FirePro W5100 with 4096 MB/930 MHz
- .NET Framework 4.7.2, Microsoft.ML 0.9.0 (January 9, 2019)
- Dataset 3,378,393 rows

### Issue
- **What did I do?**
I am comparing prediction accuracy across different trainers, using:
1. the same data source
2. the same normalisation
3. a different trainer for each run

I configured the estimator chain like so:
```
var dataProcessPipeline = mlContext.Transforms.CopyColumns("predictField", "Label")
.Append(mlContext.Transforms.Normalize(inputName: "SH1", mode: NormalizingEstimator.NormalizerMode.MeanVariance))
.Append(mlContext.Transforms.Normalize(inputName: "SL1", mode: NormalizingEstimator.NormalizerMode.MeanVariance))
… 665 more
.Append(mlContext.Transforms.Normalize(inputName: "SH9", mode: NormalizingEstimator.NormalizerMode.MeanVariance))
.Append(mlContext.Transforms.Concatenate("Features","SH1",..."SH9"));
dataProcessPipeline.AppendCacheCheckpoint(mlContext);
```
Previously I had 119 data points in the model and had no error.

I select the trainer based on a parameter that specifies which method to train. The branch that causes the error is this:

```
else if (Definition.MachineLearningMethod == AI.ML.Factory.MachineLearningMethods.OnlineGradientDescent)
{
    var trainer = mlContext.Regression.Trainers.OnlineGradientDescent(
        labelColumn: "Label",
        featureColumn: "Features",
        advancedSettings: a =>
        {
            a.DecreaseLearningRate = true;
            a.DoLazyUpdates = true;
            a.NormalizeFeatures = NormalizeOption.Yes;
            a.Caching = Microsoft.ML.EntryPoints.CachingOptions.Memory;
        });
    var trainingPipeline = dataProcessPipeline.Append(trainer);
    return trainingPipeline.Fit(trainingDataView);
}
```




- **What happened?**
After I call `Fit` on my training data view, I see the following errors:
`Exception thrown: 'System.InvalidOperationException' in Microsoft.ML.StandardLearners.dll`
then
`Exception OnlineGradientDescent:The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc`

After that, the .NET Framework (I think) raises the following in my running test (no debugger attached):
> Managed Debugging Assistant 'ContextSwitchDeadlock' 
> The CLR has been unable to transition from COM context 0x248b5058 to COM context 0x248b5180 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.





- **What did you expect?**
With a smaller dataset I was able to run the model without any of the advanced settings. After receiving this error on the larger dataset, I added the advanced settings hoping they would resolve it, but they did not.
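
Since the exception message points at invalid values in the weights, one check that may help narrow this down is to scan the raw feature values for NaN or infinite entries before training, because a single non-finite input is enough to make a linear weight update non-finite. The following is only a minimal sketch in plain C#, independent of ML.NET: `FeatureSanityCheck` and `CountNonFinite` are hypothetical names, and it assumes the feature rows have already been loaded into `float[]` arrays.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Microsoft.ML.
public static class FeatureSanityCheck
{
    // Returns, for each column index, how many NaN or infinite values were found.
    // Assumes every row has at least columnCount entries.
    public static int[] CountNonFinite(IEnumerable<float[]> rows, int columnCount)
    {
        var counts = new int[columnCount];
        foreach (var row in rows)
        {
            for (int c = 0; c < columnCount; c++)
            {
                float v = row[c];
                if (float.IsNaN(v) || float.IsInfinity(v))
                    counts[c]++;
            }
        }
        return counts;
    }

    public static void Main()
    {
        // Illustrative data only; the real rows would come from the training set.
        var rows = new List<float[]>
        {
            new[] { 1.0f, 2.0f, float.NaN },
            new[] { 0.5f, float.PositiveInfinity, 3.0f },
            new[] { 0.1f, 0.2f, 0.3f },
        };

        int[] counts = CountNonFinite(rows, 3);
        for (int c = 0; c < counts.Length; c++)
            Console.WriteLine($"column {c}: {counts[c]} non-finite value(s)");
    }
}
```

If every column reports zero, the input itself is probably not the source of the NaN, and the learning rate or initial weights named in the exception message would be the next things to look at.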

### Source code / logs

[Source=NormalizingEstimator; RowToRowMapperTransform; Cursor, Kind=Trace] Channel finished. Elapsed 00:04:53.5139276.
[Source=NormalizingEstimator; RowToRowMapperTransform; Cursor, Kind=Trace] Channel disposed
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel finished. Elapsed 00:04:53.4765765.
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel disposed
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel finished. Elapsed 00:04:53.4197884.
[Source=ColumnConcatenatingEstimator ; RowToRowMapperTransform; Cursor, Kind=Trace] Channel disposed
[Source=Stochastic Gradient Descent (Regression); Training, Kind=Trace] 2/4/2019 2:59:47 PM Finished training iteration 1; iterated over 3412517 examples.
[Source=Stochastic Gradient Descent (Regression); Training, Kind=Trace] Channel finished. Elapsed 00:04:56.6368673.
[Source=Stochastic Gradient Descent (Regression); Training, Kind=Trace] Channel disposed

Exception OnlineGradientDescent:The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
Exception:The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
testhost.exe Error: 0 : The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.


The full log is attached:
[Learning exception.zip](https://github.com/dotnet/machinelearning/files/2829739/Learning.exception.zip)


OnlineGradientDescent throws exception #2407