Description
System Information:
- OS & Version: Windows 11 [Version 10.0.22621.819]
- ML.NET Version: ML.NET v2.0.0
- .NET Version: .NET 7.0.100
Describe the bug
I ran an ML.NET experiment that was set to train for 10 hours. I limited the training to the FastTree and FastForest algorithms and made sure that every new best model is saved even if the whole run fails. I locked Windows and let it run overnight.
When I checked it in the morning (12 hours after the experiment started), the machine was so unresponsive that I couldn't even log in, and I had to power-cycle it to regain access. According to the logs, the last save happened after ~5 hours of running.
The dataset wasn't particularly big this time: it was only a 205 MB CSV file. My system has 32 GB RAM, and I set the maximum memory usage for the experiment to 20 GB.
To Reproduce
Steps to reproduce the behavior:
- Create a pipeline that includes only FastTree and FastForest, for example:
    var pipeline = mlContext.Auto()
        .Featurizer(trainTestData.TrainSet, numericColumns: new[] { "Features" })
        .Append(mlContext.Auto().Regression(useFastTree: true, useLbfgs: false, useSdca: false, useFastForest: true, useLgbm: false));
- Create an ML.NET experiment and set its memory limit. I set it to 20 GB, because I usually have ~24 GB available when I start the training.
    var experiment = mlContext.Auto().CreateExperiment();
    experiment
        .SetPipeline(pipeline)
        .SetTrainingTimeInSeconds(trainingTimeInSeconds)
        .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: "Label")
        .SetDataset(trainTestData.TrainSet, trainTestData.TestSet)
        .SetMonitor(monitor)
        .SetMaximumMemoryUsageInMegaByte(20 * 1024);
- Run the experiment for at least 10 hours
- Check if you can still use the system after 5+ hours of running
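Since the machine cannot be inspected once it freezes, it may help to record memory usage while the experiment runs. Below is a minimal sketch of such a logger. It uses only the .NET base class library, and `MemoryLogger` and the `memory.log` file name are hypothetical names, not part of ML.NET. It assumes the training runs inside the same process as the logger, and it only reports what the process itself sees, so it cannot prove where the memory goes.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Threading;

// Hypothetical helper (not an ML.NET type): periodically appends the current
// process's memory usage to a text file, so readings taken before a freeze
// can be inspected after a reboot.
public sealed class MemoryLogger : IDisposable
{
    private readonly string _path;
    private readonly object _gate = new object();
    private readonly Timer _timer;

    public MemoryLogger(string path, TimeSpan interval)
    {
        _path = path;
        // The first sample is taken immediately, then one per interval.
        _timer = new Timer(_ => WriteSample(), null, TimeSpan.Zero, interval);
    }

    // Formats one tab-separated log line; sizes are converted from bytes to MB.
    public static string FormatSample(DateTime utcNow, long workingSetBytes, long managedBytes)
    {
        const double BytesPerMegabyte = 1024.0 * 1024.0;
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0:O}\tworkingSetMB={1:F0}\tmanagedMB={2:F0}",
            utcNow,
            workingSetBytes / BytesPerMegabyte,
            managedBytes / BytesPerMegabyte);
    }

    private void WriteSample()
    {
        try
        {
            using var process = Process.GetCurrentProcess();
            var line = FormatSample(DateTime.UtcNow, process.WorkingSet64, GC.GetTotalMemory(false));
            // Serialize writes in case two timer callbacks overlap.
            lock (_gate)
            {
                File.AppendAllText(_path, line + Environment.NewLine);
            }
        }
        catch (IOException)
        {
            // Logging is best effort; a failed write must not stop the training.
        }
    }

    public void Dispose() => _timer.Dispose();
}

// Possible usage around the experiment from the steps above:
//
//   using var memoryLogger = new MemoryLogger("memory.log", TimeSpan.FromMinutes(1));
//   // ... run the experiment here ...
```

If the working set in the log keeps growing well past the configured 20 GB before the last entry, that would suggest the memory limit is not being enforced. If it stays flat, the freeze likely has another cause.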
Expected behavior
I would expect the experiment to run for the full training time without freezing the system.
Screenshots, Code, Sample Projects
The 205 MB CSV dataset can be downloaded from here:
BBD_20221122__TrainingData.Sleep.MLP12__MLP12_0p25Hz-250Hz__Session_SegmentedData_Sleep_Level__15782rows.zip
Additional context
My project is open-source so both the source and the data are available. Let me know if you need them for testing.