Let ML.NET cancel unmanaged code. #6465

Open
@andrasfuchs

Description

System Information:

  • OS & Version: Windows 11 [Version 10.0.22621.819]
  • ML.NET Version: ML.NET v2.0.0
  • .NET Version: .NET 7.0.100

Describe the bug
I ran an ML.NET experiment for 10 hours. I restricted training to the FastTree and FastForest algorithms and made sure that every new best model was saved, so the results would survive even if the whole run failed. I locked Windows and let it run overnight.
When I checked in the morning (12 hours after the experiment started), I couldn't even log in; I had to power-cycle the machine to regain access. From the logs, it looked like the last model was saved after ~5 hours of running.
The dataset wasn't particularly large this time: only a 205 MB CSV file. My system has 32 GB of RAM, and I set the experiment's maximum memory usage to 20 GB.

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline that includes only FastTree and FastForest, for example:
```csharp
// Only FastTree and FastForest are enabled; LBFGS, SDCA and LightGBM are disabled.
var pipeline =
  mlContext.Auto().Featurizer(trainTestData.TrainSet, numericColumns: new[] { "Features" })
      .Append(mlContext.Auto().Regression(useFastTree: true, useLbfgs: false, useSdca: false, useFastForest: true, useLgbm: false));
```

  2. Create an ML.NET experiment and set its memory limit. I set it to 20 GB, because I usually have ~24 GB available when I start the training.
```csharp
var experiment = mlContext.Auto().CreateExperiment();

// trainingTimeInSeconds, trainTestData and monitor are defined elsewhere in the project.
experiment
    .SetPipeline(pipeline)
    .SetTrainingTimeInSeconds(trainingTimeInSeconds)
    .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: "Label")
    .SetDataset(trainTestData.TrainSet, trainTestData.TestSet)
    .SetMonitor(monitor)
    .SetMaximumMemoryUsageInMegaByte(20 * 1024); // 20 GB
```
  3. Run the experiment for at least 10 hours.
  4. Check whether you can still use the system after 5+ hours of running.
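For context on the title: cancellation in .NET is cooperative. Cancelling a `CancellationTokenSource` only signals its token, and a thread that is busy inside an unmanaged call cannot observe that signal until control returns to managed code. The sketch below is a hypothetical external guard: the `MemoryWatchdog` class, its parameters and the polling approach are illustrative assumptions, not part of ML.NET. It samples the process's working set on a timer and requests cancellation once a limit is exceeded. It assumes the experiment is started with a token (for example via `experiment.RunAsync(cts.Token)`), and given the behavior described in this issue, it may still fail to stop a trainer that is stuck in unmanaged code.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading;

/// <summary>
/// Hypothetical helper (not part of ML.NET): periodically samples a memory probe and
/// requests cancellation, once, when the sampled value exceeds a byte limit.
/// It only signals the token; it cannot interrupt code that never checks it.
/// </summary>
public sealed class MemoryWatchdog : IDisposable
{
    private readonly CancellationTokenSource _cts;
    private readonly long _limitBytes;
    private readonly Func<long> _probeBytes;
    private readonly Timer _timer;
    private int _tripped; // 0 = not tripped yet, 1 = cancellation requested

    public MemoryWatchdog(CancellationTokenSource cts, long limitBytes, TimeSpan interval,
                          Func<long>? probeBytes = null)
    {
        _cts = cts;
        _limitBytes = limitBytes;
        // Default probe: working set of the current process. A fresh Process object
        // is created on each call because Process caches its counters.
        _probeBytes = probeBytes ?? (() =>
        {
            using var self = Process.GetCurrentProcess();
            return self.WorkingSet64;
        });
        // First check runs immediately, then once per `interval`.
        _timer = new Timer(_ => Check(), null, TimeSpan.Zero, interval);
    }

    /// <summary>True once the limit was exceeded and cancellation was requested.</summary>
    public bool Tripped => Volatile.Read(ref _tripped) == 1;

    private void Check()
    {
        if (Tripped || _probeBytes() <= _limitBytes) return;
        // Interlocked ensures overlapping timer callbacks request cancellation only once.
        if (Interlocked.Exchange(ref _tripped, 1) == 0)
        {
            try { _cts.Cancel(); }
            catch (ObjectDisposedException) { /* source already disposed; nothing to signal */ }
        }
    }

    // Dispose the watchdog before the CancellationTokenSource it signals.
    public void Dispose() => _timer.Dispose();
}
```

In the reproduction above, it might be wired up as `using var cts = new CancellationTokenSource(); using var watchdog = new MemoryWatchdog(cts, 20L * 1024 * 1024 * 1024, TimeSpan.FromSeconds(5)); await experiment.RunAsync(cts.Token);`, where 20 GB mirrors the `SetMaximumMemoryUsageInMegaByte(20 * 1024)` setting. The working set is just one possible measure; `Process.PrivateMemorySize64` could be passed as the probe instead.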

Expected behavior
I would expect the experiment to run to completion without freezing the system.

Screenshots, Code, Sample Projects
The 205 MB CSV dataset can be downloaded from here:
BBD_20221122__TrainingData.Sleep.MLP12__MLP12_0p25Hz-250Hz__Session_SegmentedData_Sleep_Level__15782rows.zip

Additional context
My project is open-source so both the source and the data are available. Let me know if you need them for testing.

Metadata

Assignees

No one assigned

    Labels

    AutoML.NET (Automating various steps of the machine learning process), enhancement (New feature or request), lightgbm (Bugs related lightgbm)