OOM errors in FastTree #6175

Closed
@torronen

Description

System Information (please complete the following information):

  • OS & Version: Windows 11
  • ML.NET Version: ML.NET v1.5.5
  • .NET Version: .NET 6.0

Describe the bug
FastTree throws out-of-memory errors. When this happens, virtual (paging) memory is still available, but physical RAM is full. Could FastTree make more effective use of virtual memory? Strangely, one of the 128 GB Ryzen machines does run FastTree on this dataset, while 4 similar machines failed with various OOM errors, so I am able to run the training, just more slowly.

Dataset: 112 GB IDV file, 369 GB CSV file
RAM: 128 GB
The dataset is a single file and uses a sampling key.

The one working machine has about 145 GB of virtual memory, system managed. The others have fixed-size paging files of 500 to 1000 GB, set in the Windows advanced settings.

To Reproduce
Steps to reproduce the behavior:

  1. Create big dataset
  2. Create IDV file
  3. Run AutoML experiments with IDV file
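
The steps above could look roughly like the following sketch. It is not the actual runner code: the CSV path, the `Label` and `GroupId` column names, the 20% test fraction, and the one-hour time budget are all assumptions made for illustration.

```csharp
using System.IO;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// Assumed paths; only the .idv path appears in the traces below.
const string csvPath = @"C:\temp\data.csv";
const string idvPath = @"C:\temp\data.csv.idv";

// Steps 1-2: infer the schema from the CSV and save the data as a binary IDV file.
ColumnInferenceResults columnInference =
    mlContext.Auto().InferColumns(csvPath, labelColumnName: "Label", groupColumns: false);
TextLoader loader = mlContext.Data.CreateTextLoader(columnInference.TextLoaderOptions);
IDataView csvData = loader.Load(csvPath);
using (FileStream stream = File.Create(idvPath))
{
    mlContext.Data.SaveAsBinary(csvData, stream);
}

// Step 3: load the IDV file, split on the sampling key, and run AutoML with FastTree only.
IDataView idvData = mlContext.Data.LoadFromBinary(idvPath);
var split = mlContext.Data.TrainTestSplit(
    idvData, testFraction: 0.2, samplingKeyColumnName: "GroupId"); // "GroupId" is hypothetical

var settings = new BinaryExperimentSettings { MaxExperimentTimeInSeconds = 3600 };
settings.Trainers.Clear();
settings.Trainers.Add(BinaryClassificationTrainer.FastTree);

var result = mlContext.Auto()
    .CreateBinaryClassificationExperiment(settings)
    .Execute(split.TrainSet, split.TestSet, columnInference.ColumnInformation);
```

On a small dataset this should complete normally; the errors reported here only appear once the data no longer fits in RAM.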

Expected behavior
FastTree should be able to use the paging file.
Alternatively, data loading could optionally stop near the OOM point, and training could then proceed with the data loaded so far.
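
As a manual approximation of the second idea, the row count can be capped before training. This is only a sketch: the `maxRows` value is a made-up number that would have to be found by trial and error, and ML.NET does not choose it automatically.

```csharp
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);
IDataView fullData = mlContext.Data.LoadFromBinary(@"C:\temp\data.csv.idv");

// Hypothetical cap; pick a value that fits in RAM on the target machine.
const long maxRows = 50_000_000;

// TakeRows is lazy and keeps only the first maxRows rows of the view.
IDataView cappedData = mlContext.Data.TakeRows(fullData, maxRows);
```

Note that `TakeRows` keeps the first rows rather than a random sample, so the capped data may not be representative, and a group that shares a sampling key may be cut off at the boundary.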

Additional context

Exception during AutoML iteration: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Collections.Generic.List`1.set_Capacity(Int32 value)
   at System.Collections.Generic.List`1.AddWithResize(T item)
   at Microsoft.ML.Trainers.FastTree.DataConverter.ValuesList.Add(Int32 index, Double value) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 2387
   at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 1865
   at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 1765
   at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 1773
   at Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 956
   at Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 2740
   at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 194
   at Microsoft.ML.Trainers.FastTree.FastTreeBinaryTrainer.TrainModelCore(TrainContext context) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTreeClassification.cs:line 198
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 157
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 77
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 68
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Experiment\Runners\RunnerUtil.cs:line 29

1 models were returned after 3516.55 seconds

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Threading.Thread.StartInternal(ThreadHandle t, Int32 stackSize, Int32 priority, Char* pThreadName)
   at System.Threading.Thread.StartCore()
   at Microsoft.ML.Internal.Utilities.Utils.ImmediateBackgroundThreadPool.<QueueAsync>g__Enqueue|5_1(ValueTuple`3 item) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Utilities\ThreadUtils.cs:line 121
   at Microsoft.ML.Data.DataViewUtils.Splitter.ConsolidateCore(IChannelProvider provider, DataViewRowCursor[] inputs, Object[]& ourPools, IChannel ch) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 376
   at Microsoft.ML.Data.DataViewUtils.Splitter.Consolidate(IChannelProvider provider, DataViewRowCursor[] inputs, Int32 batchSize, Object[]& ourPools) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 328
   at Microsoft.ML.Data.DataViewUtils.ConsolidateGeneric(IChannelProvider provider, DataViewRowCursor[] inputs, Int32 batchSize) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 260
   at Microsoft.ML.Data.DataViewUtils.TryCreateConsolidatingCursor(DataViewRowCursor& curs, IDataView view, IEnumerable`1 columnsNeeded, IHost host, Random rand) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 116
   at Microsoft.ML.Data.TransformBase.GetRowCursor(IEnumerable`1 columnsNeeded, Random rand) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\TransformBase.cs:line 85
   at Microsoft.ML.Transforms.ColumnSelectingTransformer.SelectColumnsDataTransform.GetRowCursor(IEnumerable`1 columnsNeeded, Random rand) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\ColumnSelecting.cs:line 689
   at Microsoft.ML.AutoML.DatasetDimensionsUtil.CountRows(IDataView data, UInt64 maxRows) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\DatasetDimensions\DatasetDimensionsUtil.cs:line 69
   at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateTrainData(IDataView trainData, ColumnInformation columnInformation) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Utils\UserInputValidationUtil.cs:line 71
   at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateExperimentExecuteArgs(IDataView trainData, ColumnInformation columnInformation, IDataView validationData, TaskKind task) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Utils\UserInputValidationUtil.cs:line 31
   at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteTrainValidate(IDataView trainData, ColumnInformation columnInfo, IDataView validationData, IEstimator`1 preFeaturizer, IProgress`1 progressHandler) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\API\ExperimentBase.cs:line 280
   at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\API\ExperimentBase.cs:line 196
   at Kwork.AI.AutoML.Experiment.RunAutoMLExperiment(MLContext mlContext, ColumnInferenceResults columnInference, String logFile, BinaryClassificationTrainer trainer, Int32 timeToTrain) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1632
   at Kwork.AI.AutoML.Experiment.RunExperiment(String dataset, BinaryClassificationTrainer trainer, Int32 timeToTrain, String experimentLogPath, String allSummaryLogPath, String preselectedLabel, Nullable`1 optimizationMetric) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1088
Exception of type 'System.OutOfMemoryException' was thrown.

The exception below was thrown with a Finnish message. It is translated to English here, so the wording may differ slightly from the official English text.

System.InvalidOperationException: Exception thrown in reading
 ---> System.IO.IOException: Too few virtual address resources to complete the operation. : 'C:\temp\data.csv.idv'
   at System.IO.Strategies.OSFileStreamStrategy.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Strategies.BufferedFileStreamStrategy.ReadSpan(Span`1 destination, ArraySegment`1 arraySegment)
   at System.IO.FileStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at Microsoft.ML.Internal.Utilities.Utils.ReadBlock(Stream s, Byte[] buff, Int32 offset, Int32 length) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Utilities\Stream.cs:line 883
   at Microsoft.ML.Data.IO.BinaryLoader.Cursor.ReadPipe`1.PrepAndSendCompressedBlock(Int64 blockIndex, Int64 blockSequence, Int32 rowCount) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Binary\BinaryLoader.cs:line 1821
   at Microsoft.ML.Data.IO.BinaryLoader.Cursor.ReaderWorker() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Binary\BinaryLoader.cs:line 1426
   --- End of inner exception stack trace ---
   at Microsoft.ML.Internal.Utilities.ExceptionMarshaller.ThrowIfSet(IExceptionContext ectx) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Utilities\ThreadUtils.cs:line 240
   at Microsoft.ML.Data.IO.BinaryLoader.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Binary\BinaryLoader.cs:line 2001
   at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
   at Microsoft.ML.Transforms.NormalizingTransformer.Train(IHostEnvironment env, IDataView data, ColumnOptionsBase[] columns) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\Normalizer.cs:line 570
   at Microsoft.ML.Transforms.NormalizingEstimator.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\Normalizer.cs:line 332
   at Microsoft.ML.DataOperationsCatalog.CreateSplitColumn(IHostEnvironment env, IDataView& data, String samplingKeyColumn, Nullable`1 seed, Boolean fallbackInEnvSeed) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\DataOperationsCatalog.cs:line 584
   at Microsoft.ML.DataOperationsCatalog.TrainTestSplit(IDataView data, Double testFraction, String samplingKeyColumnName, Nullable`1 seed) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\DataOperationsCatalog.cs:line 417
   at Kwork.AI.AutoML.Experiment.RunAutoMLExperiment(MLContext mlContext, ColumnInferenceResults columnInference, String logFile, BinaryClassificationTrainer trainer, Int32 timeToTrain) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1629
   at Kwork.AI.AutoML.Experiment.RunExperiment(String dataset, BinaryClassificationTrainer trainer, Int32 timeToTrain, String experimentLogPath, String allSummaryLogPath, String preselectedLabel, Nullable`1 optimizationMetric) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1088

Metadata

Assignees

No one assigned

Labels

AutoML.NET: Automating various steps of the machine learning process
P2: Priority of the issue for triage purpose: Needs to be fixed at some point.
