Description
System Information (please complete the following information):
- OS & Version: Windows 11
- ML.NET Version: ML.NET v1.5.5
- .NET Version: .NET 6.0
Describe the bug
FastTree fails with out-of-memory errors. Virtual (paging) memory is still available, but physical RAM is full. Could something be done to use virtual memory more effectively? Strangely, one of the 128 GB Ryzen machines is running FastTree on this dataset, while four similar machines failed with various OOM errors. So I am able to run the training, just a bit slower.
Dataset: 112 GB IDV file, 369 GB CSV file
RAM: 128 GB
The dataset is a single file with a sampling key column.
The one working machine has about 145 GB of system-managed virtual memory. The others have fixed-size paging files of 500 to 1000 GB, configured in the Windows advanced settings.
To Reproduce
Steps to reproduce the behavior:
- Create a big dataset (here, a 369 GB CSV file)
- Convert it to an IDV file
- Run AutoML experiments with the IDV file
Expected behavior
FastTree should be able to make use of the paging file.
Alternatively, data loading could optionally stop near the OOM point, and training could then proceed with the data loaded so far.
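As a possible interim workaround for the second idea, the row count can be capped by hand before the data reaches the trainer. The following is only a sketch, not how FastTree or AutoML handle this internally. `CappedTrainingData`, `MaxRowsForBudget`, and `LoadCapped` are hypothetical names, and the per-row memory cost is an assumption that would have to be measured on a small sample first.

```csharp
using System;
using Microsoft.ML;

public static class CappedTrainingData
{
    // Hypothetical helper: how many rows fit in a memory budget, given a
    // per-row cost in bytes. The per-row cost is an assumption you would
    // have to measure yourself; it is not reported by ML.NET.
    public static long MaxRowsForBudget(long budgetBytes, long bytesPerRow)
    {
        if (budgetBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(budgetBytes));
        if (bytesPerRow <= 0)
            throw new ArgumentOutOfRangeException(nameof(bytesPerRow));
        return budgetBytes / bytesPerRow;
    }

    // Opens the IDV file and wraps it in a view that exposes only the
    // first maxRows rows, so rows beyond the cap are not passed on to
    // whatever trainer consumes the result.
    public static IDataView LoadCapped(MLContext mlContext, string idvPath, long maxRows)
    {
        IDataView full = mlContext.Data.LoadFromBinary(idvPath);
        return mlContext.Data.TakeRows(full, maxRows);
    }
}
```

The capped view would then be passed to the AutoML experiment in place of the full data view. Note that `TakeRows` keeps the first rows in file order rather than a random sample, so this is only reasonable if the file is not sorted by label or by time.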
Additional context
Exception during AutoML iteration: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Collections.Generic.List`1.set_Capacity(Int32 value)
at System.Collections.Generic.List`1.AddWithResize(T item)
at Microsoft.ML.Trainers.FastTree.DataConverter.ValuesList.Add(Int32 index, Double value) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 2387
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 1865
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 1765
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 1773
at Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 956
at Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 2740
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTree.cs:line 194
at Microsoft.ML.Trainers.FastTree.FastTreeBinaryTrainer.TrainModelCore(TrainContext context) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.FastTree\FastTreeClassification.cs:line 198
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 157
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 77
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 68
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Experiment\Runners\RunnerUtil.cs:line 29
1 models were returned after 3516.55 seconds
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Threading.Thread.StartInternal(ThreadHandle t, Int32 stackSize, Int32 priority, Char* pThreadName)
at System.Threading.Thread.StartCore()
at Microsoft.ML.Internal.Utilities.Utils.ImmediateBackgroundThreadPool.<QueueAsync>g__Enqueue|5_1(ValueTuple`3 item) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Utilities\ThreadUtils.cs:line 121
at Microsoft.ML.Data.DataViewUtils.Splitter.ConsolidateCore(IChannelProvider provider, DataViewRowCursor[] inputs, Object[]& ourPools, IChannel ch) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 376
at Microsoft.ML.Data.DataViewUtils.Splitter.Consolidate(IChannelProvider provider, DataViewRowCursor[] inputs, Int32 batchSize, Object[]& ourPools) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 328
at Microsoft.ML.Data.DataViewUtils.ConsolidateGeneric(IChannelProvider provider, DataViewRowCursor[] inputs, Int32 batchSize) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 260
at Microsoft.ML.Data.DataViewUtils.TryCreateConsolidatingCursor(DataViewRowCursor& curs, IDataView view, IEnumerable`1 columnsNeeded, IHost host, Random rand) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 116
at Microsoft.ML.Data.TransformBase.GetRowCursor(IEnumerable`1 columnsNeeded, Random rand) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\TransformBase.cs:line 85
at Microsoft.ML.Transforms.ColumnSelectingTransformer.SelectColumnsDataTransform.GetRowCursor(IEnumerable`1 columnsNeeded, Random rand) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\ColumnSelecting.cs:line 689
at Microsoft.ML.AutoML.DatasetDimensionsUtil.CountRows(IDataView data, UInt64 maxRows) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\DatasetDimensions\DatasetDimensionsUtil.cs:line 69
at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateTrainData(IDataView trainData, ColumnInformation columnInformation) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Utils\UserInputValidationUtil.cs:line 71
at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateExperimentExecuteArgs(IDataView trainData, ColumnInformation columnInformation, IDataView validationData, TaskKind task) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\Utils\UserInputValidationUtil.cs:line 31
at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteTrainValidate(IDataView trainData, ColumnInformation columnInfo, IDataView validationData, IEstimator`1 preFeaturizer, IProgress`1 progressHandler) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\API\ExperimentBase.cs:line 280
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.AutoML\API\ExperimentBase.cs:line 196
at Kwork.AI.AutoML.Experiment.RunAutoMLExperiment(MLContext mlContext, ColumnInferenceResults columnInference, String logFile, BinaryClassificationTrainer trainer, Int32 timeToTrain) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1632
at Kwork.AI.AutoML.Experiment.RunExperiment(String dataset, BinaryClassificationTrainer trainer, Int32 timeToTrain, String experimentLogPath, String allSummaryLogPath, String preselectedLabel, Nullable`1 optimizationMetric) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1088
Exception of type 'System.OutOfMemoryException' was thrown.
The exception below translates roughly to "Too few virtual address resources to complete the operation". Sorry, I do not have the error message in English.
System.InvalidOperationException: Exception thrown in reading
---> System.IO.IOException: Liian vähän virtuaalisia osoiteresursseja toiminnon suorittamiseen loppuun. : 'C:\temp\data.csv.idv'
at System.IO.Strategies.OSFileStreamStrategy.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Strategies.BufferedFileStreamStrategy.ReadSpan(Span`1 destination, ArraySegment`1 arraySegment)
at System.IO.FileStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at Microsoft.ML.Internal.Utilities.Utils.ReadBlock(Stream s, Byte[] buff, Int32 offset, Int32 length) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Utilities\Stream.cs:line 883
at Microsoft.ML.Data.IO.BinaryLoader.Cursor.ReadPipe`1.PrepAndSendCompressedBlock(Int64 blockIndex, Int64 blockSequence, Int32 rowCount) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Binary\BinaryLoader.cs:line 1821
at Microsoft.ML.Data.IO.BinaryLoader.Cursor.ReaderWorker() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Binary\BinaryLoader.cs:line 1426
--- End of inner exception stack trace ---
at Microsoft.ML.Internal.Utilities.ExceptionMarshaller.ThrowIfSet(IExceptionContext ectx) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Utilities\ThreadUtils.cs:line 240
at Microsoft.ML.Data.IO.BinaryLoader.Cursor.MoveNextCore() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\Binary\BinaryLoader.cs:line 2001
at Microsoft.ML.Data.RootCursorBase.MoveNext() in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Core\Data\RootCursorBase.cs:line 72
at Microsoft.ML.Transforms.NormalizingTransformer.Train(IHostEnvironment env, IDataView data, ColumnOptionsBase[] columns) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\Normalizer.cs:line 570
at Microsoft.ML.Transforms.NormalizingEstimator.Fit(IDataView input) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\Transforms\Normalizer.cs:line 332
at Microsoft.ML.DataOperationsCatalog.CreateSplitColumn(IHostEnvironment env, IDataView& data, String samplingKeyColumn, Nullable`1 seed, Boolean fallbackInEnvSeed) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\DataOperationsCatalog.cs:line 584
at Microsoft.ML.DataOperationsCatalog.TrainTestSplit(IDataView data, Double testFraction, String samplingKeyColumnName, Nullable`1 seed) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src\Microsoft.ML.Data\DataLoadSave\DataOperationsCatalog.cs:line 417
at Kwork.AI.AutoML.Experiment.RunAutoMLExperiment(MLContext mlContext, ColumnInferenceResults columnInference, String logFile, BinaryClassificationTrainer trainer, Int32 timeToTrain) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1629
at Kwork.AI.AutoML.Experiment.RunExperiment(String dataset, BinaryClassificationTrainer trainer, Int32 timeToTrain, String experimentLogPath, String allSummaryLogPath, String preselectedLabel, Nullable`1 optimizationMetric) in Q:\git-kwork-microsoft-ml\Microsoft.ML\src-AutoML-Runner\Kwork.AI.AutoML.Runner\Experiment.cs:line 1088