Multiclass text classification: training consumes a lot of RAM #6007

[ds example.txt](https://github.com/dotnet/machinelearning/files/7569870/ds.example.txt)
### System information

- **Windows 10 Home Single Language**
- **.NET Version 5.0.400**
- **Microsoft.ML 1.6.0**

### Issue

I'm trying to train a model on my dataset. The dataset is about 60 MB (an example is attached; I can't share the full dataset for privacy reasons). Each row contains a text description of about 50-200 characters. There are 84 labels in total and about 100K rows in the training set. After 16-18 hours of training, the application consumes about 32 GB of RAM and terminates with a `System.OutOfMemoryException` (I have only 32 GB of free RAM on my PC). Is this RAM consumption normal for this kind of task, or am I doing something wrong?

### Source code / logs
My data class:
```csharp
using Microsoft.ML.Data;

public class SkuInfo
{
  // Label column; mapped to a key type in the pipeline
  [ColumnName("Category")]
  public string CategoryCode { get; set; }

  // Numeric manufacturer ID; missing values (NaN) are replaced in the pipeline
  [ColumnName("ManufacturerId")]
  public float ManufacturerId { get; set; }

  [ColumnName("ManufacturerPn")]
  public string ManufacturerPn { get; set; }

  [ColumnName("Description")]
  public string Description { get; set; }
}
```

My training pipeline:
```csharp
using Microsoft.ML;

private IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
{
	var pipeline = mlContext.Transforms.ReplaceMissingValues("ManufacturerId", "ManufacturerId")
		// Turn each text column into a sparse vector of word and character n-gram features
		.Append(mlContext.Transforms.Text.FeaturizeText("ManufacturerPn", "ManufacturerPn"))
		.Append(mlContext.Transforms.Text.FeaturizeText("Description", "Description"))
		// Combine the numeric ID and both text vectors into a single feature vector
		.Append(mlContext.Transforms.Concatenate("Features", "ManufacturerId", "ManufacturerPn", "Description"))
		// Convert the string label into the key type the multiclass trainer expects
		.Append(mlContext.Transforms.Conversion.MapValueToKey("Category", "Category"))
		.Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
		.Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
			labelColumnName: "Category",
			featureColumnName: "Features",
			l1Regularization: 0.455F,
			l2Regularization: 0.034F))
		// Convert the predicted key back to the original category string
		.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

	return pipeline;
}
```
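One rough way to judge whether ~32 GB is plausible is to estimate the optimizer state of the `LbfgsMaximumEntropy` trainer. The sketch below is a back-of-envelope calculation under stated assumptions, not ML.NET's actual allocation logic: it assumes the multiclass model keeps one 4-byte float weight per (class, feature) pair plus one bias per class, and that L-BFGS stores `historySize` pairs of correction vectors of that length (the `historySize` argument of `LbfgsMaximumEntropy`, assumed here to default to 20) plus a few working vectors. The feature count of 1,000,000 is purely hypothetical; the real value is the length of the `Features` vector, which with two `FeaturizeText` columns over ~100K rows depends on how many distinct word and character n-grams occur.

```csharp
using System;

public static class LbfgsMemoryEstimate
{
    // Trainable parameters of a multiclass linear model:
    // one weight per (class, feature) pair plus one bias per class.
    public static long ParameterCount(long classes, long features) =>
        classes * features + classes;

    // Rough lower bound on optimizer state in bytes. ASSUMPTIONS: 4-byte floats,
    // `historySize` stored (s, y) vector pairs, and `extraVectors` working vectors
    // (current point, gradient, search direction, ...) of the same length.
    public static long OptimizerBytes(long classes, long features, int historySize, int extraVectors = 5)
    {
        long n = ParameterCount(classes, features);
        long vectors = 2L * historySize + extraVectors;
        return vectors * n * sizeof(float);
    }

    public static void Main()
    {
        const long classes = 84;                      // label count from the issue
        const long hypotheticalFeatures = 1_000_000;  // HYPOTHETICAL: use the real "Features" vector size
        const int historySize = 20;                   // ASSUMED default historySize

        long bytes = OptimizerBytes(classes, hypotheticalFeatures, historySize);
        Console.WriteLine($"parameters: {ParameterCount(classes, hypotheticalFeatures):N0}");
        Console.WriteLine($"approx. optimizer state: {bytes / (1024.0 * 1024 * 1024):F1} GiB");
    }
}
```

With the hypothetical 1,000,000 features this works out to about 84 million parameters and roughly 14 GiB of optimizer state alone, before counting the featurized training data or any other buffers the trainer allocates. Because the estimate grows linearly with both the feature count and `historySize`, a larger n-gram vocabulary scales it up proportionally.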
Training method:
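```csharp
using System.Collections.Generic;
using Microsoft.ML;

public void TrainFromCollection(IEnumerable<SkuInfo> trainData, string outputModelPath)
{
	var mlContext = new MLContext(seed: 1);
	// LoadFromEnumerable wraps the collection lazily; it is enumerated when the pipeline reads the data
	var dataView = mlContext.Data.LoadFromEnumerable(trainData);
	var pipeline = BuildPipeline(mlContext);
	var model = pipeline.Fit(dataView);
	// Save the trained model together with the input schema
	mlContext.Model.Save(model, dataView.Schema, outputModelPath);
}
```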
```
[ds example.txt](https://github.com/dotnet/machinelearning/files/7569878/ds.example.txt)