Memory leak in text classification pipeline #4399

Closed
@IgnasZeb

System information

  • OS version/distro: Microsoft Windows [Version 10.0.18362.418]
  • .NET Version (eg., dotnet --info): .NET Core 3.0.100 04339c3a26

Issue

  • What did you do? I detected a potential memory leak in a production application, so I wrote a simple application to see whether the problem persists.
  • What happened? Microsoft.ML 1.3.1 and 1.4.0-preview2 both appear to be leaking memory.

Source code / logs

Minimal code that consistently reproduces this problem:

    using System;
    using System.Threading;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    class ModelInput
    {
        [ColumnName("TextColumn"), LoadColumn(0)]
        public string ItemDescription { get; set; }
        [ColumnName("Label"), LoadColumn(1)]
        public int ItemId { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Sleeping for 5 seconds. Collect initial memory snapshot...");
            while(true)
            {
                Thread.Sleep(5000);
                BuildAndTrainModel();
                Console.WriteLine("Training done. Collect memory snapshot...");
            }
        }

        static void BuildAndTrainModel()
        {
            MLContext context = new MLContext(seed: 1);

            var dataView = context.Data.LoadFromTextFile<ModelInput>("input.csv", separatorChar: ',');

            var trainingPipeline = context.Transforms.Conversion.MapValueToKey(new[] { new InputOutputColumnPair("Label", "Label") })
                .Append(context.Transforms.Text.FeaturizeText("TextColumn", "TextColumn"))
                .Append(context.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: "Label", featureColumnName: "TextColumn", maximumNumberOfIterations: 1));

            var model = trainingPipeline.Fit(dataView);
        }
    }

After each call to BuildAndTrainModel, the following objects appear to leak:
[image]

The number of leaked objects appears to correlate with the number of iterations (increasing maximumNumberOfIterations increases the leaked object count).
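
For anyone who wants to put numbers on the growth without a profiler, a rough sketch along these lines might help. It forces a full garbage collection after each run and prints the managed heap size. The LeakCheck class and its method names are hypothetical helpers, not part of Microsoft.ML, and the approach only sees managed memory, so a leak in native allocations would not show up here.

    using System;

    // Hypothetical helper (not part of Microsoft.ML): samples the managed heap
    // after each invocation of an action so growth across runs becomes visible.
    static class LeakCheck
    {
        // Forces a full, blocking collection and returns the managed heap size in bytes.
        public static long MeasureManagedBytes()
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            return GC.GetTotalMemory(forceFullCollection: true);
        }

        // Runs the action the given number of times and records the heap size after each run.
        public static long[] Run(Action action, int iterations)
        {
            var samples = new long[iterations];
            for (int i = 0; i < iterations; i++)
            {
                action();
                samples[i] = MeasureManagedBytes();
                Console.WriteLine($"Iteration {i + 1}: {samples[i]:N0} bytes");
            }
            return samples;
        }
    }

Calling LeakCheck.Run(BuildAndTrainModel, 10) from Main in the repro above should print one line per training run. If the numbers keep climbing after a forced collection, that would be consistent with objects being retained rather than merely awaiting collection.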
