Description
System Information
- Linux Docker container running on Windows 10 / 11 or in ACI
- ML.NET v1.4.0
- .NET Version: .NET 6.0
Problem description
We have an application that utilizes image classification by using TensorFlow, based on the sample provided in .NET ML. The application executes the following code, which correctly starts, runs, but never completes when running in the Linux docker container on the Windows 10 host.
When the container is started from the command line it simply exists without showing any error. However, when we run the container in VS (F5) the container shows the error in the output window.
TensorFlow .. metaoprimizer.cc 499 ..model_runner exceeded deadline
Here is the simplified code that fails (crashes) in line N+0.
try
{
(N+0) var cvResults = mlContext.MulticlassClassification.CrossValidate(cvDataView, pipeline, numberOfFolds: numOfFolds, labelColumnName: "LabelAsKey", seed: 8881);
(N+1) Console.Write("Never executed");
}
catch
{
(N+2)
}
The container simply exists without throwing any exception. The lines N+1 and N+2 are never reached.
The same code works well when executed on the Windows host. We also have been able to execute the same code in the Linux container on Windows 11. We have tested it on multiple machines (same environment setup). The same code at some hosts is sometimes running correctly even if the execution time is longer (see image above).
Recap
Question: Is there a way to set the timeout for TensorFlow via ML.NET?
However, this behaviour also raises another question. Is this .NET + Docker + TensorFlow reliable at all if it doesn't show any error and stops the execution?