Description
This issue suggests an enhancement, since both of the current approaches/workarounds have drawbacks, meaning we're not getting the scalability and performance we could in multithreaded scenarios such as ASP.NET Core web apps or Web API services.
Context:
In multiple scenarios, but especially in ASP.NET Core web apps, the model object (ITransformer) and the prediction function (PredictionFunction) should be re-used, because they are "expensive" to initialize and re-creating them will hurt performance when there are many concurrent users.
In the case of the model object (ITransformer), it is thread-safe, so the model loaded from the .zip file can be registered as a singleton. That way it'll be re-used by any thread or Http request in the ASP.NET Core app. See this code as an example: https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/end-to-end-apps/Regression-SalesForecast/src/eShopDashboard/Startup.cs#L53
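In essence, the registration looks like this (a minimal sketch; it uses the later MLContext-based API rather than the v0.x one referenced in this issue, and the model path is a placeholder):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.ML;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // MLContext is also thread-safe, so one instance can serve the whole app.
        services.AddSingleton<MLContext>();

        // Load the model once; the resulting ITransformer is shared by all
        // threads/Http requests for the lifetime of the app.
        services.AddSingleton<ITransformer>(provider =>
        {
            var mlContext = provider.GetRequiredService<MLContext>();
            return mlContext.Model.Load("ML/model.zip", out _);
        });
    }
}
```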
Main issue with the Prediction Function:
The prediction function is also "expensive": creating the prediction-function object from the model object with the model.MakePredictionFunction() method takes a significant number of milliseconds.
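For reference, this is roughly what that call looks like (a hedged sketch assuming the later API names, where MakePredictionFunction was renamed to CreatePredictionEngine; the SalesData/SalesPrediction classes are hypothetical stand-ins for a real model's schema):

```csharp
using Microsoft.ML;

// Hypothetical input/output classes matching the model's schema.
public class SalesData { public float Month; }
public class SalesPrediction { public float Score; }

public static class Example
{
    // Given an MLContext and a loaded model (ITransformer) like the ones
    // registered above, this is the "expensive" call the issue is about:
    public static SalesPrediction PredictOnce(MLContext mlContext, ITransformer model)
    {
        var engine = mlContext.Model.CreatePredictionEngine<SalesData, SalesPrediction>(model);
        return engine.Predict(new SalesData { Month = 1 });
    }
}
```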
Ideally, because it is "expensive", it should be re-used across executions in the same app. But since it is not thread-safe, the current options in an ASP.NET Core web app are the following:
- OPTION 1: Register the PredictionFunction object as Scoped (AddScoped()) for its DI/IoC object lifetime, as in this code:
https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/end-to-end-apps/Regression-SalesForecast/src/eShopDashboard/Startup.cs#L61
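In essence (a hedged sketch, continuing the ConfigureServices sketch above):

```csharp
// OPTION 1: a new prediction function per Http request.
services.AddScoped(provider =>
{
    var mlContext = provider.GetRequiredService<MLContext>();
    var model = provider.GetRequiredService<ITransformer>();
    // Paid on (almost) every Http request, since the scope is the request:
    return mlContext.Model.CreatePredictionEngine<SalesData, SalesPrediction>(model);
});
```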
But the problem with this approach is that in most cases you don't get much benefit, unless you make multiple calls to the .Predict() method within the same Http request. Since a Scoped object is only re-used within a single Http request, almost every Http request still has to pay for a call to the model.MakePredictionFunction() method.
The only way to share an object across Http requests in .NET Core DI/IoC is as a singleton, but that requires the object to be thread-safe.
The possible service/object lifetimes in .NET Core IoC/DI are:
- Singleton (shared across all threads)
- Transient (a new instance per resolution)
- Scoped (per Http request or object scope)
See the available object lifetimes here:
https://docs.microsoft.com/en-us/aspnet/core/fundamentals/dependency-injection?view=aspnetcore-2.1#service-lifetimes
https://blogs.msdn.microsoft.com/cesardelatorre/2017/01/26/comparing-asp-net-core-ioc-service-life-times-and-autofac-ioc-instance-scopes/ (Autofac does support a Thread Scope, though.)
- OPTION 2: Register the PredictionFunction object as Singleton (AddSingleton()) for its DI/IoC object lifetime, as in this code (commented line):
https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/end-to-end-apps/Regression-SalesForecast/src/eShopDashboard/Startup.cs#L60
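The registration itself is a one-word change from the scoped sketch above:

```csharp
// OPTION 2: ONE prediction function shared across ALL Http requests.
// Only safe if every caller serializes access to Predict() (see the
// critical section below).
services.AddSingleton(provider =>
{
    var mlContext = provider.GetRequiredService<MLContext>();
    var model = provider.GetRequiredService<ITransformer>();
    // Paid only once, on the first resolution after the app starts:
    return mlContext.Model.CreatePredictionEngine<SalesData, SalesPrediction>(model);
});
```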
Benefits: If you register the prediction-function object as Singleton for its ASP.NET Core DI/IoC object lifetime, then, since it is a Singleton, any new Http request (except the first one after the app starts) will just use the prediction function by calling the Predict() method. Therefore the performance of a single Http request will be significantly better.
Problem: However, the issue with this approach is that, since the prediction function is not thread-safe, you need a mechanism like a critical section to lock the prediction-function object so that only a single thread at a time executes the Predict() method, as in this code (commented line):
https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/end-to-end-apps/Regression-SalesForecast/src/eShopDashboard/Controllers/CountrySalesForecastController.cs#L57
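A hedged sketch of that critical section in a controller (the controller name, action, and input values are hypothetical; PredictionEngine is the later name of PredictionFunction):

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.ML;

public class SalesForecastController : Controller
{
    private readonly PredictionEngine<SalesData, SalesPrediction> _engine;
    private static readonly object _predictLock = new object();

    public SalesForecastController(PredictionEngine<SalesData, SalesPrediction> engine)
    {
        _engine = engine;
    }

    [HttpGet]
    public IActionResult GetForecast()
    {
        SalesPrediction prediction;
        // Critical section: only one thread at a time may run Predict(),
        // which is exactly the scalability bottleneck described next.
        lock (_predictLock)
        {
            prediction = _engine.Predict(new SalesData { Month = 1 });
        }
        return Ok(prediction.Score);
    }
}
```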
The issue is that by limiting the use of Predict() to a single thread you create a bottleneck when scaling. When there are many concurrent Http requests, many of them trying to run the Predict() method, scalability won't be as good as it could be, because only one thread at a time is able to run Predict(). Basically, with this approach you might significantly limit the scalability of the app (in regard to model execution/scoring) across threads when handling many Http requests.
Workarounds: Currently, use either of the two approaches above, being aware of each one's drawbacks:
- PredictionFunction as a Scoped object
- PredictionFunction as a Singleton, using a critical section when running Predict()
Long-term solutions:
I see two possible solutions:
- Create an object pool of "prediction function objects". That way, most Http requests won't need to call the "expensive" model.MakePredictionFunction() method, and it would be scalable, since many threads could be using the multiple objects available in the pool (see the sketch after this list).
- Make the prediction function thread-safe. If the prediction function were thread-safe while still scalable enough, it could simply be registered as a Singleton in the DI/IoC system in ASP.NET Core, without needing a critical section or comparable mechanism.
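For the pooling idea, this is roughly what an application-level version could look like today with Microsoft.Extensions.ObjectPool (a hedged sketch; everything except the ObjectPool API and the ML.NET calls is hypothetical, re-using the SalesData/SalesPrediction classes from the sketches above):

```csharp
using Microsoft.Extensions.ObjectPool;
using Microsoft.ML;

// Pool policy: how to create prediction engines, and whether a returned
// engine may be re-admitted to the pool.
public class PredictionEnginePolicy
    : PooledObjectPolicy<PredictionEngine<SalesData, SalesPrediction>>
{
    private readonly MLContext _mlContext;
    private readonly ITransformer _model;

    public PredictionEnginePolicy(MLContext mlContext, ITransformer model)
    {
        _mlContext = mlContext;
        _model = model;
    }

    public override PredictionEngine<SalesData, SalesPrediction> Create()
        => _mlContext.Model.CreatePredictionEngine<SalesData, SalesPrediction>(_model);

    // Engines hold no per-call state that needs resetting, so always re-admit.
    public override bool Return(PredictionEngine<SalesData, SalesPrediction> engine)
        => true;
}
```

The pool itself (e.g. a DefaultObjectPool<PredictionEngine<SalesData, SalesPrediction>> built from this policy) would be registered as a Singleton; each Http request would then Get() an engine, call Predict(), and Return() the engine in a finally block, so concurrent requests work against different engine instances instead of waiting on a lock.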
Is there any other possible approach available?
Once these scenarios are further discussed in this thread or in our API reviews, I'll document the current possible approaches for consuming/running an ML.NET model in an ASP.NET Core web app or Web API service.
Related issues:
#421