Description
Hello.
I've trained a model with auto-sklearn and now I want to deploy it with a Flask API. I've serialized the model with joblib, and each request to the Flask API runs predict() on about 10 to 20 rows of a pandas DataFrame.
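To make the setup concrete, here is a self-contained sketch of this kind of serving path. It is an assumption-filled stand-in, not my actual code: it trains and serializes a plain scikit-learn Ridge model in place of the real auto-sklearn model, and the route name /predict and the temp file path are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.linear_model import Ridge

# Train and serialize a stand-in model the same way the real model was
# saved with joblib (Ridge here is a placeholder for the auto-sklearn model).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
y = X.sum(axis=1)
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(Ridge().fit(X, y), path)

app = Flask(__name__)
model = joblib.load(path)  # deserialized once at startup

@app.route("/predict", methods=["POST"])
def predict():
    # Each request carries roughly 10-20 rows as JSON records.
    df = pd.DataFrame(request.get_json())
    return jsonify(model.predict(df).tolist())

# Exercise the route in-process with Flask's test client.
client = app.test_client()
resp = client.post("/predict", json=X.head(10).to_dict(orient="records"))
print(len(resp.get_json()))  # one prediction per input row
```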
A single request to the API usually completes in about 80 ms, but if I run 10 requests simultaneously, each one takes about 1400 ms, which is more than the roughly 800 ms it takes to run those 10 requests one after another.
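This is the shape of the timing comparison I'm making, sketched outside of Flask so it can run standalone. Again the Ridge model is a hypothetical stand-in for the auto-sklearn model; the real slowdown numbers come from the deployed service.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Stand-in model and a batch of ~10-20 rows, as in each API request.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)))
model = Ridge().fit(X, X.sum(axis=1))
batch = X.head(15)

def one_call(_):
    model.predict(batch)

# 10 calls one after another.
t0 = time.perf_counter()
for i in range(10):
    one_call(i)
serial = time.perf_counter() - t0

# The same 10 calls issued concurrently from threads.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as ex:
    list(ex.map(one_call, range(10)))
parallel = time.perf_counter() - t0

print(f"serial: {serial:.3f}s, 10 concurrent: {parallel:.3f}s")
```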
Can someone offer some insight into why running predict() in parallel performs so poorly?
P.S.: CPU usage looks about the same whether the requests run in series or in parallel, and I've already tried using a separate model object for each predict() call, with no success.
P.P.S.: I'm using auto-sklearn version 0.8.0, mostly because more recent versions don't include the regressor ridge_regression, which works best on my training data.
Ubuntu 20.04.2
No virtual environment
Python 3.8.5
Auto-sklearn version 0.8.0