The two CPU threading variables set_num_threads and set_num_interop_threads, explained here for PyTorch and for TensorFlow, have a huge impact on CPU inference time. For example, for the ResNet-18 TorchVision image model under a one-CPU-core assignment, they produce the following difference in latencies (before and after applying these values). I think it is worthwhile to add these two variables as configurable settings, at least for the HuggingFace runtime, which uses deep models (and I can confirm I have seen the same trend for many HuggingFace pipeline models too).
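For reference, a minimal sketch of applying these settings before inference in PyTorch (assuming a recent torchvision; the single-thread values mirror the one-core assignment mentioned above):

```python
import torch
from torchvision import models

# Both calls must run before any parallel work starts; in particular,
# set_num_interop_threads raises an error if the inter-op pool is already up.
torch.set_num_threads(1)          # intra-op thread pool
torch.set_num_interop_threads(1)  # inter-op thread pool

# ResNet-18 as in the latency comparison above (the weights= argument assumes
# torchvision >= 0.13; older versions use pretrained= instead).
model = models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    output = model(dummy)
print(output.shape)  # torch.Size([1, 1000])
```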
Based on the PyTorch documentation, the number of cores seems to be a good heuristic for both variables; that is what I used in the example above.
I think the best option would be to add such a config on the user side for the HuggingFace server, exposing these two parameters as config values in the settings folder. If they are not set, the default value can be the number of CPUs (see the sketch below). This could be optimized further, but I think that is out of scope for MLServer; if you are interested, this paper provides an in-depth investigation of the topic.
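As a rough illustration of that default, something along these lines could work (the environment variable names here are hypothetical, not existing MLServer settings):

```python
import os
import torch

# Hypothetical setting names, for illustration only; fall back to the CPU
# count when no value is configured.
cpu_count = os.cpu_count() or 1
num_threads = int(os.environ.get("MLSERVER_NUM_THREADS", cpu_count))
num_interop_threads = int(os.environ.get("MLSERVER_NUM_INTEROP_THREADS", cpu_count))

torch.set_num_threads(num_threads)
torch.set_num_interop_threads(num_interop_threads)
```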