
[FEATURE] Converting traditional ML algorithms using Hummingbird and benchmark model performance. #123

dhrubo-os opened this issue Mar 26, 2023 · 12 comments

@dhrubo-os
Collaborator

Currently, built-in ML algorithms in OpenSearch have to be written in Java, which is often time-consuming because Java lacks mature ML library support.

One initiative we started: write the algorithm in PyTorch, trace it to a TorchScript file, and then load the model file into OpenSearch using ML Commons' model-serving framework.

One bottleneck is that TorchScript cannot import third-party libraries such as scikit-learn, so to include scikit-learn models in OpenSearch we would have to rewrite each algorithm in TorchScript, which is not an ideal solution.

To solve this problem we can use Hummingbird, which converts traditional machine learning models into tensor (neural-network-style) computations for faster execution; at the same time it should let us export the converted model to TorchScript or ONNX so that we can load it into OpenSearch.

In this issue, we would like to investigate whether Hummingbird solves this problem.

The investigation can proceed in the following steps:

  1. Import Hummingbird into the py-ml repo.
  2. Convert a model to TorchScript (we can start with simple stateless models like PCA/KernelPCA; see the sketch after this list).
  3. Run both formats of the algorithm (the original scikit-learn algorithm and the converted PyTorch and ONNX versions) and compare the outputs.
  4. Benchmark to compare performance across all three formats of model execution.
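
As a starting point for steps 2-3, here is a minimal sketch assuming the hummingbird-ml `convert` API and a synthetic dataset (both illustrative, not the actual experiment code):

```python
# Sketch: fit a sklearn PCA, convert it to TorchScript via Hummingbird,
# and compare the two transforms on the same data.
import numpy as np
from sklearn.decomposition import PCA
from hummingbird.ml import convert

X = np.random.rand(1000, 20).astype(np.float32)  # synthetic placeholder data

skl_pca = PCA(n_components=5).fit(X)

# "torch.jit" targets TorchScript; "onnx" targets ONNX. A test input is
# needed for tracing.
hb_pca = convert(skl_pca, "torch.jit", test_input=X)

# The two transforms should agree within a small tolerance.
print(np.allclose(skl_pca.transform(X), hb_pca.transform(X), atol=1e-4))

# The converted container can be saved and the TorchScript artifact loaded
# elsewhere (e.g. by ML Commons' model-serving framework).
hb_pca.save("pca_torchscript")
```
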
@dhrubo-os dhrubo-os added enhancement New feature or request CCI labels Mar 26, 2023
@AlibiZhenis
Contributor

I'd like to work on this

@dhrubo-os
Collaborator Author

Sure, please go ahead.

@AlibiZhenis
Contributor

So I played around with it in this notebook: https://www.kaggle.com/code/alibizhenis/hummingbird
I trained Random Forest, SVC, and KNN classifiers with sklearn and converted them to torch.
Points to note:

  • The performance of the original and converted models was identical.
  • I couldn't convert them to ONNX; I kept getting a "backend not supported" error. I suspect that conversion of these specific models to ONNX is unsupported, rather than ONNX conversion in general.
  • For KNN (and maybe some other models), the conversion wasn't smooth. The normal convert method didn't work; convert_batch had to be used instead. The input shape of the converted model is limited to test_input.shape[0] * k + remainder_size, where test_input and remainder_size are parameters of the convert_batch method and k is any integer. Therefore, if we want the converted model to accept any number of samples, test_input always has to contain exactly one sample. I imagine there are more nuances like this with other models (a sketch of both conversion paths follows this list).
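
To make the two conversion paths concrete, here is a rough sketch assuming the hummingbird-ml `convert` / `convert_batch` APIs and a synthetic dataset:

```python
# Sketch: a Random Forest goes through the normal convert(), while KNN is
# converted with convert_batch() using a single-sample test_input so the
# traced model can accept any number of rows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from hummingbird.ml import convert, convert_batch

X = np.random.rand(500, 10).astype(np.float32)
y = np.random.randint(0, 2, 500)

rf = RandomForestClassifier(n_estimators=50).fit(X, y)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

hb_rf = convert(rf, "pytorch")                  # straightforward conversion
hb_knn = convert_batch(knn, "pytorch", X[:1])   # one-sample batch, as noted above

# Predictions from the converted models should match the originals.
print((hb_rf.predict(X) == rf.predict(X)).all())
print((hb_knn.predict(X) == knn.predict(X)).all())
```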

@AlibiZhenis
Contributor

I compared three sklearn transformer models (PCA, KernelPCA, TruncatedSVD) here: https://www.kaggle.com/code/alibizhenis/hummingbird-pca

  • I couldn't convert them to ONNX again, but I converted all of them to TorchScript.
  • All three pairs of models (original vs. converted) produce outputs whose elements agree to about the 4th decimal place (sketch below).
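
For completeness, a small sketch of that comparison (same caveats as above: synthetic data and the assumed Hummingbird API):

```python
# Sketch: compare sklearn vs. Hummingbird/TorchScript outputs for the three
# transformers and check agreement to roughly 4 decimal places.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, TruncatedSVD
from hummingbird.ml import convert

X = np.random.rand(1000, 20).astype(np.float32)

for Model in (PCA, KernelPCA, TruncatedSVD):
    skl_model = Model(n_components=5).fit(X)
    hb_model = convert(skl_model, "torch.jit", test_input=X)
    close = np.allclose(skl_model.transform(X), hb_model.transform(X), atol=1e-4)
    print(Model.__name__, "agrees to ~1e-4:", close)
```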

@dhrubo-os
Collaborator Author

Thanks for your experiments. Could you also please run the same algorithms with sklearn-onnx and compare the outputs on the same dataset?

@AlibiZhenis
Contributor

AlibiZhenis commented Apr 2, 2023

I tested them in the same notebook with skl2onnx. The results for PCA and KernelPCA matched, while the converted TruncatedSVD model produced a completely different result for some reason.
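
For reference, a minimal skl2onnx sketch along those lines (dataset and names are illustrative):

```python
# Sketch: export a fitted sklearn PCA with skl2onnx, run it with onnxruntime,
# and compare against the original transform.
import numpy as np
import onnxruntime as rt
from sklearn.decomposition import PCA
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.rand(1000, 20).astype(np.float32)
skl_pca = PCA(n_components=5).fit(X)

onnx_model = convert_sklearn(
    skl_pca, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)
sess = rt.InferenceSession(
    onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
)
(onnx_out,) = sess.run(None, {"input": X})

print(np.allclose(skl_pca.transform(X), onnx_out, atol=1e-4))
```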

@dhrubo-os
Collaborator Author

Thanks for the investigation. Can we also try to find an ARIMA model to convert to TorchScript or ONNX? Mainly, we are interested in getting a forecasting model converted to TorchScript or ONNX.

After doing that, I would like you to wrap up all your investigations in this package's experiment branch:

  1. Add a notebook for the PCA-related experiments with a side-by-side comparison of the original, TorchScript, and ONNX versions.
  2. Add another notebook for the other algorithms with the same original/TorchScript/ONNX comparison.
  3. If you make progress with any forecasting model, please add another notebook for that as well.

Thanks for your hard work.

@AlibiZhenis
Contributor

Which framework would you like me to use for time series models? I don't think sklearn supports any. There is the statsmodels package, but it's not supported by Hummingbird.

@dhrubo-os
Collaborator Author

Yeah, agreed. This is a part where we need some investigation as well.

@AlibiZhenis
Contributor

I added the notebooks for PCA and classification.

Upon further research on time series forecasting, I concluded the following:

  • Hummingbird doesn't support any time series models.
  • There are some great time series packages, like statsmodels, pmdarima, sktime, etc., but I haven't found any way to convert their models to ONNX or TorchScript.

@AlibiZhenis
Contributor

Upon further research, I couldn't find ways to convert models from popular time series packages like statsmodels. Nonetheless, I found ways to use some models in TorchScript and ONNX (mostly deep learning):

  • HuggingFace's Time Series Transformer
  • NVIDIA's TSPP, which supports their own TFT model, XGBoost, AutoARIMA, and LSTM models. The platform provides a way to convert these models to torch and ONNX (a generic TorchScript export sketch follows this list).
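
As a very rough illustration of the deep-learning route (this is generic PyTorch, not TSPP's or HuggingFace's actual export code), an LSTM forecaster can be exported to TorchScript like so:

```python
# Sketch: a tiny LSTM forecaster exported to TorchScript with torch.jit.script.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, input_size=1, hidden_size=32, horizon=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)            # (batch, seq_len, hidden)
        return self.head(out[:, -1, :])  # predict `horizon` future values

model = LSTMForecaster()
scripted = torch.jit.script(model)       # TorchScript module
scripted.save("lstm_forecaster.pt")      # loadable outside Python

dummy = torch.randn(4, 24, 1)            # batch of 4 series, 24 time steps
print(scripted(dummy).shape)             # torch.Size([4, 1])
```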

@dhrubo-os dhrubo-os changed the title [FEATURE] Convering traditional ML algorithms using Hummingbird and benchmark model performance. [FEATURE] Converting traditional ML algorithms using Hummingbird and benchmark model performance. Apr 27, 2023
@dhrubo-os
Collaborator Author

Can we perform the following experiments?

  1. Try these models for forecasting on a time series dataset.
  2. Convert the models to TorchScript/ONNX and perform forecasting on the same dataset.
  3. Use any traditional forecasting model (ARIMA works) for forecasting on the same dataset (see the sketch after this list).
  4. Then compare the results.
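
A minimal sketch of step 3, assuming statsmodels is acceptable for the ARIMA baseline (the series, ARIMA order, and metric are placeholders):

```python
# Sketch: fit a classical ARIMA baseline with statsmodels and produce a
# forecast to compare against the converted (TorchScript/ONNX) models.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))   # placeholder time series

train, test = series[:150], series[150:]

arima = ARIMA(train, order=(2, 1, 2)).fit()    # order is a placeholder choice
arima_forecast = arima.forecast(steps=len(test))

# Compare against forecasts from the converted models (step 2) with e.g. MAE.
print("ARIMA MAE:", np.mean(np.abs(arima_forecast - test)))
```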
