ONNX model for OneHotVectorizer produces different result #429
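Description
The ONNX model exported from `OneHotVectorizer` does not reproduce the output of the fitted transform on the same data. As far as the truncated printout below shows, the hot slot in the ONNX result is shifted one position to the right (rows encoded natively as 1, 0, 0 come back as 0, 1, 0), rows in the last native category come back as all zeros, and the ONNX result has 22 columns instead of 12.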
Repro
```python
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing import OnnxRunner
from nimbusml.feature_extraction.categorical import OneHotVectorizer

# Load the infert dataset and normalize the column names.
infert_df = get_dataset("infert").as_df()
infert_df.columns = [i.replace(': ', '') for i in infert_df.columns]
infert_df.rename(columns={'case': 'Label'}, inplace=True)

# One-hot encode 'education_str' with the native transform.
transform = OneHotVectorizer() << 'education_str'
print(transform.fit_transform(infert_df))

# Export the fitted transform to ONNX and run the same data through it.
transform.export_to_onnx("test.onnx", 'com.microsoft.ml')
onnx_runner = OnnxRunner(model_file="test.onnx")
print(onnx_runner.fit_transform(infert_df))
```
Output (first table: native `transform.fit_transform`; second table: ONNX model run through `OnnxRunner`):
row_num education age ... education_str.0-5yrs education_str.6-11yrs education_str.12+ yrs
0 1 0.0 26.0 ... 1.0 0.0 0.0
1 2 0.0 42.0 ... 1.0 0.0 0.0
2 3 0.0 39.0 ... 1.0 0.0 0.0
3 4 0.0 34.0 ... 1.0 0.0 0.0
4 5 2.0 35.0 ... 0.0 1.0 0.0
.. ... ... ... ... ... ... ...
243 244 1.0 31.0 ... 0.0 0.0 1.0
244 245 1.0 34.0 ... 0.0 0.0 1.0
245 246 1.0 35.0 ... 0.0 0.0 1.0
246 247 1.0 29.0 ... 0.0 0.0 1.0
247 248 1.0 23.0 ... 0.0 0.0 1.0
[248 rows x 12 columns]
row_num education age ... education_str.onnx.0 education_str.onnx.1 education_str.onnx.2
0 1 0.0 26.0 ... 0.0 1.0 0.0
1 2 0.0 42.0 ... 0.0 1.0 0.0
2 3 0.0 39.0 ... 0.0 1.0 0.0
3 4 0.0 34.0 ... 0.0 1.0 0.0
4 5 2.0 35.0 ... 0.0 0.0 1.0
.. ... ... ... ... ... ... ...
243 244 1.0 31.0 ... 0.0 0.0 0.0
244 245 1.0 34.0 ... 0.0 0.0 0.0
245 246 1.0 35.0 ... 0.0 0.0 0.0
246 247 1.0 29.0 ... 0.0 0.0 0.0
247 248 1.0 23.0 ... 0.0 0.0 0.0
[248 rows x 22 columns]
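To make the mismatch easier to see than by eyeballing the two printouts, here is a minimal sketch that tallies which one-hot slot each side sets for the same row. It uses only the ten rows visible in the output above, typed in by hand; `onehot_index` and `compare_onehot` are hypothetical helper names, not part of nimbusml.

```python
from collections import Counter


def onehot_index(row):
    """Return the position of the single 1.0 in a one-hot row.

    Returns None when the row has no single hot slot
    (all zeros, or more than one 1.0).
    """
    hot = [i for i, v in enumerate(row) if v == 1.0]
    return hot[0] if len(hot) == 1 else None


def compare_onehot(native_rows, onnx_rows):
    """Count (native_slot, onnx_slot) pairs over rows taken in the same order."""
    return Counter(
        (onehot_index(a), onehot_index(b))
        for a, b in zip(native_rows, onnx_rows)
    )


# The last three columns of the ten rows shown in the printouts above
# (rows 0-4 and 243-247); the rest of the data is elided there.
native_rows = (
    [[1.0, 0.0, 0.0]] * 4      # rows 0-3
    + [[0.0, 1.0, 0.0]]        # row 4
    + [[0.0, 0.0, 1.0]] * 5    # rows 243-247
)
onnx_rows = (
    [[0.0, 1.0, 0.0]] * 4      # rows 0-3
    + [[0.0, 0.0, 1.0]]        # row 4
    + [[0.0, 0.0, 0.0]] * 5    # rows 243-247
)

pairs = compare_onehot(native_rows, onnx_rows)
for (native_slot, onnx_slot), count in sorted(pairs.items(), key=str):
    print(f"native slot {native_slot} -> onnx slot {onnx_slot}: {count} rows")
```

On these ten rows every native slot `k` maps to ONNX slot `k + 1`, and the last native slot maps to no slot at all, which is consistent with an off-by-one in the exported category indices; that reading is a guess from the printout, not a confirmed root cause. To run the same check on the full frames, the row lists could be taken from the two results, for example `native_df[native_cols].values.tolist()` with the `education_str.*` column names shown in the headers above.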