This repository was archived by the owner on Nov 16, 2023. It is now read-only.

ONNX model for OneHotVectorizer produces different result #429

Closed
@ganik

Description

Repro
```python
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing import OnnxRunner
from nimbusml.feature_extraction.categorical import OneHotVectorizer

# Load the infert dataset and strip ': ' from the column names
infert_df = get_dataset("infert").as_df()
infert_df.columns = [i.replace(': ', '') for i in infert_df.columns]
infert_df.rename(columns={'case': 'Label'}, inplace=True)

# Fit the native transform and print its output
transform = OneHotVectorizer() << 'education_str'
print(transform.fit_transform(infert_df))

# Export the fitted transform to ONNX and run the same data through it
transform.export_to_onnx("test.onnx", 'com.microsoft.ml')
onnx_runner = OnnxRunner(model_file="test.onnx")
print(onnx_runner.fit_transform(infert_df))
```

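Output (the first table is printed by `transform.fit_transform`, the second by `onnx_runner.fit_transform`):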
row_num education age ... education_str.0-5yrs education_str.6-11yrs education_str.12+ yrs
0 1 0.0 26.0 ... 1.0 0.0 0.0
1 2 0.0 42.0 ... 1.0 0.0 0.0
2 3 0.0 39.0 ... 1.0 0.0 0.0
3 4 0.0 34.0 ... 1.0 0.0 0.0
4 5 2.0 35.0 ... 0.0 1.0 0.0
.. ... ... ... ... ... ... ...
243 244 1.0 31.0 ... 0.0 0.0 1.0
244 245 1.0 34.0 ... 0.0 0.0 1.0
245 246 1.0 35.0 ... 0.0 0.0 1.0
246 247 1.0 29.0 ... 0.0 0.0 1.0
247 248 1.0 23.0 ... 0.0 0.0 1.0

[248 rows x 12 columns]
row_num education age ... education_str.onnx.0 education_str.onnx.1 education_str.onnx.2
0 1 0.0 26.0 ... 0.0 1.0 0.0
1 2 0.0 42.0 ... 0.0 1.0 0.0
2 3 0.0 39.0 ... 0.0 1.0 0.0
3 4 0.0 34.0 ... 0.0 1.0 0.0
4 5 2.0 35.0 ... 0.0 0.0 1.0
.. ... ... ... ... ... ... ...
243 244 1.0 31.0 ... 0.0 0.0 0.0
244 245 1.0 34.0 ... 0.0 0.0 0.0
245 246 1.0 35.0 ... 0.0 0.0 0.0
246 247 1.0 29.0 ... 0.0 0.0 0.0
247 248 1.0 23.0 ... 0.0 0.0 0.0

[248 rows x 22 columns]
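
In the visible rows, the ONNX result has its 1.0 one slot to the right of the native result, and rows 243 to 247 have no 1.0 at all in the three ONNX columns shown. A minimal sketch of how such a mismatch could be checked programmatically follows. The helper names `hot_index` and `compare_one_hot` are hypothetical and not part of nimbusml; the sample rows are rows 0, 4 and 243 copied from the output above.

```python
def hot_index(row):
    """Return the position of the single 1.0 in a one-hot row, or None if there is not exactly one."""
    hits = [i for i, value in enumerate(row) if value == 1.0]
    return hits[0] if len(hits) == 1 else None


def compare_one_hot(native_rows, onnx_rows):
    """Return (row_position, native_slot, onnx_slot) for every row whose hot slot differs."""
    mismatches = []
    for pos, (native_row, onnx_row) in enumerate(zip(native_rows, onnx_rows)):
        native_slot = hot_index(native_row)
        onnx_slot = hot_index(onnx_row)
        if native_slot != onnx_slot:
            mismatches.append((pos, native_slot, onnx_slot))
    return mismatches


# One-hot columns for rows 0, 4 and 243, copied from the printed output.
native = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
onnx = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]

print(compare_one_hot(native, onnx))
```

With the real DataFrames, the same helper could be fed the one-hot columns via something like `df[cols].values.tolist()`, where `cols` is the list of `education_str.*` column names from each result.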
