Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added slot names support for OnnxTransformer #4857

Merged
merged 2 commits into from
Feb 19, 2020
Merged

Added slot names support for OnnxTransformer #4857

merged 2 commits into from
Feb 19, 2020

Conversation

harishsk
Copy link
Contributor

This PR adds support for persisting the SlotNames annotations of a column during onnx export and reading those back in OnnxTransformer and adding the annotations back to the column when the onnx model is read from disk.

Onnx natively does not have support for annotations. To work around this, we store some metadata in some unused portions of the graph. As an example, let us say we have an ML.NET model with an output column NGrams that outputs a vector of NGram counts. This column will have an Annotation in ML.NET named SlotNames. When this model is exported to onnx, we create an additional LabelEncoder node and store the SlotNames in the keys_strings attribute of the LabelEncoder.

The LabelEncoder is created with an input name of $"mlnet.{column.Name}.unusedInput", an output name of $"mlnet.{column.Name}.unusedOutput" and a node name of $"mlnet.{column.Name}.SlotNames". (All the actual output columns of the ML.NET model are suffixed with a ".output" string)

Then when OnnxTransformer loads the graph it goes through the list of output nodes and creates output columns for each of them in its output schema. For each column it searches the graph for a node named $"mlnet.{column.Name}.SlotNames". If it finds it, it reads the keys_strings attributes from that node and adds those strings as SlotNames annotation to that column.

This SlotNames data should then be available as annotations on the column in both ML.NET and Nimbus.

Copy link
Member

@ganik ganik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@harishsk harishsk merged commit 8d1809e into dotnet:master Feb 19, 2020
0.50476193,
-0.97911227
0.504761934,
-0.979112267
Copy link

@yaeldekel yaeldekel Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come these changed? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. But I see this occurring off and on that the baselines numbers change when we run them locally. I am ignoring them because the change is in the 7th decimal place


In reply to: 381858622 [](ancestors = 381858622)

var mlNetSlotNames = mlNetSlots.DenseValues().ToList();
var onnxSlotNames = onnxSlots.DenseValues().ToList();
for (int j = 0; j < mlNetSlots.Length; j++)
Assert.Equal(mlNetSlotNames[j].ToString(), onnxSlotNames[j].ToString());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equal [](start = 31, length = 5)

nit: I think Assert.Equal also has an overload for IEnumerables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that already. But the Assert was firing even when all the strings were equal. Not sure why.


In reply to: 381859560 [](ancestors = 381859560)

var labelEncoderOutput = ctx.AddIntermediateVariable(NumberDataViewType.Int64, labelEncoderOutputName, true);
var node = ctx.CreateNode(opType, one, labelEncoderOutput, labelEncoderNodeName);
node.AddAttribute("keys_strings", slotNamesAsStrings);
node.AddAttribute("values_int64s", Enumerable.Range(0, slotNames.Length).Select(x => (long)x));
Copy link

@yaeldekel yaeldekel Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values_int64s [](start = 31, length = 13)

Why do we need this? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are unused. But are specified only to satisfy ORT.


In reply to: 381867237 [](ancestors = 381867237)

@harishsk harishsk deleted the slotNames branch April 21, 2020 23:59
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants