-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added slot names support for OnnxTransformer #4857
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.50476193, | ||
-0.97911227 | ||
0.504761934, | ||
-0.979112267 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come these changed? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. But I see this occurring off and on that the baselines numbers change when we run them locally. I am ignoring them because the change is in the 7th decimal place
In reply to: 381858622 [](ancestors = 381858622)
var mlNetSlotNames = mlNetSlots.DenseValues().ToList(); | ||
var onnxSlotNames = onnxSlots.DenseValues().ToList(); | ||
for (int j = 0; j < mlNetSlots.Length; j++) | ||
Assert.Equal(mlNetSlotNames[j].ToString(), onnxSlotNames[j].ToString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Equal [](start = 31, length = 5)
nit: I think Assert.Equal also has an overload for IEnumerables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that already. But the Assert was firing even when all the strings were equal. Not sure why.
In reply to: 381859560 [](ancestors = 381859560)
var labelEncoderOutput = ctx.AddIntermediateVariable(NumberDataViewType.Int64, labelEncoderOutputName, true); | ||
var node = ctx.CreateNode(opType, one, labelEncoderOutput, labelEncoderNodeName); | ||
node.AddAttribute("keys_strings", slotNamesAsStrings); | ||
node.AddAttribute("values_int64s", Enumerable.Range(0, slotNames.Length).Select(x => (long)x)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values_int64s [](start = 31, length = 13)
Why do we need this? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are unused. But are specified only to satisfy ORT.
In reply to: 381867237 [](ancestors = 381867237)
This PR adds support for persisting the SlotNames annotations of a column during onnx export and reading those back in OnnxTransformer and adding the annotations back to the column when the onnx model is read from disk.
Onnx natively does not have support for annotations. To work around this, we store some metadata in some unused portions of the graph. As an example, let us say we have an ML.NET model with an output column NGrams that outputs a vector of NGram counts. This column will have an Annotation in ML.NET named SlotNames. When this model is exported to onnx, we create an additional LabelEncoder node and store the SlotNames in the keys_strings attribute of the LabelEncoder.
The LabelEncoder is created with an input name of
$"mlnet.{column.Name}.unusedInput"
, an output name of $"mlnet.{column.Name}.unusedOutput" and a node name of$"mlnet.{column.Name}.SlotNames"
. (All the actual output columns of the ML.NET model are suffixed with a".output"
string)Then when OnnxTransformer loads the graph it goes through the list of output nodes and creates output columns for each of them in its output schema. For each column it searches the graph for a node named
$"mlnet.{column.Name}.SlotNames"
. If it finds it, it reads the keys_strings attributes from that node and adds those strings as SlotNames annotation to that column.This SlotNames data should then be available as annotations on the column in both ML.NET and Nimbus.