There are hundreds of potential operators across ML libraries, and implementing them all in a web standard is not feasible. Instead, we should at least support a mature enough core operator set that enables composing larger aggregate operators. Ningxin presented the idea at TPAC 2024: you define a composite operator (such as multihead attention, which is not in WebNN) as a subgraph and then execute it. If the backend has a compatible builtin implementation of that subgraph, passing the higher level of expression through the user agent to the backend can be simpler and more efficient than having to recognize the patterns nested throughout the graph, modify the graph, and re-fuse them.
Below is a possible brief example of the idea (which may turn out quite differently after further thought):
```js
// Assume tanh was not already a built-in WebNN operator:
// tanh(x) = (exp(2 * x) - 1) / (exp(2 * x) + 1)
function buildTanh(builder, inputDesc)
{
    const x = builder.input("input", inputDesc);
    const one = builder.constant(inputDesc.dataType, 1);
    const two = builder.constant(inputDesc.dataType, 2);
    const exp2x = builder.exp(builder.mul(two, x));
    const tanh = builder.div(
        builder.sub(exp2x, one),
        builder.add(exp2x, one)
    );
    return builder.buildSubgraph(tanh, ["input"], ["output"]);
}
...
let tanh = buildTanh(graphBuilder, inputDesc);
let tanhResult = graphBuilder.subgraph(tanh, {"input": input});
let mulResult = graphBuilder.mul(tanhResult.output, ...);
```
Benefits:
- Enables the API to support new operators earlier (for niche operators that might never be part of WebNN or are not yet in the official API), falling back to the decomposition when absent (see the sketch after this list).
- Boosts performance when the backend has a mapping for it.
- Supports large operators like "attention" and "mixture of experts" without permanently complicating the API with heavyweight but potentially non-durable operators (as we've seen with large operators like LSTM and GRU that rise and fall in popularity).
- The pattern matching only has to be done once (at subgraph creation), not multiple times across potentially thousands of nodes.
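For example, a minimal sketch of the fallback idea, assuming the hypothetical `buildSubgraph()`/`subgraph()` API from the example above (the `applyTanh` helper is illustrative only, not a proposed API):

```js
// Prefer the builtin operator when the builder exposes it; otherwise fall back
// to the composed decomposition built via the hypothetical buildSubgraph API.
function applyTanh(builder, input, inputDesc) {
  if (typeof builder.tanh === "function") {
    return builder.tanh(input);  // builtin fast path
  }
  const tanhSubgraph = buildTanh(builder, inputDesc);               // decomposition
  return builder.subgraph(tanhSubgraph, { "input": input }).output; // bind and use it
}
```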
Considerations:
- How do we propagate data types? It would be ideal to reuse the `tanh` subgraph above with either `float16` or `float32` inputs, without needing to create a separate graph for each data type. Currently the input definition must be fully qualified, but it would be useful for `graphBuilder.input("input")` to remain unresolved at subgraph creation time and be resolved at `subgraph` usage time (meaning shape and type propagation is delayed until knowable). This impacts `constant()` too, which currently expects a concrete `MLOperandDataType`, rather than lazily accepting the type of another `MLOperand` or having a way to cast to the target type of another tensor (like ONNX CastLike). See the sketch after this list.
- How do we propagate input shapes? We should be able to reuse a subgraph with multiple input shapes, but currently inputs require a concrete tensor description with known shapes, meaning these subgraphs would only work with input tensors of exactly the same shape. Would this interact with Support flexible input sizes #883? This affects constants too - would we want a constant-of-shape overload?
- Should we pass a name string along to `buildSubgraph` to aid the backend in verifying custom operator compatibility, rather than relying on pure pattern matching? If so, what about backends that use different names for operators? Should it be a list of possible strings then (or is pure pattern matching better)?
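For illustration, a purely hypothetical sketch of deferred resolution, reusing the `tanh` decomposition above. The descriptor-less `input()` and `constant()` overloads, the `castLike()` helper, and the `fp16Input`/`fp32Input` operands are all invented here and are not proposed API text:

```js
// Hypothetical: build the tanh subgraph once with an unqualified input, so data
// type and shape are resolved only when the subgraph is instantiated.
function buildTanhDeferred(builder) {
  const x = builder.input("input");                      // no descriptor yet (hypothetical)
  const one = builder.castLike(builder.constant(1), x);  // cast to x's eventual type (hypothetical, like ONNX CastLike)
  const two = builder.castLike(builder.constant(2), x);
  const exp2x = builder.exp(builder.mul(two, x));
  const tanh = builder.div(builder.sub(exp2x, one), builder.add(exp2x, one));
  return builder.buildSubgraph(tanh, ["input"], ["output"]);
}

// The same subgraph could then be reused with float16 or float32 inputs of
// different shapes, with propagation happening at subgraph() usage time.
const tanhSubgraph = buildTanhDeferred(graphBuilder);
let fp16Result = graphBuilder.subgraph(tanhSubgraph, { "input": fp16Input });
let fp32Result = graphBuilder.subgraph(tanhSubgraph, { "input": fp32Input });
```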
(Note: this is different from #6, as this is more about subgraph composition from existing primitive operators rather than, say, interop with custom WebGPU shaders.)
Additionally (see the sketch after this list):
- Should attributes be parameterizable?
- Should optional inputs be supported?
- Should subgraphs optionally include a name (or a list of names)?
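One way these questions could surface in the API shape, purely as a hypothetical sketch: an options bag on `buildSubgraph`. The `names`, `attributes`, and `optionalInputs` fields and the GELU example are invented for illustration, and `geluOutput` is assumed to be an `MLOperand` already built from primitives:

```js
// Hypothetical options bag on buildSubgraph; none of this is in WebNN today.
const geluSubgraph = builder.buildSubgraph(geluOutput, ["input", "bias"], ["output"], {
  names: ["Gelu", "com.microsoft.Gelu"],  // candidate backend operator names to match against
  attributes: { approximate: "tanh" },    // parameterizable attribute
  optionalInputs: ["bias"],               // inputs a caller may leave unbound
});
let result = builder.subgraph(geluSubgraph, { "input": input });  // "bias" omitted here
```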