Description
TFJS already has some fused ops, such as fusedConv2d, fusedDepthwiseConv2d, and fusedMatMul, which can improve performance significantly. However, many other patterns appear frequently across models but are not fused. We'd like to use this issue to track such patterns and see whether any of them can be fused in TFJS for better performance.
We are raising this issue because TFJS now includes a WebGPU backend, which is more powerful than WebGL. The TFJS WebGPU backend is based on compute shaders rather than fragment shaders, so a shader can read from and write to arbitrary positions. This flexibility makes it possible to fuse arbitrary combinations of ops. Even if a new fused pattern is hard to implement in some backends, those backends can still break the fused op down into its individual ops and execute them one by one.
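To make the fallback idea concrete, here is a minimal sketch of how a fused chain of element-wise ops could be dispatched: a backend that supports fusion runs the whole chain in one pass, and a backend that does not falls back to running the individual ops. It uses plain TypeScript arrays rather than TFJS itself, and every name in it (`UnaryOp`, `Backend`, `makeBackend`, `runChain`, `buffersAllocated`) is hypothetical and not part of the TFJS API.

```typescript
// Hypothetical sketch; none of these names come from the TFJS API.
type UnaryOp = 'relu' | 'square' | 'half';

// Scalar implementation of each individual op.
const scalarKernels: Record<UnaryOp, (v: number) => number> = {
  relu: (v) => Math.max(0, v),
  square: (v) => v * v,
  half: (v) => v * 0.5,
};

interface Backend {
  // Counts output buffers, to show what fusion saves.
  buffersAllocated: number;
  // Runs a single op over the whole input and allocates a new output.
  runOp(op: UnaryOp, input: Float32Array): Float32Array;
  // Optional: runs a whole chain of ops in one pass.
  runFused?(ops: UnaryOp[], input: Float32Array): Float32Array;
}

function makeBackend(supportsFusion: boolean): Backend {
  const backend: Backend = {
    buffersAllocated: 0,
    runOp(op, input) {
      const out = new Float32Array(input.length);
      backend.buffersAllocated++;
      for (let i = 0; i < input.length; i++) {
        out[i] = scalarKernels[op](input[i]);
      }
      return out;
    },
  };
  if (supportsFusion) {
    backend.runFused = (ops, input) => {
      const out = new Float32Array(input.length);
      backend.buffersAllocated++;
      for (let i = 0; i < input.length; i++) {
        // Intermediate values stay in a local variable (double precision),
        // so no intermediate buffer is written. In general the last bits
        // may differ from the unfused path, which rounds to float32
        // after every op.
        let v = input[i];
        for (const op of ops) v = scalarKernels[op](v);
        out[i] = v;
      }
      return out;
    };
  }
  return backend;
}

// Uses the fused path when the backend has one; otherwise breaks the
// fused op down into individual ops and executes them in order.
function runChain(
  backend: Backend,
  ops: UnaryOp[],
  input: Float32Array,
): Float32Array {
  if (backend.runFused) return backend.runFused(ops, input);
  let current = input;
  for (const op of ops) current = backend.runOp(op, current);
  return current;
}

const ops: UnaryOp[] = ['relu', 'square', 'half'];
const input = new Float32Array([-2, -1, 0, 3]);

const fusing = makeBackend(true);
const fallback = makeBackend(false);

const fusedResult = runChain(fusing, ops, input);
const fallbackResult = runChain(fallback, ops, input);

console.log(Array.from(fusedResult), fusing.buffersAllocated);
console.log(Array.from(fallbackResult), fallback.buffersAllocated);
```

Both backends produce the same values for this input, but the fusing backend allocates one output buffer while the fallback allocates one per op. In this sketch a backend only has to provide the single-op kernels to remain correct, so adding a fused pattern does not force every backend to implement it.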