
[webgpu]Proposal for tiled texture in WebGPU backend #3132

Closed
@axinging

Problem

On the TFJS benchmarks, the WebGL backend outperforms the WebGPU backend for conv2d and maxPool.

One possible reason is that WebGL's textures have better spatial locality than WebGPU's buffers, especially for ops like conv2d and maxPool.

Proposals

Investigate tiled textures and how their performance compares with buffers in the TFJS WebGPU backend. If textures prove faster for some 2D (or higher-dimensional) inputs, add tiled texture support.

Tiled texture vs buffer

Consider a 2D convolution with a 4x16 input matrix, a 3x3 filter matrix, and a GPU cache line size of 16 elements.
For every output element (padding = 0, stride = 1, dilation = 1), the shader needs to access 9 elements of the input matrix.
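To make the access pattern concrete, here is a minimal sketch of which input coordinates one output element reads. The helper names `convOutputShape` and `inputWindow` are hypothetical and are not part of TFJS. The sketch assumes padding = 0, as in the example above.

```typescript
// Hypothetical helpers for illustration only; not TFJS APIs.
type Coord = [number, number]; // [row, col]

// Output shape of a convolution with padding = 0.
function convOutputShape(
  inH: number, inW: number, filterH: number, filterW: number,
  stride = 1, dilation = 1
): Coord {
  // Effective filter extent once dilation is applied.
  const effH = (filterH - 1) * dilation + 1;
  const effW = (filterW - 1) * dilation + 1;
  return [
    Math.floor((inH - effH) / stride) + 1,
    Math.floor((inW - effW) / stride) + 1,
  ];
}

// Input coordinates read to produce the output element (outRow, outCol).
function inputWindow(
  outRow: number, outCol: number, filterH: number, filterW: number,
  stride = 1, dilation = 1
): Coord[] {
  const coords: Coord[] = [];
  for (let fr = 0; fr < filterH; fr++) {
    for (let fc = 0; fc < filterW; fc++) {
      coords.push([outRow * stride + fr * dilation, outCol * stride + fc * dilation]);
    }
  }
  return coords;
}

const exampleOutShape = convOutputShape(4, 16, 3, 3); // [2, 14]
const exampleWindow = inputWindow(0, 0, 3, 3);        // rows 0..2 x cols 0..2
console.log(exampleOutShape, exampleWindow.length);   // 9 reads per output
```

With the 4x16 input and 3x3 filter, the output is 2x14 and every output element reads a 3x3 block of the input.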

When the input matrix is stored in a tiled texture (texture tiling; in WebGPU this would be exposed to the shader as a storage texture), all 9 elements are spatially close to each other, so the whole 3x3 window may fit in a single cache line. That means fewer cache misses and better performance.

The tiled texture access pattern is depicted below:

0 1 2 3 4 ...
16 17 18 19 20 ...
32 33 34 35 36 ...
48 49 50 51 52 ...
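As a rough sketch of the tiled case, the snippet below counts how many cache lines a 3x3 window touches. It assumes 4x4 tiles where each tile occupies exactly one 16-element cache line. That tile shape is an assumption for illustration: real GPU tiling is implementation-specific and is not exposed by WebGPU. `tiledCacheLinesTouched` is a hypothetical helper.

```typescript
// Assumption: 4x4 tiles, each tile = 16 elements = one cache line.
const TILE_H = 4;
const TILE_W = 4;

// Number of distinct tiles (cache lines) touched by a filter window
// whose top-left input element is (topRow, leftCol).
function tiledCacheLinesTouched(
  topRow: number, leftCol: number, inputWidth: number,
  filterH = 3, filterW = 3
): number {
  const tilesPerRow = Math.ceil(inputWidth / TILE_W);
  const lines = new Set<number>();
  for (let r = topRow; r < topRow + filterH; r++) {
    for (let c = leftCol; c < leftCol + filterW; c++) {
      const tileIndex = Math.floor(r / TILE_H) * tilesPerRow + Math.floor(c / TILE_W);
      lines.add(tileIndex);
    }
  }
  return lines.size;
}

console.log(tiledCacheLinesTouched(0, 0, 16)); // 1: window lies inside one tile
console.log(tiledCacheLinesTouched(0, 2, 16)); // 2: window straddles a tile boundary
```

The second call shows why the benefit is only "possible": a window that crosses a tile boundary still touches more than one line.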

When the input matrix is stored in a buffer, the data is laid out linearly (row by row), so accessing these 9 elements touches three cache lines, one per input row. That means more cache misses and poorer performance.

The buffer access pattern is depicted below:

0 1 2 3 4 ... 16 17 18 19 20 ... 32 33 34 35 36 ... 48 49 50 51 52 ...
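The three-cache-line count for the linear layout can be checked with a small sketch. It assumes a row-major buffer that starts on a cache-line boundary and 16 elements per cache line. `linearCacheLinesTouched` is a hypothetical helper.

```typescript
// Assumption: 16 elements per cache line, buffer aligned to a cache line.
const CACHE_LINE_ELEMS = 16;

// Number of distinct cache lines touched by a filter window whose
// top-left input element is (topRow, leftCol) in a row-major buffer.
function linearCacheLinesTouched(
  topRow: number, leftCol: number, inputWidth: number,
  filterH = 3, filterW = 3
): number {
  const lines = new Set<number>();
  for (let r = topRow; r < topRow + filterH; r++) {
    for (let c = leftCol; c < leftCol + filterW; c++) {
      const offset = r * inputWidth + c; // row-major element offset
      lines.add(Math.floor(offset / CACHE_LINE_ELEMS));
    }
  }
  return lines.size;
}

console.log(linearCacheLinesTouched(0, 0, 16)); // 3: offsets 0..2, 16..18, 32..34
```

Because the example input is exactly 16 elements wide, each input row maps to its own cache line, so any 3x3 window touches three lines.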

TODO

  1. Textures may benefit some 2D (and higher-dimensional) inputs, such as images and video.
    For other cases, such as audio and text, the benefit is unclear and needs further investigation.

  2. We need a prototype to prove this.
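Before building a GPU prototype, the cache-line argument can be sanity-checked on paper. The sketch below sweeps every output position of the 4x16 example and averages the number of distinct cache lines touched per output under each layout. It is a simplified model and not a benchmark: it assumes 16-element cache lines and hypothetical 4x4 tiles, and it ignores cache reuse between neighboring invocations. All names are hypothetical.

```typescript
// Maps an input coordinate to the cache line that holds it.
type LineOf = (row: number, col: number) => number;

const IN_H = 4;
const IN_W = 16;
const FILTER = 3;
const LINE_ELEMS = 16; // assumed cache line size in elements
const TILE = 4;        // assumed 4x4 tiles, one tile per cache line

// Row-major buffer layout.
const linearLine: LineOf = (r, c) => Math.floor((r * IN_W + c) / LINE_ELEMS);
// Tiled layout: the tile index doubles as the cache-line index.
const tiledLine: LineOf = (r, c) =>
  Math.floor(r / TILE) * Math.ceil(IN_W / TILE) + Math.floor(c / TILE);

// Average number of distinct cache lines touched per output element
// (padding = 0, stride = 1, dilation = 1).
function averageLinesPerOutput(lineOf: LineOf): number {
  let total = 0;
  let outputs = 0;
  for (let outRow = 0; outRow + FILTER <= IN_H; outRow++) {
    for (let outCol = 0; outCol + FILTER <= IN_W; outCol++) {
      const lines = new Set<number>();
      for (let fr = 0; fr < FILTER; fr++) {
        for (let fc = 0; fc < FILTER; fc++) {
          lines.add(lineOf(outRow + fr, outCol + fc));
        }
      }
      total += lines.size;
      outputs++;
    }
  }
  return total / outputs;
}

const linearAverage = averageLinesPerOutput(linearLine);
const tiledAverage = averageLinesPerOutput(tiledLine);
console.log({ linearAverage, tiledAverage });
```

Under these assumptions the linear layout touches three lines for every output, while the tiled layout averages fewer than two. Measured performance on real hardware may differ, which is why a real prototype is still needed.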
