
Add a compute shader sample #1

Merged · 4 commits · Feb 20, 2024

Conversation

goodartistscopy
Contributor

The sample generates an animated gif of the evolution of a Game of Life automaton. It demonstrates:

  • The use of a compute shader that writes into a storage texture
  • The classic "ping-pong" technique between two textures for iterative computation
  • The use of local workgroup memory (optional: see the shader code for details)
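
Roughly, the first two points boil down to something like this (untested sketch in TypeScript/WebGPU, not the code in this PR; `device`, `pipeline`, the two bind groups, `width`, `height` and `numSteps` are assumed to exist):

```ts
// Sketch only: a compute shader writing the next generation into a storage
// texture, driven by a host-side ping-pong loop.
const shaderSource = /* wgsl */ `
  @group(0) @binding(0) var src : texture_2d<u32>;                    // previous generation
  @group(0) @binding(1) var dst : texture_storage_2d<r32uint, write>; // next generation

  @compute @workgroup_size(8, 8)
  fn main(@builtin(global_invocation_id) id : vec3u) {
    let dims = vec2i(textureDimensions(src));
    var alive = 0u;
    // Count the 8 neighbours, wrapping coordinates around the edges.
    for (var dy = -1; dy <= 1; dy++) {
      for (var dx = -1; dx <= 1; dx++) {
        if (dx != 0 || dy != 0) {
          let p = (vec2i(id.xy) + vec2i(dx, dy) + dims) % dims;
          alive += textureLoad(src, p, 0).r;
        }
      }
    }
    let current = textureLoad(src, vec2i(id.xy), 0).r;
    // Game of Life rule: a live cell survives with 2 or 3 neighbours,
    // a dead cell is born with exactly 3.
    let next = select(u32(alive == 3u), u32(alive == 2u || alive == 3u), current == 1u);
    textureStore(dst, vec2i(id.xy), vec4u(next, 0u, 0u, 0u));
  }
`;

// Host-side ping-pong: two textures and two bind groups with the textures in
// opposite roles; each generation reads the texture written by the previous one.
for (let step = 0; step < numSteps; step++) {
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, step % 2 === 0 ? bindGroupAtoB : bindGroupBtoA);
  pass.dispatchWorkgroups(Math.ceil(width / 8), Math.ceil(height / 8));
  pass.end();
  device.queue.submit([encoder.finish()]);
  // ... copy the freshly written texture out and encode a gif frame here ...
}
```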

@mpizenberg
Owner

Thanks a lot for this example. I have a few questions.

  1. Regarding textures, isn't there a setting that enables automatic texture wrapping for coordinates that fall outside of texture limits (<0, >width, ...)?
  2. The staging buffer is only used for the optimized shader, right?
  3. Since the execution time is dominated by the gif creation, we can't really see the effect of the local cache optimization. How to make it matter more?

@goodartistscopy
Contributor Author

1. Regarding textures, isn't there a setting that enables automatic texture wrapping for coordinates that fall outside of texture limits (<0, >width, ...)?

There is, when the texture is accessed through a sampler using one of the textureSample*() functions. You need to create a GPUSampler and bind it like any other resource; with its address mode set to "repeat" (the default is "clamp-to-edge"), it implements exactly the donut coordinate system.
Here I'm just accessing the texels without interpolation, so I figured using a sampler would be a waste.
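
For reference, the sampler route would look roughly like this (untested sketch, not code from this PR; it assumes the state were stored in a float format like r8unorm, since the textureSample*() functions only work on float sampled textures):

```ts
// With the address mode set to "repeat", the wrap-around happens in the
// sampler instead of in shader arithmetic.
const sampler = device.createSampler({
  addressModeU: "repeat",
  addressModeV: "repeat",
  magFilter: "nearest", // no interpolation, we only want the wrapping
  minFilter: "nearest",
});

const shaderSource = /* wgsl */ `
  @group(0) @binding(0) var src : texture_2d<f32>;
  @group(0) @binding(1) var samp : sampler;

  // Compute shaders can only use the explicit-level sampling variants,
  // hence textureSampleLevel rather than textureSample.
  fn cell(uv : vec2f) -> f32 { // uv in normalized [0, 1] coordinates
    return textureSampleLevel(src, samp, uv, 0.0).r;
  }
`;
```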

2. The staging buffer is only used for the optimized shader, right?

The staging buffer is used as a CPU-mappable copy of the output storage texture (textures can't be mapped for reading on the CPU, only buffers can, so the result has to be copied into a buffer first). So it's used in both paths.
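
Roughly, the readback path looks like this (simplified sketch with made-up names, not the exact code in the PR):

```ts
// The storage texture itself cannot be mapped, so its contents are copied
// into a MAP_READ staging buffer and read from there.
const bytesPerRow = Math.ceil((width * 4) / 256) * 256; // 256-byte row alignment, assuming a 4-byte texel format
const staging = device.createBuffer({
  size: bytesPerRow * height,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});

const encoder = device.createCommandEncoder();
encoder.copyTextureToBuffer(
  { texture: outputTexture },         // the storage texture written by the shader
  { buffer: staging, bytesPerRow },
  { width, height }
);
device.queue.submit([encoder.finish()]);

await staging.mapAsync(GPUMapMode.READ);
const frame = new Uint8Array(staging.getMappedRange()).slice(); // copy out before unmapping
staging.unmap();
// ... feed `frame` to the gif encoder ...
```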

3. Since the execution time is dominated by the gif creation, we can't really see the effect of the local cache optimization. How to make it matter more?

Right. It's mostly meant to illustrate how to use the local shared memory as a user-managed cache. I'm not sure the shader does enough repeated accesses to the input texture for it to really matter (reads are only amplified by a factor of 9, and besides, they're very regular).
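
For context, the cached variant boils down to something like this (sketch only, not the exact shader in the PR):

```ts
// Each 8x8 workgroup first copies the 10x10 tile it needs (its own cells plus
// a 1-cell halo) into workgroup memory, so the 9 reads per cell then hit the
// scratchpad instead of the texture.
const cachedShaderSource = /* wgsl */ `
  @group(0) @binding(0) var src : texture_2d<u32>;
  @group(0) @binding(1) var dst : texture_storage_2d<r32uint, write>;

  var<workgroup> tile : array<array<u32, 10>, 10>;

  @compute @workgroup_size(8, 8)
  fn main(@builtin(global_invocation_id) gid : vec3u,
          @builtin(local_invocation_id) lid : vec3u) {
    let dims = vec2i(textureDimensions(src));
    let origin = vec2i(gid.xy) - vec2i(lid.xy) - vec2i(1, 1); // texel mapped to tile[0][0]

    // Cooperative load: the 64 threads fill the 100 tile entries.
    var i = lid.y * 8u + lid.x;
    while (i < 100u) {
      let t = vec2i(i32(i % 10u), i32(i / 10u));
      let p = (origin + t + dims) % dims; // wrap around the edges
      tile[t.y][t.x] = textureLoad(src, p, 0).r;
      i += 64u;
    }
    workgroupBarrier(); // every thread must see the complete tile

    // Same rule as before, but the neighbour reads come from workgroup memory.
    let c = vec2i(lid.xy) + vec2i(1, 1);
    var alive = 0u;
    for (var dy = -1; dy <= 1; dy++) {
      for (var dx = -1; dx <= 1; dx++) {
        if (dx != 0 || dy != 0) {
          alive += tile[c.y + dy][c.x + dx];
        }
      }
    }
    let current = tile[c.y][c.x];
    let next = select(u32(alive == 3u), u32(alive == 2u || alive == 3u), current == 1u);
    textureStore(dst, vec2i(gid.xy), vec4u(next, 0u, 0u, 0u));
  }
`;
```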

Another factor is that modern GPUs have caches, so the "slow" implementation might actually be as fast as the other one. It would be interesting to benchmark, using the "timestamp query" feature to measure the time spent in the compute shader.
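
Something along these lines would do it (sketch; it assumes the "timestamp-query" feature was requested when creating the device):

```ts
// Timestamps written at the start and end of the compute pass give the
// GPU-side duration of the shader, independent of the gif encoding.
const querySet = device.createQuerySet({ type: "timestamp", count: 2 });
const resolveBuffer = device.createBuffer({
  size: 2 * 8, // two 64-bit timestamps
  usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
});
const timeReadback = device.createBuffer({
  size: 2 * 8,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});

const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass({
  timestampWrites: {
    querySet,
    beginningOfPassWriteIndex: 0,
    endOfPassWriteIndex: 1,
  },
});
// ... setPipeline / setBindGroup / dispatchWorkgroups as usual ...
pass.end();
encoder.resolveQuerySet(querySet, 0, 2, resolveBuffer, 0);
encoder.copyBufferToBuffer(resolveBuffer, 0, timeReadback, 0, 2 * 8);
device.queue.submit([encoder.finish()]);

await timeReadback.mapAsync(GPUMapMode.READ);
const [start, end] = new BigUint64Array(timeReadback.getMappedRange());
console.log(`compute pass: ${Number(end - start) / 1e6} ms`); // timestamps are in nanoseconds
timeReadback.unmap();
```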

Where it becomes important is when the threads do many incoherent reads and/or writes. Then a custom-managed scratchpad can beat the general-purpose caches of the memory subsystem more decisively.
So the answer to "How to make it matter more?" would be "Use a more complex workload" :) (a sort algorithm may be a good candidate).

@mpizenberg
Owner

Here I'm just accessing the texels without interpolation, so I figured using a sampler would be a waste.

Ah yes, ok I understand.

a sort algorithm may be a good candidate

XD ok ok

@mpizenberg merged commit ca29e70 into mpizenberg:main on Feb 20, 2024