Summary:
Pull Request resolved: #4228
This diff introduces a metric to GPUInfo that calculates the cacheline size of the buffer data pathway. In this experiment, all threads read two values from a buffer, separated by a varying stride. Reading two values from the same cacheline is cheap because the whole line is fetched as a block, regardless of which data we actually want. By varying the separation between the addresses of these two values, there will be a point where the shader is forced to fetch two separate cachelines, which produces an increase in latency that we can detect.
[This article](https://igoro.com/archive/gallery-of-processor-cache-effects/) has more information on the topic.
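To make the effect concrete, here is a minimal CPU-side analogue in C++, in the spirit of the linked article. The actual probe runs in a compute shader; the buffer size, stride range, and iteration count below are illustrative:
```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
  constexpr size_t kBufSize = 1 << 20;  // 1 MiB source buffer
  constexpr int kIters = 1 << 22;       // fixed iteration count per stride
  std::vector<char> buf(kBufSize, 1);

  // For each stride, time two reads that are `stride` bytes apart. Latency
  // jumps once the pair no longer fits in a single cacheline.
  for (size_t stride = 4; stride <= 256; stride *= 2) {
    volatile int sink = 0;  // keeps the reads from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
      size_t base = (size_t(i) * 64) % (kBufSize - stride);
      sink = sink + buf[base] + buf[base + stride];
    }
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::printf("stride %3zu B: %lld us\n", stride, static_cast<long long>(us));
  }
  return 0;
}
```
On typical CPUs the reported times stay roughly flat while both reads land in one cacheline, then jump once the stride crosses the line size (often 64 bytes); the GPU experiment looks for the same transition point.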
The experiment first calculates the number of iterations (NITER) it takes the lowest stride to run for 1000 microseconds. Every stride is then run for this same number of iterations, which establishes a timing baseline and avoids timing errors.
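A sketch of that calibration step, assuming a hypothetical `time_shader_us(stride, niter)` helper that dispatches the probe shader `niter` times at the given stride and returns the elapsed microseconds (the real dispatch and timing plumbing is not part of this sketch):
```cpp
#include <cstdint>

// `time_shader_us` is a stand-in for whatever runs the probe shader `niter`
// times at the given stride and reports elapsed microseconds.
uint32_t calibrate_niter(double (*time_shader_us)(uint32_t stride,
                                                  uint32_t niter)) {
  uint32_t niter = 1;
  // Grow NITER until the cheapest configuration (the lowest stride) runs for
  // roughly 1000 us. Every stride is then measured with this same NITER, so
  // each run is long enough to time reliably and all runs are comparable.
  while (time_shader_us(/*stride=*/1, niter) < 1000.0) {
    niter *= 2;
  }
  return niter;
}
```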
Each run of the shader fetches the two values from different points in memory. The shader also has a seemingly redundant variable `zero` that prevents the compiler from optimizing away the for loop.
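A C++ analogue of that trick (the names and exact loop shape here are illustrative, not the shader's actual code): because the index depends on a value that is only known to be 0 at runtime, the compiler cannot fold the loads into constants or hoist them out of the loop.
```cpp
#include <cstdint>

float probe_loop(const float* src, uint32_t stride, uint32_t niter,
                 uint32_t zero /* always 0 at runtime */) {
  float acc = 0.f;
  for (uint32_t i = 0; i < niter; ++i) {
    // `base` depends on the opaque `zero`, so the compiler must issue both
    // fetches every iteration instead of precomputing or hoisting them.
    const uint32_t base = zero * i;  // evaluates to 0 every iteration
    acc += src[base] + src[base + stride];
  }
  return acc;  // returned so the loop has an observable result
}
```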
The experiment will look like this:
{F1754670481}
Differential Revision: D59649561