Description
This issue is a suggestion based on our findings.
Some of the examples in Level Zero use the `UNCACHED` flag: https://spec.oneapi.io/level-zero/latest/core/api.html#_CPPv438ZE_DEVICE_MEM_ALLOC_FLAG_BIAS_UNCACHED
For example, here:
https://github.com/intel/compute-runtime/blob/master/level_zero/core/test/black_box_tests/zello_timestamp.cpp#L102
In TornadoVM we have run many experiments with `CACHED` and `UNCACHED` allocations, and we saw that the `CACHED` version is up to 4x faster than the `UNCACHED` one. As I understand it, the `UNCACHED` flag is intended for buffers that are streamed once and not reused, which leaves space in the GPU's cache for other, reusable buffers. Unfortunately, the Level Zero documentation does not warn about this. In our experience, this is very error-prone: we spent our time analyzing the number of threads and thread blocks deployed, rather than how memory was allocated.
Does it make sense to add some documentation to the Level Zero examples that covers this, including the situations in which developers should use the `UNCACHED` flag vs. the `CACHED` flag?
BTW, I am not sure whether this is the right repo for this issue, or whether I should also open one in Level Zero.