Closed
Description
And comment: ggml-org/llama.cpp#1507 (comment)
I guess we can extend ggml so that the work chunk distribution method can be chosen, either at compile time or via a context parameter. We can factor the range selection out of the ggml forward implementations to make them more concise and easier to extend in the future.
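To make the idea concrete, here is a rough sketch of what a factored-out range selection could look like. All names (`chunk_method`, `row_range`, `chunk_range`) are hypothetical and do not exist in ggml. The two strategies shown are only examples of distribution methods, not a statement of what ggml currently does.

```c
#include <assert.h>

/* Hypothetical names for illustration only; ggml does not define these. */
enum chunk_method {
    CHUNK_CONTIGUOUS, /* ceil(nr / nth) rows per thread; trailing threads may get fewer or none */
    CHUNK_BALANCED    /* row counts differ by at most one across threads */
};

/* Half-open row range [ir0, ir1) assigned to one thread. */
struct row_range {
    int ir0;
    int ir1;
};

/* Select the rows that thread `ith` out of `nth` should process, given `nr` rows in total. */
static struct row_range chunk_range(enum chunk_method method, int nr, int ith, int nth) {
    assert(nr >= 0 && nth > 0 && ith >= 0 && ith < nth);

    struct row_range r;
    if (method == CHUNK_BALANCED) {
        /* The first `rem` threads each take one extra row. */
        const int base = nr / nth;
        const int rem  = nr % nth;
        r.ir0 = ith * base + (ith < rem ? ith : rem);
        r.ir1 = r.ir0 + base + (ith < rem ? 1 : 0);
    } else {
        /* Fixed-size blocks, clamped to the total number of rows. */
        const int dr = (nr + nth - 1) / nth;
        r.ir0 = dr * ith;
        if (r.ir0 > nr) r.ir0 = nr;
        r.ir1 = r.ir0 + dr;
        if (r.ir1 > nr) r.ir1 = nr;
    }
    return r;
}
```

A forward implementation would then call `chunk_range(method, nr, ith, nth)` once and loop over `[ir0, ir1)`, with `method` coming from a compile-time constant or from a context parameter.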
Another thing to be investigated is the usage of sched_yield() and potentially making it user configurable:
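As a purely illustrative sketch (not the ggml code), a user-configurable yield in a spin-wait might look like the following. `wait_params`, `use_yield`, and `spin_wait` are hypothetical names and not existing ggml parameters. Only `sched_yield()` from POSIX `<sched.h>` and the C11 atomics are real APIs here.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-facing setting; not an existing ggml parameter. */
struct wait_params {
    bool use_yield; /* true: call sched_yield() between polls; false: busy-spin */
};

/* Poll `flag` until it equals `target`. Returns the number of failed polls. */
static long spin_wait(atomic_int *flag, int target, const struct wait_params *p) {
    long spins = 0;
    while (atomic_load(flag) != target) {
        if (p->use_yield) {
            sched_yield(); /* give other runnable threads a chance to run */
        }
        spins++;
    }
    return spins;
}
```

Whether yielding helps or hurts likely depends on the thread count and on how loaded the machine is, which is the motivation for leaving the choice to the user.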