The documentation for the server says that the `-t N` option is not used if model layers are offloaded to the GPU. However, when only some layers are offloaded, I still see CPU load grow to 1200% with `-t 12` during inference, while GPU load stays very low, occurring in short bursts of up to 10% or so. But if the model is small enough that ALL layers can be offloaded to the GPU, then CPU load does not exceed 100% and GPU load reaches 100%.
So my guess is that the documentation is supposed to say "the `-t N` option is not used when ALL layers are offloaded to GPU", right?
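For reference, a minimal sketch of the two scenarios, using the server's `-m`, `-ngl` (number of layers to offload), and `-t` (CPU threads) flags; the model path and layer counts here are placeholders:

```sh
# Partial offload: 20 layers on GPU, the rest run on CPU,
# so the -t 12 worker threads are still heavily used during inference.
./server -m models/llama-2-7b.Q4_K_M.gguf -ngl 20 -t 12

# Full offload: all layers fit on the GPU (a large -ngl caps at the
# model's layer count), so -t has little effect on generation load.
./server -m models/llama-2-7b.Q4_K_M.gguf -ngl 99 -t 12
```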