This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Low CPU and GPU utilization when running in the gpu mode. #15

Open
wesley-stone opened this issue Feb 14, 2021 · 1 comment

Comments

@wesley-stone

Here is my script:
python run.py --adhoc --cfg conf/c02_selfplay/liars_sp.yaml env.num_dice=1 env.num_faces=4 env.subgame_params.use_cfr=true selfplay.cpu_gen_threads=0 selfplay.threads_per_gpu=16

My machine has 2 x 2080 Ti GPUs and 2 CPUs with 24 threads each.
The log looks normal: it is just collecting experience and training. However, CPU utilization is only 5% and GPU utilization is only 6%. Is this normal? And could you tell me how I can speed up the process? Thanks a lot!

@Drazcmd

Drazcmd commented Jan 31, 2024

EDIT: ignore the first part of this, I misread "2 2080ti" as "2080ti", i.e. 1 x 2080ti (sorry for the confusion!). That said, the other part about reserving one of them for the model might still be relevant?


TLDR: Based on the README, some of the closed issues, and parts of cfvpy/selfplay.py, I think the intended use case for non-zero selfplay.threads_per_gpu and zero selfplay.cpu_gen_threads is actually when you have at least TWO GPUs.

(And if you only have one GPU, that's what the CPU-based data generation, i.e. setting selfplay.cpu_gen_threads=60 in the example, is for.)

Having said all that, some comments by the authors do seem to suggest that it's probably ok to modify the relevant selfplay.py code to also use GPU 0 for data generation -> https://github.com/facebookresearch/rebel/blob/master/cfvpy/selfplay.py#L193. So might be worth trying that out perhaps?


Longer explanation: I'm not 100% certain about any of this (I'm a complete newbie to CUDA / cuDNN), but after reading through some of the closed issues + the code, I think:

  • the 'gpu mode' they talk about is currently intended for two or more gpus. Specifically, if I understand their comments correctly, the code path you enter in that situation uses the 0th GPU for the model, then uses all remaining GPUs for the expensive "data generation" part that's actually running CFR. That said, based on their comments, it's also pretty easy to change the code to just NOT do that? (note: I'm not sure of the consequences for doing so)

  • the 'cpu mode' (setting cpu threads) is meant for when you have exactly one gpu - with the result being that it reserves your entire GPU for the model (maybe? not sure!) - and then starts using your CPU for data generation. But since the data generation is the expensive part, this will probably be very slow.

  • (If you don't have a CUDA-compatible GPU, I think it might just be that you can't run this at all? Again, not certain!)
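
To make the two modes concrete, here is a rough sketch of the device assignment as I understand it from the points above. This is not the actual cfvpy/selfplay.py code: `plan_devices` and its return format are made up for illustration, and the rule that a non-zero cpu_gen_threads selects the CPU path is my assumption. Only the option names (selfplay.cpu_gen_threads, selfplay.threads_per_gpu) come from the repo's config.

```python
def plan_devices(num_gpus, cpu_gen_threads, threads_per_gpu):
    """Hypothetical helper: which device trains, and which devices generate data.

    Assumption (not verified against the repo): GPU 0 always holds the model,
    and a non-zero cpu_gen_threads switches data generation to the CPU.
    """
    plan = {"train_device": "cuda:0", "generators": []}
    if cpu_gen_threads > 0:
        # "cpu mode": the GPU is reserved for the model, CPU threads run CFR.
        plan["generators"] = [("cpu", cpu_gen_threads)]
    else:
        # "gpu mode": every GPU except GPU 0 runs data generation threads.
        plan["generators"] = [
            (f"cuda:{i}", threads_per_gpu) for i in range(1, num_gpus)
        ]
    return plan


if __name__ == "__main__":
    print(plan_devices(num_gpus=2, cpu_gen_threads=0, threads_per_gpu=16))
    print(plan_devices(num_gpus=1, cpu_gen_threads=60, threads_per_gpu=16))
    print(plan_devices(num_gpus=1, cpu_gen_threads=0, threads_per_gpu=16))
```

Under this reading, a single GPU with cpu_gen_threads=0 ends up with an empty generator list, i.e. nothing is left to produce training data.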


On a side note, for anyone else poking around this stuff in 2024, here's a couple quick things I've noticed:

  • the current selfplay.py file has a weird assert that I believe causes it to crash if there are fewer than 2 CUDA-compatible GPUs on the system, regardless of whether you set selfplay.cpu_gen_threads. Pretty sure that's a bug / it should only be asserting there's >= 1 GPU at that spot (and instead asserting >= 2 only when in the "gpu" code path below)?

  • a bunch of the install requirements are outdated / there are some strange bugs. Working with a relatively fresh ubuntu 18.04 pro install, I first had to:

      1. install the last cuDNN 7.X (cuDNN 8.x had problems I think?) and a corresponding version of CUDA (ensuring CUDA specifically was in /usr/local/cuda, since otherwise one of the pytorch 14 dependencies couldn't find it),
      2. use specifically python 3.7,
      3. temporarily remove all of the test files from liars_dice (since there was some weird issue with types being incompatible?... still need to actually figure out what exactly was going on there)
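
Going back to the assert mentioned above, the check I'd expect looks roughly like the sketch below. `gpu_count_problem` is a hypothetical helper, not something in the repo, and treating "cpu_gen_threads > 0" as the CPU code path is my assumption about how the modes are selected. In real use, num_gpus would presumably come from torch.cuda.device_count().

```python
def gpu_count_problem(num_gpus, cpu_gen_threads):
    """Return an error message, or None if the GPU count looks sufficient.

    Hypothetical check: always require one GPU for the model, and require a
    second GPU only when data generation is supposed to run on GPUs.
    """
    if num_gpus < 1:
        return "need at least 1 CUDA GPU for the model"
    if cpu_gen_threads == 0 and num_gpus < 2:
        return "gpu mode needs at least 2 GPUs (GPU 0 is reserved for the model)"
    return None


if __name__ == "__main__":
    for gpus, cpu_threads in [(1, 60), (1, 0), (2, 0)]:
        problem = gpu_count_problem(gpus, cpu_threads)
        print(gpus, cpu_threads, problem or "ok")
```

The point is only that the ">= 2 GPUs" requirement moves into the gpu branch, so a single-GPU machine using CPU data generation would pass.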
