
Accelerate parallel environment interactions with GPU #138

Open
findmyway opened this issue Apr 16, 2021 · 6 comments

Comments

@findmyway
Member

No description provided.

@Sid-Bhatia-0
Member

@findmyway This would be my first time working with GPU programming (or any form of concurrent programming, for that matter), so I have a few questions:

  1. I've heard that there is a cost incurred when moving memory from the CPU to the GPU. The present CPU performance of the environments (see benchmark.md) appears to be quite good relative to the time it would take to train a neural network. Could you explain how we decide whether using a GPU is worth it?
  2. Related to 1., how do we decide between multithreading on the CPU and using a GPU for acceleration?
  3. If we are going to use a GPU, what exactly does that entail? Does this mean storing the BitArray{3} on the GPU so that the state can be fed directly to the neural network weights stored on the GPU? Where does the environment logic get executed - on the CPU or the GPU?

@findmyway
Member Author

Actually, Q3 answers Q1: in most cases we don't need to transfer data between the CPU and GPU at all.

Does this mean storing the BitArray{3} on the GPU so that the state can be fed directly to the neural network weights stored on the GPU?

Yes

Where does the environment logic get executed - on the CPU or the GPU?

GPU
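
To make this concrete, here is a minimal sketch (assuming CUDA.jl and Flux; the dense Float32 array is a stand-in for the BitArray{3} state) of what device-resident state buys us:

```julia
using CUDA, Flux

# The env state lives on the device, so the policy (whose weights are
# also on the GPU) can consume it with no host<->device copy.
state  = CUDA.zeros(Float32, 10, 10, 4)   # stand-in for one env's BitArray{3} state
policy = Dense(length(state), 4) |> gpu   # toy policy, weights on the GPU

logits = policy(vec(state))               # everything stays on the GPU
```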

@Sid-Bhatia-0
Member

Sid-Bhatia-0 commented Apr 16, 2021

I have thought more deeply about it.

From what I understand, if we are to use the GPU, then the env instance would sit on the GPU and all environment-related computation would happen there, so that the neural network can easily access the state.

I want to know whether it would be worth doing the env logic, like taking actions (which doesn't have much parallelism in it), on the GPU, vs. doing everything on the CPU and moving data between the CPU and GPU at each step.
Let
n = total number of env steps required to train a policy from scratch
x = avg. cost of env logic per step on the GPU
y = avg. cost of env logic per step on the CPU (potentially multithreaded, which would give even better performance than currently shown in benchmark.md) + avg. cost of moving the state from CPU to GPU + avg. cost of moving the computed action from GPU back to the CPU to be executed in the env
z = avg. total cost of fully training a policy in the env on the GPU from scratch, excluding env logic

Ideally, we would want to use the GPU if:
(n*y)/z > (n*x)/z
(since z is common to both sides, this reduces to y > x). If the LHS is significantly greater than the RHS, then we can justify this feature. Correct me if I am wrong, but it is not obvious to me that this inequality holds.

Even more importantly, if (n*y)/z << 1, that is, if the total cost of env logic on the CPU is much less than the total cost of training on the GPU excluding env logic, then we won't gain much by adding GPU support. My initial hunch is that this holds, because the env logic is quite simple and should cost far less than training a neural network. I have left out reset!, assuming that it gets amortized over the total number of steps and isn't significantly costly overall.
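
For illustration, a toy calculation (all numbers made up; they only show how the criterion behaves, they are not measurements):

```julia
# Hypothetical numbers only.
n = 10^7             # env steps needed to train a policy from scratch
x = 2.0e-6           # per-step env cost on the GPU, in seconds
y = 1.0e-7 + 5.0e-6  # per-step CPU env cost + per-step CPU<->GPU transfer cost
z = 3600.0           # total training cost excluding env logic, in seconds

@show (n * y) / z    # ≈ 0.014: CPU-path env overhead relative to training
@show (n * x) / z    # ≈ 0.006: GPU-path env overhead relative to training
# Both ratios are << 1 here, so even a large relative speedup in env
# logic would barely change the end-to-end training time.
```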

What do you think?

@findmyway
Member Author

  1. You may have underestimated the cost of moving data between the CPU and GPU.
  2. I think you are only talking about ONE environment here, but what I mean is rolling out many environment instances simultaneously (usually hundreds of environments), as sketched below.
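
Here is a minimal sketch of that idea (assuming CUDA.jl; the dynamics are a hypothetical toy, not a real environment): one GPU thread advances one environment, so a whole batch steps in a single kernel launch and the states never leave the device.

```julia
using CUDA

# One GPU thread steps one environment; a single kernel launch
# advances the whole batch without any host<->device transfer.
function step_envs!(states, actions)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(states, 2)
        @inbounds states[1, i] += actions[i]  # toy dynamics, stands in for real env logic
    end
    return nothing
end

n_envs  = 512
states  = CUDA.zeros(Float32, 1, n_envs)  # batch of env states, device-resident
actions = CUDA.rand(Float32, n_envs)      # e.g. produced by a GPU policy

@cuda threads=256 blocks=cld(n_envs, 256) step_envs!(states, actions)
```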

@findmyway
Member Author

It seems someone has already done something related:

https://discourse.julialang.org/t/alphagpu-an-alphazero-implementation-wholly-on-gpu/60030

@Sid-Bhatia-0
Member

Thanks for pointing it out!
By the way, I won't be able to work on GPUization anytime soon.
