add batch environment #146
base: master
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   78.21%   72.40%   -5.81%
==========================================
  Files          21       24       +3
  Lines        2226     2533     +307
==========================================
+ Hits         1741     1834      +93
- Misses        485      699     +214
```
Added REPL-based playing. Here are some benchmarks for this environment with batch size 64 on a single thread without any parallelization. I'm sure the numbers can be improved; these are just preliminary benchmarks. Right now I am copying the

Date: 2021_06_17_23_41_30

Benchmarks: SingleRoomUndirectedBatch

Run uniformly random policy, NUM_ENVS = 64, NUM_RESETS = 100, STEPS_PER_RESET = 100, TOTAL_STEPS = 10000
- memory estimate: 118.94 MiB, allocs estimate: 3256401
- minimum time: 245.287 ms (1.38% GC), median time: 248.607 ms (2.08% GC), mean time: 250.885 ms (1.88% GC), maximum time: 277.825 ms (1.28% GC)
- samples: 20, evals/sample: 1

GridWorlds.ModuleSingleRoomUndirectedBatch.SingleRoomUndirectedBatch()
- memory estimate: 2.23 MiB, allocs estimate: 928
- minimum time: 1.481 ms (0.00% GC), median time: 1.559 ms (0.00% GC), mean time: 1.689 ms (3.84% GC), maximum time: 4.262 ms (30.46% GC)
- samples: 2952, evals/sample: 1

RLBase.reset!(env)
- memory estimate: 7.00 KiB, allocs estimate: 64
- minimum time: 6.581 μs (0.00% GC), median time: 7.057 μs (0.00% GC), mean time: 8.222 μs (1.48% GC), maximum time: 197.964 μs (93.81% GC)
- samples: 10000, evals/sample: 5

RLBase.state(env)
- memory estimate: 1.66 KiB, allocs estimate: 2
- minimum time: 196.419 ns (0.00% GC), median time: 259.854 ns (0.00% GC), mean time: 291.363 ns (9.01% GC), maximum time: 2.413 μs (56.48% GC)
- samples: 10000, evals/sample: 676

RLBase.action_space(env)
- memory estimate: 0 bytes, allocs estimate: 0
- minimum time: 2.533 ns (0.00% GC), median time: 2.797 ns (0.00% GC), mean time: 2.943 ns (0.00% GC), maximum time: 35.666 ns (0.00% GC)
- samples: 10000, evals/sample: 1000

RLBase.is_terminated(env)
- memory estimate: 128 bytes, allocs estimate: 2
- minimum time: 61.851 ns (0.00% GC), median time: 65.807 ns (0.00% GC), mean time: 76.281 ns (9.40% GC), maximum time: 3.087 μs (96.85% GC)
- samples: 10000, evals/sample: 975

RLBase.reward(env)
- memory estimate: 336 bytes, allocs estimate: 1
- minimum time: 56.887 ns (0.00% GC), median time: 62.510 ns (0.00% GC), mean time: 70.044 ns (5.47% GC), maximum time: 1.226 μs (92.14% GC)
- samples: 10000, evals/sample: 976

env(1)
- memory estimate: 10.00 KiB, allocs estimate: 320
- minimum time: 20.442 μs (0.00% GC), median time: 22.449 μs (0.00% GC), mean time: 25.575 μs (1.55% GC), maximum time: 2.016 ms (98.68% GC)
- samples: 10000, evals/sample: 1

env(2)
- memory estimate: 10.00 KiB, allocs estimate: 320
- minimum time: 20.467 μs (0.00% GC), median time: 22.522 μs (0.00% GC), mean time: 25.235 μs (1.59% GC), maximum time: 2.075 ms (98.64% GC)
- samples: 10000, evals/sample: 1

env(4)
- memory estimate: 10.00 KiB, allocs estimate: 320
- minimum time: 20.011 μs (0.00% GC), median time: 21.992 μs (0.00% GC), mean time: 24.767 μs (1.61% GC), maximum time: 2.046 ms (98.64% GC)
- samples: 10000, evals/sample: 1

env(3)
- memory estimate: 10.00 KiB, allocs estimate: 320
- minimum time: 20.017 μs (0.00% GC), median time: 22.013 μs (0.00% GC), mean time: 24.846 μs (1.60% GC), maximum time: 2.060 ms (98.60% GC)
- samples: 10000, evals/sample: 1
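For reference, numbers like these are typically gathered with BenchmarkTools.jl along these lines (a sketch; the constructor path is taken from the output above, and the exact benchmark script may differ):

```julia
using BenchmarkTools
using GridWorlds, ReinforcementLearningBase

env = GridWorlds.ModuleSingleRoomUndirectedBatch.SingleRoomUndirectedBatch()

# Interpolate `env` with `$` so the global variable doesn't distort timings.
display(@benchmark RLBase.reset!($env))
display(@benchmark RLBase.state($env))
```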
In my experience, you are right to worry about this. Returning a reference to the internal state of the environment will result in a lot of frustrating unexpected behavior. On the other hand, I don't have experience with "batch" environments, so I may be missing something important about how this will be used.
I am not sure that it is that inefficient. Copying memory is not too bad; the only thing that is really expensive is allocating memory on the heap (which happens when you create a new variable-sized array). If the compiler knows the size of the grid world and you use static arrays, I think it should be pretty efficient to return a copy.
Thank you for your comments, @zsunberg.
From my understanding, this part shouldn't be too different for single vs. batch environments (I am imagining a simple case where all the state vectors are batched together and fed to a neural network as an input vs. a single state vector fed to the neural network as input).
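To illustrate that point with a hypothetical sketch (not GridWorlds.jl code; the sizes are made up), batching per-environment state vectors into a single matrix before feeding them to a network might look like:

```julia
# Hypothetical: 64 environments, each with a 16-dimensional state vector.
states = [rand(Float32, 16) for _ in 1:64]

# Stack the vectors column-wise into one 16×64 batch matrix,
# which a network expecting (features, batch) input can consume directly.
batch = reduce(hcat, states)
```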
To avoid this safely, we need at least one memory copy operation anyway. This should hold true even for batch environments.
Yes. So right now in the current implementation of all the single environments, fetching the state is along the following lines:
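Something along these lines (a hypothetical sketch of the pattern described, not the actual GridWorlds.jl source):

```julia
# Hypothetical: fetching the state allocates a fresh copy on every call,
# so callers cannot mutate the environment's internal grid.
function get_state(env)
    return copy(env.world)  # new heap allocation at each step
end
```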
As you pointed out, it is relatively expensive to (unnecessarily) allocate new memory on the heap for each step of the game loop.
This requires only one memory allocation during initialization, and then one memory copy operation at each time step (which we inevitably need in order to avoid unintended mutations of the environment's internal state).
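A minimal sketch of that pre-allocation pattern (hypothetical field names, assuming the state is stored as a Bool grid):

```julia
# Hypothetical environment with a buffer allocated once at construction.
struct PreallocatedEnv
    world::BitArray{3}         # internal state
    state_buffer::BitArray{3}  # same shape as `world`, allocated once
end

# One copy per step, zero allocations: overwrite the buffer in place.
function get_state(env::PreallocatedEnv)
    copyto!(env.state_buffer, env.world)
    return env.state_buffer
end
```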
Right. I think it might still be slightly better to pre-allocate memory for a mutable static array instead of creating a new static array at each step. I'm not totally sure about this one, though; the two might be pretty close for static arrays. More importantly, the number of elements in the state vector could go well above 100 even for small grid worlds, so using static arrays might not be a scalable solution, especially for larger grid worlds. Maybe the maximum useful size could be higher for static

Another benefit of having something like

@findmyway Please let us know your thoughts as well.
Unfortunately no :( JuliaArrays/StaticArrays.jl#412

I did some benchmarks:

```julia
World = falses(20, 20)
display(@benchmark SMatrix{20,20}($World)) # mean: 103 ns; max: 248 ns (0 alloc)
display(@benchmark MMatrix{20,20}($World)) # mean: 239 ns; max: 3384 ns (1 alloc)
display(@benchmark copy($World))           # mean: 69 ns; max: 2297 ns (2 alloc)
```

It looks like the savings of keeping it in terms of bits rather than

It seems like
Thanks for @zsunberg's comments. In summary, we are mainly discussing the following points here:

Re #1, for

Re #2, based on @zsunberg's benchmark results, directly copying seems to be the fastest.

As for how to implement the "batch" environment: this is still experimental, the goal being to leverage as many resources as possible to run multiple replicas at the same time.

@Sid-Bhatia-0, could you make a summary of the benchmark results you listed above? I'm afraid they are too verbose, making it hard for others to understand what you want to express here.
I see.
Right. This is an interface concern. I don't have an informed opinion on this yet. I haven't looked into CommonRLInterface much, but maybe the solution could be along the lines of "optional interfaces", as you mentioned.
Yes. You are right that not all users may want to go with the defaults; some may want to do custom things to get the environment state and may need access to the internal state directly. But making that the default behavior might cause more problems than it solves (for most users). Moreover, isn't that what method overloading is for? If someone wants to retrieve internals and do anything fancy with them, they can always access the env struct and overload the default
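For example, overloading might look like this (hypothetical environment type; a sketch of the idea, not the library's actual defaults):

```julia
# Hypothetical environment type; not part of GridWorlds.jl.
struct MyFancyEnv
    world::BitArray{3}
end

# A user who wants zero-copy access can shadow the copying default
# for their own environment type:
RLBase.state(env::MyFancyEnv) = env.world  # returns the internal array directly
```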
Correct me if I am missing something, but here is what I understand... From a usability standpoint, for
Sure. And if one wants to use some other interface, they will have to write the conversion anyway. Compare the following:
and
Why would the first one be worse than the second one?
Wouldn't this cause confusion if left unspecified?
The benchmarks I mentioned are just
Well, it's doable though...
This is just a matter of taste in software design. Personally, I prefer
instead of
Yes, I think so. Let me add a warning in the docstring first.
Yes, please do. And it would be great to make it into a table so that we can easily compare different implementations later.
Address the concern in JuliaReinforcementLearning/GridWorlds.jl#146 (comment)
I see. Thank you for the clarification. I'll think more on it later. Here's an issue to track it #148
Right. A table would be very nice to have. I'll see if I can fit them on a page. I'll do this in a separate PR. Here's an issue to track it: #147
Ok. So here are some benchmarks comparing optimized single-threaded CPU implementations of batch vs. non-batch for the SingleRoomUndirected environment.

SingleRoomUndirected (non-batch)
SingleRoomUndirectedBatch (NUM_ENVS = 1)

| operation | memory | time |
|---|---|---|
| random_policy | 96 bytes | 629.735 μs |
| reset | 0 bytes | 86.335 ns |
| state | 0 bytes | 2.914 ns |
| MOVE_UP | 0 bytes | 43.705 ns |
| MOVE_DOWN | 0 bytes | 43.630 ns |
| MOVE_LEFT | 0 bytes | 44.786 ns |
| MOVE_RIGHT | 0 bytes | 44.370 ns |
| action_space | 0 bytes | 2.426 ns |
| is_terminated | 0 bytes | 2.432 ns |
| reward | 0 bytes | 2.799 ns |
SingleRoomUndirectedBatch (NUM_ENVS = 1024)

| operation | memory | time |
|---|---|---|
| random_policy | 8.12 KiB | 383.722 ms |
| reset | 0 bytes | 111.034 μs |
| state | 0 bytes | 1.879 ns |
| MOVE_UP | 0 bytes | 15.101 μs |
| MOVE_DOWN | 0 bytes | 14.594 μs |
| MOVE_LEFT | 0 bytes | 14.129 μs |
| MOVE_RIGHT | 0 bytes | 14.021 μs |
| action_space | 0 bytes | 1.854 ns |
| is_terminated | 0 bytes | 2.200 ns |
| reward | 0 bytes | 1.887 ns |
The Batch version with NUM_ENVS = 1 is worse than the non-batch version, which was expected. The scaling from non-batch to batch is roughly linear (slightly worse) in NUM_ENVS for large NUM_ENVS, which is reasonable. Both batch and non-batch versions are fairly optimized, so we don't observe any gains on a single CPU thread. Right now we are just benchmarking the RLBase operations on these environments without any other overhead, but if we benchmark end-to-end with an actual algorithm, we should ideally see some benefit from the batch version since it uses a struct of arrays, even on a single CPU thread.

Note: the allocation during random_policy for the batch version is a one-time allocation for storing the batch of actions generated at each step.
Hi @Sid-Bhatia-0, after thinking for a while, I'm wondering whether it is possible to have one unified codebase to implement the batch environments. First, let's assume the game state and game spec are separated. Basically, a game state may look like this:

```julia
struct SomeSpecificGameState
    board::AbstractArray{Bool}
    agent_position::CartesianIndex
    reward::Float32
end
```

Then, ideally we can use

The only thing I haven't figured out is how to update some scalar values under the

Based on the example in StructArrays.jl, the whole element is updated with the
Hmm, it'll take me a while before I can think more deeply about this and share an informed opinion. As discussed in our last call, let us put the batch environments on hold for now. I will focus on getting all the existing single versions working perfectly (resolving all non-batch issues) before JuliaCon.
SingleRoomUndirectedBatch. This is a work in progress. Several things need to be added: