## Description

### Goal
Improve the interaction between ReinforcementLearning.jl and the rest of the Julia ecosystem.
### Why is it important?
In the early days of developing this package, the main goal was to reproduce some popular (deep) RL algorithms. It's still important to keep adding newly emerging algorithms to this package, but as an engineer, I believe the higher impact is achieved only when users actually apply those powerful RL algorithms to the problems they are interested in. In recent years, many important packages across different domains have been developed in Julia, and the whole ecosystem has improved a lot. Although the interfaces defined in this package are loose and flexible, people are still unsure how to use it due to the lack of concrete examples. Adding more examples and removing some restrictive assumptions will greatly encourage more people to try this package, and doing so will also improve its quality.
### Potential breaking changes

The most important change will be decoupling training data generation from policy optimization. The state is assumed to be a tensor by default in many cases, which is the main blocking issue when interacting with many other packages. Besides, the async training pipeline will not only improve the performance of existing algorithms on a single node but also provide the foundation for large-scale training in future releases (possibly in v0.12).
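The decoupling mentioned above can be pictured as an experience-generating task and an optimization loop connected only by a buffer, so that neither side constrains the other's data layout. The sketch below uses plain Base Julia with a `Channel` as the buffer; all names (`Transition`, `rollout!`, `optimise!`) are hypothetical placeholders, not this package's API.

```julia
# A minimal sketch of decoupled data generation and policy optimisation.
# All names here are hypothetical placeholders, not ReinforcementLearning.jl API.

struct Transition
    state::Any      # deliberately untyped: the state need not be a tensor
    action::Int
    reward::Float64
end

# Producer: interacts with the environment and pushes transitions.
function rollout!(ch::Channel{Transition}, n)
    for i in 1:n
        put!(ch, Transition("obs-$i", rand(1:2), rand()))
    end
    close(ch)
end

# Consumer: optimises the policy from whatever data arrives.
function optimise!(ch::Channel{Transition})
    total = 0.0
    for t in ch              # iterates until the channel is closed
        total += t.reward    # stand-in for a gradient step
    end
    return total
end

ch = Channel{Transition}(32)
@async rollout!(ch, 100)
optimise!(ch)
```

Because the consumer only sees `Transition` values pulled from the channel, the producer is free to store states in whatever form the environment provides.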
## Key issues to be addressed

The following are some of the existing issues at the top of my mind. Please raise new ones that you wish to see addressed in the next release.
### Environments
- Add a dedicated multi-dimensional space type #268
- CI fails with Julia@v1.7 #572
  Still no luck addressing this issue, so I have to remove the OpenSpiel-related part in the next release. (enable OpenSpiel #691)
- Demonstrate that environments can be implemented not only on the CPU but also on the GPU (add CUDA accelerated Env #121)
- How to display/render AtariEnv? #546
- Take a look at https://github.com/dojo-sim/Dojo.jl
- Take a look at https://github.com/Lyceum/MuJoCo.jl
- More examples with DifferentialEquations.jl
- Take a look at https://github.com/deepmind/android_env. It should be usable through PyCall.jl; some examples are needed.
- Look into https://github.com/corail-research/SeaPearl.jl and see if there could be some improvements.
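One recurring point in the environment issues above is that nothing should force states to be arrays. As a concrete illustration, here is a tiny environment whose state is a plain `Symbol`. The method names follow the RLBase interface as documented for v0.10 (the same shape as the "write a customized environment" tutorial) and may change in future releases; the environment itself is made up for this sketch.

```julia
using ReinforcementLearning

# A toy two-action environment whose state is a Symbol rather than a tensor.
# Interface names follow RLBase as of v0.10 and may change in later releases.
mutable struct CoinTossEnv <: AbstractEnv
    result::Symbol   # :unknown, :win or :lose
end
CoinTossEnv() = CoinTossEnv(:unknown)

RLBase.action_space(::CoinTossEnv) = (:heads, :tails)
RLBase.state(env::CoinTossEnv) = env.result
RLBase.state_space(::CoinTossEnv) = (:unknown, :win, :lose)
RLBase.reward(env::CoinTossEnv) = env.result === :win ? 1.0 : 0.0
RLBase.is_terminated(env::CoinTossEnv) = env.result !== :unknown
RLBase.reset!(env::CoinTossEnv) = env.result = :unknown

# Acting on the environment: guess the outcome of a fair coin toss.
(env::CoinTossEnv)(action) =
    env.result = action === rand((:heads, :tails)) ? :win : :lose
```

Any policy written against the `state`/`reward`/`is_terminated` accessors works with this environment unchanged, which is exactly the property we want to preserve while removing the tensor assumption elsewhere.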
### Refactor Existing Policies

- `BasicDQN`
- `JuliaRL_BC_CartPole`
- `JuliaRL_DQN_CartPole` (Add JuliaRL_DQN_CartPole #650)
- `JuliaRL_PrioritizedDQN_CartPole` (add PrioritizedDQN #698)
- `JuliaRL_Rainbow_CartPole` (add rainbow #724)
- `JuliaRL_QRDQN_CartPole` (add QRDQN #699)
- `JuliaRL_REMDQN_CartPole` (add REMDQN #708)
- `JuliaRL_IQN_CartPole` (add IQN #710)
- `JuliaRL_VMPO_CartPole`
- `JuliaRL_VPG_CartPole` (add VPG #733)
- `JuliaRL_BasicDQN_MountainCar`
- `JuliaRL_DQN_MountainCar`
- `JuliaRL_A2C_CartPole`
- `JuliaRL_A2CGAE_CartPole`
- `JuliaRL_PPO_CartPole`
- `JuliaRL_MAC_CartPole`
- `JuliaRL_DDPG_Pendulum`
- `JuliaRL_SAC_Pendulum`
- `JuliaRL_TD3_Pendulum`
- `JuliaRL_PPO_Pendulum`
- `JuliaRL_BasicDQN_SingleRoomUndirected`
### Add New Policies

- Question: Can ReinforcementLearning.jl handle Partially Observed Markov Processes (POMDPs)? #608
- Rename some functions to help beginners navigate source code #326
- Improve the code structure and docs on general utils when defining a network (Unify common network architectures and patterns #139)
- Add alternatives to Flux.jl (Experimental support of Torch.jl #136)
- Model based reinforcement learning #262 and WIP: PETS algorithm from facebook/mbrl #531
- Gain in VPGPolicy does not account for terminal states? #578
- Integrate CFR related algorithms from https://github.com/WhiffleFish/CounterfactualRegret.jl?
- Combine transformers and RL #392; borrow some ideas from https://github.com/facebookresearch/salina
### Training pipeline
- Support async training Asynchronous Methods for Deep Reinforcement Learning #142
- Recurrent Models #144
- Non-episodic environments (Add a way to handle non-episodic environments #613); changes to policies may also be needed.
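One way the async pipeline could be structured is several actor tasks generating experience in parallel while a single learner consumes it. The following is a plain Base Julia sketch of that shape, not the planned implementation; the counts and the `Float64` "experience" are stand-ins.

```julia
# Sketch of async training: 4 actor tasks feed one learner through a
# shared channel. Plain Base Julia only; not the planned implementation.

experience = Channel{Float64}(128)

actors = [Threads.@spawn begin
    for _ in 1:50
        put!(experience, rand())   # stand-in for an environment step
    end
end for _ in 1:4]

learner = Threads.@spawn begin
    n = 0
    for _ in 1:200                 # 4 actors × 50 steps each
        take!(experience)
        n += 1                     # stand-in for a parameter update
    end
    n
end

foreach(wait, actors)
fetch(learner)                     # 200 updates performed
```

The appeal of this shape is that slow environments no longer block the optimization step, which is the single-node performance win mentioned under "Potential breaking changes".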
### Documentation

- Missing docs for how to implement a new algorithm #582
- Missing docs for `TDLearner` #580
- Explain `MultiThreadEnv` in detail (MultiThreadEnv with custom (continuous) action spaces fails #596)
### Utils
- Visualization. Leverage Term.jl based on the suggestion here
- Support Tables.jl and PrettyTables.jl for Trajectories #232
- Observability: integrate OpenTelemetry.jl to provide a more unified approach for recording.
- Setup CI to generate the Docker image
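On the Tables.jl point above: a `NamedTuple` of equal-length vectors already satisfies the Tables.jl column interface, so exposing trajectory data in that shape would make PrettyTables.jl display work almost for free. A sketch, with purely illustrative field names:

```julia
using PrettyTables  # PrettyTables.jl

# A NamedTuple of equal-length vectors is already a valid Tables.jl table,
# so trajectory data exposed in this shape can be rendered directly.
# The field names below are illustrative only.
trajectory = (
    state  = [1, 2, 3],
    action = [:left, :right, :left],
    reward = [0.0, 0.0, 1.0],
)

pretty_table(trajectory)
```

The design choice here would be for `Trajectories` to implement (or convert to) the Tables.jl column interface rather than depending on PrettyTables.jl directly, so any Tables.jl-compatible sink works.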
## Timeline

I'm not sure I can fix them all, but at least I'll take a deep look into each of them and then tag a new release at the end of this quarter (around the end of June 2022).