## Description

### Goal
Improve the interaction between ReinforcementLearning.jl and the rest of the Julia ecosystem.
### Why is it important?
In the early days of developing this package, the main goal was to reproduce some popular (deep) RL algorithms. It's still important to keep adding newly emerging algorithms to this package, but as an engineer, I believe the higher impact is achieved only when users actually apply those powerful RL algorithms to the problems they are interested in. In recent years, many important packages across different domains have been developed in Julia, and the whole ecosystem has improved a lot. Although the interfaces defined in this package are loose and flexible, people are still unsure how to use it due to the lack of concrete examples. Adding more examples and removing some restrictive assumptions will greatly encourage more people to try this package, and doing so will also improve its quality.
### Potential breaking changes

The most important change will be decoupling training data generation from policy optimization. The state is assumed to be a tensor by default in many cases, which is the main blocking issue when interacting with many other packages. Besides, the async training pipeline will not only improve the performance of existing algorithms on a single node but also provide the foundation for large-scale training in future releases (possibly in v0.12).
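The decoupling mentioned above can be pictured as an experience-generating task and an optimization loop connected only by a buffer, so that neither side constrains the other's data layout. The sketch below uses plain Base Julia with a `Channel` as the buffer; all names (`Transition`, `rollout!`, `optimise!`) are hypothetical placeholders, not this package's API.

```julia
# A minimal sketch of decoupled data generation and policy optimisation.
# All names here are hypothetical placeholders, not ReinforcementLearning.jl API.

struct Transition
    state::Any      # deliberately untyped: the state need not be a tensor
    action::Int
    reward::Float64
end

# Producer: interacts with the environment and pushes transitions.
function rollout!(ch::Channel{Transition}, n)
    for i in 1:n
        put!(ch, Transition("obs-$i", rand(1:2), rand()))
    end
    close(ch)
end

# Consumer: optimises the policy from whatever data arrives.
function optimise!(ch::Channel{Transition})
    total = 0.0
    for t in ch              # iterates until the channel is closed
        total += t.reward    # stand-in for a gradient step
    end
    return total
end

ch = Channel{Transition}(32)
@async rollout!(ch, 100)
optimise!(ch)
```

Because the consumer only sees `Transition` values pulled from the channel, the producer is free to store states in whatever form the environment provides.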
## Key issues to be addressed

The following are some of the existing issues at the top of my mind. Please raise new ones that you wish to see addressed in the next release.
### Environments
- Add a dedicated multi-dimensional space type #268
- CI fails with Julia@v1.7 #572
  Still no luck addressing this issue, so I have to remove the OpenSpiel-related part in the next release. (enable OpenSpiel #691)
- Demonstrate that environments can be implemented not only on the CPU but also on the GPU (add CUDA accelerated Env #121)
- How to display/render AtariEnv? #546
- Take a look at https://github.com/dojo-sim/Dojo.jl
- Take a look at https://github.com/Lyceum/MuJoCo.jl
- More examples with DifferentialEquations.jl
- Take a look at https://github.com/deepmind/android_env. It should be usable through PyCall.jl; some examples are needed.
- Look into https://github.com/corail-research/SeaPearl.jl and see if there could be some improvements.
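One recurring point in the environment issues above is that nothing should force states to be arrays. As a concrete illustration, here is a tiny environment whose state is a plain `Symbol`. The method names follow the RLBase interface as documented for v0.10 (the same shape as the "write a customized environment" tutorial) and may change in future releases; the environment itself is made up for this sketch.

```julia
using ReinforcementLearning

# A toy two-action environment whose state is a Symbol rather than a tensor.
# Interface names follow RLBase as of v0.10 and may change in later releases.
mutable struct CoinTossEnv <: AbstractEnv
    result::Symbol   # :unknown, :win or :lose
end
CoinTossEnv() = CoinTossEnv(:unknown)

RLBase.action_space(::CoinTossEnv) = (:heads, :tails)
RLBase.state(env::CoinTossEnv) = env.result
RLBase.state_space(::CoinTossEnv) = (:unknown, :win, :lose)
RLBase.reward(env::CoinTossEnv) = env.result === :win ? 1.0 : 0.0
RLBase.is_terminated(env::CoinTossEnv) = env.result !== :unknown
RLBase.reset!(env::CoinTossEnv) = env.result = :unknown

# Acting on the environment: guess the outcome of a fair coin toss.
(env::CoinTossEnv)(action) =
    env.result = action === rand((:heads, :tails)) ? :win : :lose
```

Any policy written against the `state`/`reward`/`is_terminated` accessors works with this environment unchanged, which is exactly the property we want to preserve while removing the tensor assumption elsewhere.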
### Refactor Existing Policies

- `BasicDQN`
- `JuliaRL_BC_CartPole`
- `JuliaRL_DQN_CartPole` (Add JuliaRL_DQN_CartPole #650)
- `JuliaRL_PrioritizedDQN_CartPole` (add PrioritizedDQN #698)
- `JuliaRL_Rainbow_CartPole` (add rainbow #724)
- `JuliaRL_QRDQN_CartPole` (add QRDQN #699)
- `JuliaRL_REMDQN_CartPole` (add REMDQN #708)
- `JuliaRL_IQN_CartPole` (add IQN #710)
- `JuliaRL_VMPO_CartPole`
- `JuliaRL_VPG_CartPole` (add VPG #733)
- `JuliaRL_BasicDQN_MountainCar`
- `JuliaRL_DQN_MountainCar`
- `JuliaRL_A2C_CartPole`
- `JuliaRL_A2CGAE_CartPole`
- `JuliaRL_PPO_CartPole`
- `JuliaRL_MAC_CartPole`
- `JuliaRL_DDPG_Pendulum`
- `JuliaRL_SAC_Pendulum`
- `JuliaRL_TD3_Pendulum`
- `JuliaRL_PPO_Pendulum`
- `JuliaRL_BasicDQN_SingleRoomUndirected`
### Add New Policies

- Question: Can ReinforcementLearning.jl handle Partially Observed Markov Processes (POMDPs)? #608
- Rename some functions to help beginners navigate source code #326
- Improve the code structure and docs on general utils when defining a network (Unify common network architectures and patterns #139)
- Add alternatives to Flux.jl (Experimental support of Torch.jl #136)
- Model based reinforcement learning #262 and WIP: PETS algorithm from facebook/mbrl #531
- Gain in VPGPolicy does not account for terminal states? #578
- Integrate CFR related algorithms from https://github.com/WhiffleFish/CounterfactualRegret.jl?
- Combine transformers and RL #392; borrow some ideas from https://github.com/facebookresearch/salina
### Training pipeline
- Support async training Asynchronous Methods for Deep Reinforcement Learning #142
- Recurrent Models #144
- Non-episodic environments (Add a way to handle non-episodic environments #613); changes to policies may also be needed.
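One way the async pipeline could be structured is several actor tasks generating experience in parallel while a single learner consumes it. The following is a plain Base Julia sketch of that shape, not the planned implementation; the counts and the `Float64` "experience" are stand-ins.

```julia
# Sketch of async training: 4 actor tasks feed one learner through a
# shared channel. Plain Base Julia only; not the planned implementation.

experience = Channel{Float64}(128)

actors = [Threads.@spawn begin
    for _ in 1:50
        put!(experience, rand())   # stand-in for an environment step
    end
end for _ in 1:4]

learner = Threads.@spawn begin
    n = 0
    for _ in 1:200                 # 4 actors × 50 steps each
        take!(experience)
        n += 1                     # stand-in for a parameter update
    end
    n
end

foreach(wait, actors)
fetch(learner)                     # 200 updates performed
```

The appeal of this shape is that slow environments no longer block the optimization step, which is the single-node performance win mentioned under "Potential breaking changes".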
### Documentation

- Missing docs for how to implement a new algorithm #582
- Missing docs for `TDLearner` #580
- Explain `MultiThreadEnv` in detail (MultiThreadEnv with custom (continuous) action spaces fails #596)
### Utils
- Visualization. Leverage Term.jl based on the suggestion here
- Support Tables.jl and PrettyTables.jl for Trajectories #232
- Observability: integrate OpenTelemetry.jl to provide a more unified approach for recording.
- Setup CI to generate the Docker image
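On the Tables.jl point above: a `NamedTuple` of equal-length vectors already satisfies the Tables.jl column interface, so exposing trajectory data in that shape would make PrettyTables.jl display work almost for free. A sketch, with purely illustrative field names:

```julia
using PrettyTables  # PrettyTables.jl

# A NamedTuple of equal-length vectors is already a valid Tables.jl table,
# so trajectory data exposed in this shape can be rendered directly.
# The field names below are illustrative only.
trajectory = (
    state  = [1, 2, 3],
    action = [:left, :right, :left],
    reward = [0.0, 0.0, 1.0],
)

pretty_table(trajectory)
```

The design choice here would be for `Trajectories` to implement (or convert to) the Tables.jl column interface rather than depending on PrettyTables.jl directly, so any Tables.jl-compatible sink works.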
## Timeline

I'm not sure I can fix them all, but at least I'll take a deep look into each of them and then tag a new release at the end of this quarter (around the end of June 2022).