A simplified, scalable distributed RL training framework that supports asynchronous RL training with an arbitrary number of rollout workers and replay memory servers.
Implemented with:
- Python Flask server (to be replaced by brpc)
- protobuf Python interface
- Linux shared memory mechanism
- PyTorch
- OpenAI gym
Still under development...
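To make the data flow among these components concrete, below is a minimal sketch of a replay memory ("mempool") server and the role it plays for workers and the learner. It assumes a JSON-over-HTTP interface for brevity; the endpoint names (`/push`, `/sample`), payload layout, and port are illustrative assumptions, not the repository's actual API, which uses protobuf messages and Linux shared memory.

```python
# Illustrative sketch of a Flask-based replay memory ("mempool") server.
# NOTE: endpoint names, payload format, and port are assumptions for illustration;
# the actual project communicates via protobuf and shared memory.
import random
from collections import deque

from flask import Flask, jsonify, request

app = Flask(__name__)
buffer = deque(maxlen=100_000)  # bounded in-memory replay buffer

@app.route("/push", methods=["POST"])
def push():
    # A rollout worker POSTs a JSON list of transitions:
    # [{"s": ..., "a": ..., "r": ..., "s2": ..., "done": ...}, ...]
    buffer.extend(request.get_json())
    return jsonify({"size": len(buffer)})

@app.route("/sample", methods=["GET"])
def sample():
    # A learner GETs a random batch of transitions for training.
    batch_size = int(request.args.get("batch_size", 32))
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    return jsonify(batch)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

In this simplified picture, each worker process runs a gym environment and pushes collected transitions to one of the mempool servers, while the learner samples batches to update the PyTorch model; the real framework plays the same roles over its protobuf/shared-memory interfaces.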
Start training:
```
# NOTE: Use `bash` to run the scripts instead of `sh`, which is a symlink to dash and may cause problems
bash start.sh
```

`start.sh` will create several folders under the current working directory, with names starting with either `running_rollout_` or `running_worker_`; these correspond to mempool server processes and worker processes, respectively.
Terminate training:
```
bash kill.sh
```

Clean all temporary files:
```
bash clean.sh
```

Warning: this will delete all log files and model checkpoints at once; back them up first if necessary.
Global variables are set in two files:
- `distributed.config`: the number of worker processes used in training and the ports of the replay memory servers
- `global_variables.py`: all other training-related variables
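Purely as an illustration of this split (the actual variable names and file formats in the repository may differ), the two files might look like this:

```python
# Hypothetical global_variables.py -- names and values are illustrative only.
ENV_NAME = "CartPole-v1"     # gym environment run by the rollout workers
BATCH_SIZE = 64              # batch size sampled from the replay memory
REPLAY_CAPACITY = 100_000    # transitions kept per mempool server
LEARNING_RATE = 1e-3         # optimizer learning rate for the PyTorch model

# distributed.config (key=value format assumed here) would then only describe
# the process topology, e.g.:
#   num_workers = 8
#   mempool_ports = 5000,5001
```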
My preliminary experiments on a CPU machine show that when the ratio of mempool processes to worker processes reaches about 1:4, the write and read speeds of the memory pool are roughly balanced.
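If that observation holds on your hardware, a simple rule of thumb (an assumption, not a guarantee) is to provision roughly one mempool server for every four workers, e.g.:

```python
# Hypothetical helper: derive the number of mempool servers from the worker count,
# based on the ~1:4 mempool-to-worker balance observed above.
import math

def num_mempool_servers(num_workers: int, workers_per_mempool: int = 4) -> int:
    return max(1, math.ceil(num_workers / workers_per_mempool))

print(num_mempool_servers(16))  # -> 4 mempool servers for 16 workers
```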