Deep reinforcement learning (RL) often relies on simulators as abstract oracles to model interactions within complex environments. While differentiable simulators have recently emerged for multi-body robotic systems, they remain underutilized despite the richer information they can provide. This underutilization, coupled with the high computational cost of exploration-exploitation in high-dimensional state spaces, limits the practical application of RL to real-world systems. We propose a method that integrates learning with differentiable simulators to make exploration-exploitation more efficient. Our approach learns value functions, state trajectories, and control policies from locally optimal runs of a model-based trajectory optimizer. The learned value function acts as a proxy that shortens the preview horizon, while the approximated state and control policies guide the trajectory optimization. We benchmark our algorithm on three classical control problems and a torque-controlled 7-degree-of-freedom robot manipulator, demonstrating faster convergence and a more efficient symbiotic relationship between learning and simulation for end-to-end training of complex, poly-articulated systems.
The source code is released under the MIT license.
Author: Amit Parag
The algorithm is implemented on the Kuka arm. The goal is to reach a static target.
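At its core, the method fits a value network to cost-to-go targets produced by locally optimal trajectory-optimizer runs, then uses that network as a terminal-cost proxy so the optimizer can plan over a shorter horizon. Below is a minimal, illustrative PyTorch sketch of the fitting step; the layer sizes, hyperparameters, and function names are assumptions for illustration, not the repo's exact code:

```python
# Illustrative sketch only: fit a value network V(x) on (state, cost-to-go)
# pairs harvested from locally optimal DDP rollouts. The trained network can
# then serve as a terminal-cost proxy that shortens the preview horizon.
import torch
import torch.nn as nn

value_net = nn.Sequential(
    nn.Linear(14, 256), nn.Tanh(),   # 14 = joint positions + velocities of a 7-DoF arm
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, 1),
)
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def fit_value_function(states, costs_to_go, epochs=100):
    """states: (N, 14) tensor; costs_to_go: (N, 1) tensor of DDP costs-to-go."""
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(value_net(states), costs_to_go)
        loss.backward()
        optimizer.step()
```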
The Unusual Suspects
- PyTorch
- Crocoddyl
- Pinocchio
- example-robot-data
- Gepetto GUI (optional, for animating the learned policy in the GUI)
The Usual Suspects
Clone the repo
git clone https://gitlab.laas.fr/aparag/kuka-arm-dpvp
Change directory to src
cd kuka-arm-dpvp/src
Look at the exp.yml file for the experiment parameters.
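For reference, a minimal sketch of reading those parameters in Python (assuming PyYAML is installed; the keys mentioned in the comment are hypothetical examples, so check exp.yml for the real ones):

```python
# Sketch: load the experiment parameters before a run.
# Requires PyYAML; the key names hinted at below are hypothetical.
import yaml

with open("exp.yml") as f:
    params = yaml.safe_load(f)

# params might expose e.g. horizon length, learning rate, number of epochs
print(params)
```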
Then run the main.py file
python3 main.py
The trained neural network will be saved in results/exp_
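Once training finishes, the saved network can be reloaded for evaluation. A sketch, where the file path is a placeholder (the actual file naming under results/exp_ is determined by the experiment):

```python
# Sketch: reload a trained network for evaluation.
# "results/exp_1/policy.pth" is a placeholder path; check results/exp_
# for the actual file produced by your run.
import torch

net = torch.load("results/exp_1/policy.pth")
net.eval()
```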
The directory config/robot_properties_kuka contains the URDF and mesh information of the robot, and config/ocp_params contains sets of robot parameters describing the optimal control problem (OCP) for Crocoddyl. The OCP itself is set up in utils/ddp.py.
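For orientation, here is a minimal Crocoddyl sketch of a reaching OCP of the kind utils/ddp.py sets up. The robot loader, frame name, cost weights, horizon, and time step below are illustrative assumptions, not the repo's actual configuration:

```python
# Illustrative sketch of a goal-reaching OCP for a 7-DoF Kuka arm in Crocoddyl.
# Frame name, weights, horizon, and time step are assumptions for illustration.
import numpy as np
import crocoddyl
import example_robot_data

robot = example_robot_data.load("iiwa")       # KUKA LBR iiwa model
state = crocoddyl.StateMultibody(robot.model)
actuation = crocoddyl.ActuationModelFull(state)

# End-effector goal-reaching cost plus state/control regularization.
frame_id = robot.model.getFrameId("contact")  # assumed end-effector frame name
target = np.array([0.4, 0.2, 0.5])            # static target position
goal_cost = crocoddyl.CostModelResidual(
    state, crocoddyl.ResidualModelFrameTranslation(state, frame_id, target))
x_reg = crocoddyl.CostModelResidual(state, crocoddyl.ResidualModelState(state))
u_reg = crocoddyl.CostModelResidual(state, crocoddyl.ResidualModelControl(state))

running_costs = crocoddyl.CostModelSum(state)
running_costs.addCost("goal", goal_cost, 1.0)
running_costs.addCost("xReg", x_reg, 1e-2)
running_costs.addCost("uReg", u_reg, 1e-4)

dt, horizon = 1e-2, 100
running_model = crocoddyl.IntegratedActionModelEuler(
    crocoddyl.DifferentialActionModelFreeFwdDynamics(state, actuation, running_costs), dt)

terminal_costs = crocoddyl.CostModelSum(state)
terminal_costs.addCost("goal", goal_cost, 100.0)
terminal_model = crocoddyl.IntegratedActionModelEuler(
    crocoddyl.DifferentialActionModelFreeFwdDynamics(state, actuation, terminal_costs), 0.0)

x0 = np.zeros(state.nx)  # zero joint positions and velocities
problem = crocoddyl.ShootingProblem(x0, [running_model] * horizon, terminal_model)
solver = crocoddyl.SolverFDDP(problem)
solver.solve()
```

In the full algorithm, the learned value network would replace the hand-tuned terminal cost above, which is what allows the horizon to be shortened.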
