Not finalized yet

# Training policies that transfer to the real robot (sim2real)

We want to train policies that transfer well to the real robot: this is the sim2real problem. It's a hard problem, especially for us, since we use cheap servomotors that are hard to model and not very powerful.

Below, we roughly explain the steps we went through to get there. You won't have to redo everything yourself, since we provide the results of each step.

## Make an accurate model of the robot (URDF/MJCF)

### Robot structure

In the Onshape document, we specify the material of each part. However, because we print the parts with infill, we override each part's mass with the slicer's (fairly accurate) estimate to be more accurate.

We use onshape-to-robot to export URDF/MJCF descriptions. For MJX, we have to produce a lightweight model; see our config.json.

This gives us an MJCF (MuJoCo format) description of the robot that captures the masses and moments of inertia of the full robot.
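As a quick sanity check, you can load the exported model with the mujoco Python bindings and inspect the masses it ended up with (the file path below is a placeholder; use the path of your own export):

```python
# Quick sanity check of an exported model (the path is a placeholder).
import mujoco

model = mujoco.MjModel.from_xml_path("robot.xml")

# Per-body masses come straight from the MJCF, i.e. from the slicer estimates.
for i in range(model.nbody):
    name = mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_BODY, i)
    print(f"{name}: {model.body_mass[i]:.4f} kg")

print(f"total mass: {model.body_mass.sum():.4f} kg")
```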

### Motors

Another very important part of having an accurate model is modeling the motors' behavior. We use BAM for that. You don't have to go through the identification process yourself; we provide the results here.

It's critical that the simulator simulates the motors accurately, because we will train a policy (a neural network) to output motor positions based on sensory inputs (motor positions/speeds, IMU and feet sensors). If the motors behave differently in simulation than in the real world, the policy won't work, or worse, will produce chaotic movements.
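To make the policy's inputs and outputs concrete, here is a hypothetical sketch of one control step; the names are illustrative, not the actual runtime API:

```python
# Hypothetical sketch of one control step (names are illustrative,
# not the actual runtime API).
import numpy as np

def control_step(policy, robot):
    obs = np.concatenate([
        robot.joint_positions(),   # motor positions
        robot.joint_velocities(),  # motor speeds
        robot.imu(),               # orientation / angular velocity
        robot.feet_contacts(),     # feet pressure sensors
    ])
    # The network directly outputs target motor positions.
    target_positions = policy(obs)
    robot.set_joint_targets(target_positions)
```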

BAM allows us to export the main identified parameters to MuJoCo units (using bam.to_mujoco). These are the values we set in our MJCF model for the following actuator and joint properties (see the illustration after the list):

  • damping
  • kp
  • frictionloss
  • armature
  • forcerange
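To illustrate where these values end up, here is a minimal hand-written MJCF fragment loaded through the mujoco Python bindings; the numbers are placeholders, not the parameters identified with BAM:

```python
# Minimal illustration of where the identified parameters live in an MJCF
# model (the numbers are placeholders, not the values identified with BAM).
import mujoco

MJCF = """
<mujoco>
  <worldbody>
    <body name="link">
      <joint name="hip" type="hinge" axis="0 0 1"
             damping="0.5" frictionloss="0.05" armature="0.005"/>
      <geom type="capsule" size="0.02" fromto="0 0 0 0 0 0.1" mass="0.1"/>
    </body>
  </worldbody>
  <actuator>
    <position name="hip" joint="hip" kp="8.0" forcerange="-1.5 1.5"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(MJCF)
print(model.actuator_gainprm[0, 0])  # kp ends up in the actuator gain parameters
```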

## Training policies

We use our own MuJoCo Playground-based framework, Open Duck Playground.

In the joystick env, you can enable/disable the different rewards, write your own, and play with the reward weights, observation noise, domain randomization, etc.

We obtained good results by implementing the imitation reward described by Disney in their BDX paper.

To use this reward, we need reference motions. We made this repo to generate such motions using a parametric walk engine. Following the instructions there, you can generate a polynomial_coefficients.pkl file which contains the reference motions. Such a file is already provided in the playground repo under the data/ directory.
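For intuition, imitation rewards of this kind are typically an exponentially decaying function of the tracking error between the reference motion and the measured state. Here is a minimal sketch of that general form (not the exact reward implemented in the playground):

```python
# Generic imitation-style tracking reward (a common form, not necessarily
# the exact reward implemented in the playground).
import numpy as np

def imitation_reward(joint_pos, ref_joint_pos, scale=5.0):
    # 1 when the robot tracks the reference exactly, decaying towards 0
    # as the joint-position error grows.
    error = np.sum(np.square(joint_pos - ref_joint_pos))
    return np.exp(-scale * error)
```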

Once your policy is trained, you can try to run it on the real robot using this script in the runtime repo. Make sure you have completed all the steps in the checklist before running it.