[upgrade] unified interface and an easier example
* embed all functions and new concepts into legged_robot for the base implementation
* add the namedarraytuple concept borrowed from astooke/rlpyt
* use namedarraytuple in the rollout storage and introduce a minibatch object
* add the `rollout_file` concept to store rollout data and demonstrations in files
* add state estimator module/algo implementation
* completely rewrite the `play.py` example
* rename `actions_scaled_torque_clipped` to `actions_scaled_clipped`
* add example onboard code for deploying on Go2.
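
As a rough illustration of the namedarraytuple / minibatch idea above, here is a minimal sketch; the class and field names are assumptions for illustration, not the repository's actual code:

```python
# Illustrative sketch only: the repository borrows namedarraytuple from
# astooke/rlpyt; this hand-rolled stand-in just shows the core property.
from dataclasses import dataclass
import torch

@dataclass
class MiniBatch:
    """A bundle of rollout tensors that are always indexed together."""
    observation: torch.Tensor  # (T, N, obs_dim)
    action: torch.Tensor       # (T, N, act_dim)
    reward: torch.Tensor       # (T, N)

    def __getitem__(self, idx):
        # One index expression slices every field consistently, which is what
        # makes rollout storage and minibatch sampling convenient.
        return MiniBatch(self.observation[idx], self.action[idx], self.reward[idx])

# e.g. minibatch = MiniBatch(obs, act, rew)[:24]  # first 24 time steps of every env
```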
Ziwen Zhuang committed Jul 19, 2024
1 parent 96317ac commit 1ffd6d7
Showing 108 changed files with 47,651 additions and 3,067 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -16,27 +16,31 @@ Conference on Robot Learning (CoRL) 2023, **Oral**, **Best Systems Paper Award F

## Repository Structure ##
* `legged_gym`: contains the isaacgym environment and config files.
- `legged_gym/legged_gym/envs/a1/`: contains all the training config files.
- `legged_gym/legged_gym/envs/{robot}/`: contains all the training config files for a specific robot
- `legged_gym/legged_gym/envs/base/`: contains all the environment implementation.
- `legged_gym/legged_gym/utils/terrain/`: contains the terrain generation code.
* `rsl_rl`: contains the network module and algorithm implementation. You can copy this folder directly to your robot.
- `rsl_rl/rsl_rl/algorithms/`: contains the algorithm implementation.
- `rsl_rl/rsl_rl/modules/`: contains the network module implementation.

## Training in Simulation ##
To install and run the code for training A1 in simulation, please clone this repository and follow the instructions in [legged_gym/README.md](legged_gym/README.md).
To install and run the code for training A1/Go2 in simulation, please clone this repository and follow the instructions in [legged_gym/README.md](legged_gym/README.md).

## Hardware Deployment ##
To deploy the trained model on your real robot, please follow the instructions in [Deploy.md](Deploy.md).
To deploy the trained model on your Unitree Go1 robot, please follow the instructions in [Deploy-Go1.md](onboard_codes/Deploy-Go1.md).

To deploy the trained model on your Unitree Go2 robot, please follow the instructions in [Deploy-Go2.md](onboard_codes/Deploy-Go2.md).


## Troubleshooting ##
If you cannot run the distillation part, or all graphics computation goes to GPU 0 despite having multiple GPUs and setting `CUDA_VISIBLE_DEVICES`, please use Docker to isolate each GPU.
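
For example, one container per GPU keeps IsaacGym from touching the other devices. The sketch below is only an assumption of how such a setup could look; the image name and mount path are placeholders, not something this repository provides:

```bash
# Hypothetical sketch: expose only GPU 1 inside the container.
docker run --rm -it --gpus "device=1" \
    -v "$(pwd)":/workspace/parkour \
    <your-isaacgym-image> \
    bash -c "cd /workspace/parkour/legged_gym && python legged_gym/scripts/train.py --headless --task a1_distill"
```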

## To Do (will be done before Nov 2023) ##
- [x] Go1 training configuration (not from scratch)
## To Do ##
- [x] Go1 training configuration (does not guarantee the same performance as the paper)
- [ ] A1 deployment code
- [x] Go1 deployment code
- [x] Go2 training configuration example (does not guarantee the same performance as the paper)
- [x] Go2 deployment code example

## Citation ##
If you find this project helpful to your research, please consider citing us! This is really important to us.
55 changes: 52 additions & 3 deletions legged_gym/README.md
@@ -19,6 +19,53 @@ This is the tutorial for training the skill policy and distilling the parkour po
## Usage ##
***Always run your script in the root path of this legged_gym folder (which contains a `setup.py` file).***

### Go2 example (newer and simpler, does not guarantee performance) ###

1. Train a walking policy for planar locomotion
```bash
python legged_gym/scripts/train.py --headless --task go2
```
Training logs will be saved in `logs/rough_go2`.

2. Train the parkour policy. In this example, we use **scandot** for terrain perception, so we remove the **crawl** skill.

   - Update the `"{Your trained walking model directory}"` value in the config file `legged_gym/legged_gym/envs/go2/go2_field_config.py` with the trained walking policy folder name (a sketch of this edit is shown after this example).

- Run ```python legged_gym/scripts/train.py --headless --task go2_field```

The training logs will be saved in `logs/field_go2`.

3. Distill the parkour policy

- Update the following literals in the config file `legged_gym/legged_gym/envs/go2/go2_distill_config.py`

- `"{Your trained oracle parkour model folder}"`: The oracle parkour policy folder name in the last step.

- `"{The latest model filename in the directory}"`: The model file name in the oracle parkour policy folder in the last step.

- `"{A temporary directory to store collected trajectory}"`: A temporary directory to store the collected trajectory data.

   - Calibrate your depth camera's extrinsic pose and update the `position` and `rotation` fields in the `sensor.forward_camera` class.

   - Run the distillation process (choose one of the two options)

   1. Run the distillation process in a single process if you believe your GPU is more powerful than an Nvidia RTX 3090.

Set `multi_process_` to `False` in the config file.

Run ```python legged_gym/scripts/train.py --headless --task go2_distill```

2. Run the distillation process in multiple processes with multiple GPUs

Run ```python legged_gym/scripts/train.py --headless --task go2_distill```

   Find the log directory generated by the training process once it prompts that it is waiting (e.g. **Jul18_07-22-08_Go2_10skills_fromJul16_07-38-08**).

Run ```python legged_gym/scripts/collect.py --headless --task go2_distill --log --load_run {the log directory name}``` in another terminal on another GPU. (Can run multiple collectors in parallel)
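
As referenced in step 2, the edit to `go2_field_config.py` might look roughly like the sketch below; the parent classes and the example run-folder name are assumptions, so check the actual file for the exact location of `load_run`:

```python
# Hypothetical sketch of the step-2 edit in
# legged_gym/legged_gym/envs/go2/go2_field_config.py (class layout assumed).
from legged_gym.envs.go2.go2_config import Go2RoughCfgPPO

class Go2FieldCfgPPO( Go2RoughCfgPPO ):
    class runner( Go2RoughCfgPPO.runner ):
        resume = True
        # Replace the placeholder with the folder created under logs/rough_go2
        # by your own walking run; the name below is only an example.
        load_run = "Jul16_07-38-08_Go2Walk"
```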


### A1 example ###

1. The specialized skill policy is trained using `a1_field_config.py` as task `a1_field`

   Run the command `python legged_gym/scripts/train.py --headless --task a1_field`
@@ -31,11 +78,11 @@ This is the tutorial for training the skill policy and distilling the parkour po

Launch the collector with `python legged_gym/scripts/collect.py --headless --task a1_distill --load_run {your training run}`. The process will load the training policy and start collecting data. The collected data will be saved in the directory prompted by the trainer. Remove it after you finish the distillation.

### Train a walk policy ###
#### Train a walk policy ####

Launch the training by `python legged_gym/scripts/train.py --headless --task a1_field`. You will find the training log in `logs/a1_field`. The folder name is also the run name.

### Train each separate skill ###
#### Train each separate skill ####

- Launch the script with task `a1_climb`, `a1_leap`, `a1_crawl`, `a1_tilt`. The training log will also be saved in `logs/a1_field`.

@@ -47,7 +94,7 @@ Launch the training by `python legged_gym/scripts/train.py --headless --task a1_

- Do remember to update the `load_run` field to the corresponding log directory so that the policy from the previous stage is loaded.

### Distill the parkour policy ###
#### Distill the parkour policy ####

**You will need at least two GPUs that can render in IsaacGym, each with at least 24GB of memory (typically an RTX 3090).**

@@ -82,3 +129,5 @@ Launch the training by `python legged_gym/scripts/train.py --headless --task a1_
```bash
python legged_gym/scripts/play.py --task {task} --load_run {run_name}
```

Where `{run_name}` can be the absolute path of your log directory (which contains the `config.json` file).
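
For example (the task and run directory below are placeholders):

```bash
# Hypothetical usage: point --load_run at the absolute path of a log directory
# that contains config.json.
python legged_gym/scripts/play.py --task go2_field --load_run /path/to/parkour/legged_gym/logs/field_go2/<your_run_name>
```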
39 changes: 11 additions & 28 deletions legged_gym/legged_gym/envs/__init__.py
@@ -32,7 +32,7 @@
from legged_gym.envs.a1.a1_config import A1RoughCfg, A1RoughCfgPPO, A1PlaneCfg, A1RoughCfgTPPO
from .base.legged_robot import LeggedRobot
from .base.legged_robot_field import LeggedRobotField
from .base.legged_robot_noisy import LeggedRobotNoisy
from .base.robot_field_noisy import RobotFieldNoisy
from .anymal_c.anymal import Anymal
from .anymal_c.mixed_terrains.anymal_c_rough_config import AnymalCRoughCfg, AnymalCRoughCfgPPO
from .anymal_c.flat.anymal_c_flat_config import AnymalCFlatCfg, AnymalCFlatCfgPPO
@@ -44,6 +44,9 @@
from .a1.a1_field_distill_config import A1FieldDistillCfg, A1FieldDistillCfgPPO
from .go1.go1_field_config import Go1FieldCfg, Go1FieldCfgPPO
from .go1.go1_field_distill_config import Go1FieldDistillCfg, Go1FieldDistillCfgPPO
from .go2.go2_config import Go2RoughCfg, Go2RoughCfgPPO
from .go2.go2_field_config import Go2FieldCfg, Go2FieldCfgPPO
from .go2.go2_distill_config import Go2DistillCfg, Go2DistillCfgPPO


import os
@@ -54,35 +57,15 @@
task_registry.register( "anymal_c_flat", Anymal, AnymalCFlatCfg(), AnymalCFlatCfgPPO() )
task_registry.register( "anymal_b", Anymal, AnymalBRoughCfg(), AnymalBRoughCfgPPO() )
task_registry.register( "a1", LeggedRobot, A1RoughCfg(), A1RoughCfgPPO() )
task_registry.register( "a1_teacher", LeggedRobot, A1PlaneCfg(), A1RoughCfgTPPO() )
task_registry.register( "a1_field", LeggedRobotNoisy, A1FieldCfg(), A1FieldCfgPPO() )
task_registry.register( "a1_distill", LeggedRobotNoisy, A1FieldDistillCfg(), A1FieldDistillCfgPPO() )
task_registry.register( "cassie", Cassie, CassieRoughCfg(), CassieRoughCfgPPO() )
task_registry.register( "go1_field", LeggedRobotNoisy, Go1FieldCfg(), Go1FieldCfgPPO())
task_registry.register( "go1_distill", LeggedRobotNoisy, Go1FieldDistillCfg(), Go1FieldDistillCfgPPO())
task_registry.register( "go1_field", LeggedRobot, Go1FieldCfg(), Go1FieldCfgPPO())
task_registry.register( "go1_distill", LeggedRobot, Go1FieldDistillCfg(), Go1FieldDistillCfgPPO())
task_registry.register( "go2", LeggedRobot, Go2RoughCfg(), Go2RoughCfgPPO() )
task_registry.register( "go2_field", RobotFieldNoisy, Go2FieldCfg(), Go2FieldCfgPPO() )
task_registry.register( "go2_distill", RobotFieldNoisy, Go2DistillCfg(), Go2DistillCfgPPO() )

## The following tasks are for the convenience of the open-source release
from .a1.a1_remote_config import A1RemoteCfg, A1RemoteCfgPPO
task_registry.register( "a1_remote", LeggedRobotNoisy, A1RemoteCfg(), A1RemoteCfgPPO() )
from .a1.a1_jump_config import A1JumpCfg, A1JumpCfgPPO
task_registry.register( "a1_jump", LeggedRobotNoisy, A1JumpCfg(), A1JumpCfgPPO() )
from .a1.a1_down_config import A1DownCfg, A1DownCfgPPO
task_registry.register( "a1_down", LeggedRobotNoisy, A1DownCfg(), A1DownCfgPPO() )
from .a1.a1_leap_config import A1LeapCfg, A1LeapCfgPPO
task_registry.register( "a1_leap", LeggedRobotNoisy, A1LeapCfg(), A1LeapCfgPPO() )
from .a1.a1_crawl_config import A1CrawlCfg, A1CrawlCfgPPO
task_registry.register( "a1_crawl", LeggedRobotNoisy, A1CrawlCfg(), A1CrawlCfgPPO() )
from .a1.a1_tilt_config import A1TiltCfg, A1TiltCfgPPO
task_registry.register( "a1_tilt", LeggedRobotNoisy, A1TiltCfg(), A1TiltCfgPPO() )
task_registry.register( "a1_remote", LeggedRobot, A1RemoteCfg(), A1RemoteCfgPPO() )
from .go1.go1_remote_config import Go1RemoteCfg, Go1RemoteCfgPPO
task_registry.register( "go1_remote", LeggedRobotNoisy, Go1RemoteCfg(), Go1RemoteCfgPPO() )
from .go1.go1_jump_config import Go1JumpCfg, Go1JumpCfgPPO
task_registry.register( "go1_jump", LeggedRobotNoisy, Go1JumpCfg(), Go1JumpCfgPPO() )
from .go1.go1_down_config import Go1DownCfg, Go1DownCfgPPO
task_registry.register( "go1_down", LeggedRobotNoisy, Go1DownCfg(), Go1DownCfgPPO() )
from .go1.go1_leap_config import Go1LeapCfg, Go1LeapCfgPPO
task_registry.register( "go1_leap", LeggedRobotNoisy, Go1LeapCfg(), Go1LeapCfgPPO() )
from .go1.go1_crawl_config import Go1CrawlCfg, Go1CrawlCfgPPO
task_registry.register( "go1_crawl", LeggedRobotNoisy, Go1CrawlCfg(), Go1CrawlCfgPPO() )
from .go1.go1_tilt_config import Go1TiltCfg, Go1TiltCfgPPO
task_registry.register( "go1_tilt", LeggedRobotNoisy, Go1TiltCfg(), Go1TiltCfgPPO() )
task_registry.register( "go1_remote", LeggedRobot, Go1RemoteCfg(), Go1RemoteCfgPPO() )
66 changes: 52 additions & 14 deletions legged_gym/legged_gym/envs/a1/a1_crawl_config.py
@@ -1,4 +1,5 @@
import numpy as np
import os.path as osp
from legged_gym.envs.a1.a1_field_config import A1FieldCfg, A1FieldCfgPPO
from legged_gym.utils.helpers import merge_dict

@@ -7,7 +8,6 @@ class A1CrawlCfg( A1FieldCfg ):
#### uncomment this to train non-virtual terrain
class sensor( A1FieldCfg.sensor ):
class proprioception( A1FieldCfg.sensor.proprioception ):
delay_action_obs = True
latency_range = [0.04-0.0025, 0.04+0.0075]
#### uncomment the above to train non-virtual terrain

@@ -28,11 +28,11 @@ class terrain( A1FieldCfg.terrain ):
wall_height= 0.6,
no_perlin_at_obstacle= False,
),
virtual_terrain= True, # Change this to False for real terrain
virtual_terrain= False, # Change this to False for real terrain
))

TerrainPerlin_kwargs = merge_dict(A1FieldCfg.terrain.TerrainPerlin_kwargs, dict(
zScale= 0.1,
zScale= 0.12,
))

class commands( A1FieldCfg.commands ):
@@ -41,6 +41,9 @@ class ranges( A1FieldCfg.commands.ranges ):
lin_vel_y = [0.0, 0.0]
ang_vel_yaw = [0., 0.]

class asset( A1FieldCfg.asset ):
terminate_after_contacts_on = ["base"]

class termination( A1FieldCfg.termination ):
# additional factors that determines whether to terminates the episode
termination_terms = [
@@ -51,16 +54,29 @@ class termination( A1FieldCfg.termination ):
"out_of_track",
]

class domain_rand( A1FieldCfg.domain_rand ):
init_base_rot_range = dict(
roll= [-0.1, 0.1],
pitch= [-0.1, 0.1],
)

class rewards( A1FieldCfg.rewards ):
class scales:
tracking_ang_vel = 0.05
world_vel_l2norm = -1.
legs_energy_substeps = -2e-5
legs_energy_substeps = -1e-5
alive = 2.
penetrate_depth = -6e-2 # comment this out if training non-virtual terrain
penetrate_volume = -6e-2 # comment this out if training non-virtual terrain
exceed_dof_pos_limits = -1e-1
exceed_torque_limits_i = -2e-1
# penetrate_depth = -6e-2 # comment this out if training non-virtual terrain
# penetrate_volume = -6e-2 # comment this out if training non-virtual terrain
exceed_dof_pos_limits = -8e-1
# exceed_torque_limits_i = -2e-1
exceed_torque_limits_l1norm = -4e-1
# collision = -0.05
# tilt_cond = 0.1
torques = -1e-5
yaw_abs = -0.1
lin_pos_y = -0.1
soft_dof_pos_limit = 0.9

class curriculum( A1FieldCfg.curriculum ):
penetrate_volume_threshold_harder = 1500
@@ -69,27 +85,49 @@ class curriculum( A1FieldCfg.curriculum ):
penetrate_depth_threshold_easier = 400


logs_root = osp.join(osp.dirname(osp.dirname(osp.dirname(osp.dirname(osp.abspath(__file__))))), "logs")
class A1CrawlCfgPPO( A1FieldCfgPPO ):
class algorithm( A1FieldCfgPPO.algorithm ):
entropy_coef = 0.0
clip_min_std = 0.2
clip_min_std = 0.1

class runner( A1FieldCfgPPO.runner ):
policy_class_name = "ActorCriticRecurrent"
experiment_name = "field_a1"
run_name = "".join(["Skill",
resume = True
load_run = "{Your traind walking model directory}"
load_run = "{Your virtually trained crawling model directory}"
# load_run = "Aug21_06-12-58_Skillcrawl_propDelay0.00-0.05_virtual"
# load_run = osp.join(logs_root, "field_a1_oracle/May21_05-25-19_Skills_crawl_pEnergy2e-5_rAlive1_pPenV6e-2_pPenD6e-2_pPosY0.2_kp50_noContactTerminate_aScale0.5")
# load_run = osp.join(logs_root, "field_a1_oracle/Sep26_01-38-19_Skills_crawl_propDelay0.04-0.05_pEnergy-4e-5_pTorqueL13e-01_kp40_fromMay21_05-25-19")
# load_run = osp.join(logs_root, "field_a1_noTanh_oracle", "Sep26_14-30-24_Skills_crawl_propDelay0.04-0.05_pEnergy-2e-5_pDof8e-01_pTorqueL14e-01_rTilt5e-01_pCollision0.2_maxPushAng0.5_kp40_fromSep26_01-38-19")
# load_run = osp.join(logs_root, "field_a1_noTanh_oracle", "Oct09_09-58-26_Skills_crawl_propDelay0.04-0.05_pEnergy-1e-5_pDof8e-01_pTorqueL14e-01_maxPushAng0.0_kp40_fromSep26_14-30-24")
load_run = osp.join(logs_root, "field_a1_noTanh_oracle", "Oct11_12-19-00_Skills_crawl_propDelay0.04-0.05_pEnergy-1e-5_pDof8e-01_pTorqueL14e-01_pPosY0.1_maxPushAng0.3_kp40_fromOct09_09-58-26")

run_name = "".join(["Skills_",
("Multi" if len(A1CrawlCfg.terrain.BarrierTrack_kwargs["options"]) > 1 else (A1CrawlCfg.terrain.BarrierTrack_kwargs["options"][0] if A1CrawlCfg.terrain.BarrierTrack_kwargs["options"] else "PlaneWalking")),
("_comXRange{:.1f}-{:.1f}".format(A1CrawlCfg.domain_rand.com_range.x[0], A1CrawlCfg.domain_rand.com_range.x[1])),
("_noLinVel" if not A1CrawlCfg.env.use_lin_vel else ""),
("_propDelay{:.2f}-{:.2f}".format(
A1CrawlCfg.sensor.proprioception.latency_range[0],
A1CrawlCfg.sensor.proprioception.latency_range[1],
) if A1CrawlCfg.sensor.proprioception.delay_action_obs else ""
),
("_pEnergy" + np.format_float_scientific(A1CrawlCfg.rewards.scales.legs_energy_substeps, precision= 1, exp_digits= 1, trim= "-") if A1CrawlCfg.rewards.scales.legs_energy_substeps != 0. else ""),
# ("_pPenD{:.0e}".format(A1CrawlCfg.rewards.scales.penetrate_depth) if getattr(A1CrawlCfg.rewards.scales, "penetrate_depth", 0.) != 0. else ""),
("_pEnergySubsteps" + np.format_float_scientific(A1CrawlCfg.rewards.scales.legs_energy_substeps, precision= 1, exp_digits= 1, trim= "-") if getattr(A1CrawlCfg.rewards.scales, "legs_energy_substeps", 0.) != 0. else ""),
("_pDof{:.0e}".format(-A1CrawlCfg.rewards.scales.exceed_dof_pos_limits) if getattr(A1CrawlCfg.rewards.scales, "exceed_dof_pos_limits", 0.) != 0 else ""),
("_pTorque" + np.format_float_scientific(-A1CrawlCfg.rewards.scales.torques, precision= 1, exp_digits= 1, trim= "-") if getattr(A1CrawlCfg.rewards.scales, "torques", 0.) != 0 else ""),
("_pTorqueL1{:.0e}".format(-A1CrawlCfg.rewards.scales.exceed_torque_limits_l1norm) if getattr(A1CrawlCfg.rewards.scales, "exceed_torque_limits_l1norm", 0.) != 0 else ""),
# ("_rTilt{:.0e}".format(A1CrawlCfg.rewards.scales.tilt_cond) if getattr(A1CrawlCfg.rewards.scales, "tilt_cond", 0.) != 0 else ""),
# ("_pYaw{:.1f}".format(-A1CrawlCfg.rewards.scales.yaw_abs) if getattr(A1CrawlCfg.rewards.scales, "yaw_abs", 0.) != 0 else ""),
# ("_pPosY{:.1f}".format(-A1CrawlCfg.rewards.scales.lin_pos_y) if getattr(A1CrawlCfg.rewards.scales, "lin_pos_y", 0.) != 0 else ""),
# ("_pCollision{:.1f}".format(-A1CrawlCfg.rewards.scales.collision) if getattr(A1CrawlCfg.rewards.scales, "collision", 0.) != 0 else ""),
# ("_kp{:d}".format(int(A1CrawlCfg.control.stiffness["joint"])) if A1CrawlCfg.control.stiffness["joint"] != 50 else ""),
("_noDelayActObs" if not A1CrawlCfg.sensor.proprioception.delay_action_obs else ""),
("_noTanh"),
("_virtual" if A1CrawlCfg.terrain.BarrierTrack_kwargs["virtual_terrain"] else ""),
("_noResume" if not resume else "_from" + "_".join(load_run.split("/")[-1].split("_")[:2])),
])
resume = True
load_run = "{Your traind walking model directory}"
load_run = "{Your virtually trained crawling model directory}"
max_iterations = 20000
save_interval = 500

5 changes: 3 additions & 2 deletions legged_gym/legged_gym/envs/a1/a1_field_config.py
@@ -200,6 +200,7 @@ class scales:
exceed_dof_pos_limits = -1e-1
exceed_torque_limits_i = -2e-1
soft_dof_pos_limit = 0.01
only_positive_rewards = False

class normalization( A1RoughCfg.normalization ):
class obs_scales( A1RoughCfg.normalization.obs_scales ):
@@ -286,7 +287,7 @@ class runner( A1RoughCfgPPO.runner ):
("_propDelay{:.2f}-{:.2f}".format(
A1FieldCfg.sensor.proprioception.latency_range[0],
A1FieldCfg.sensor.proprioception.latency_range[1],
) if A1FieldCfg.sensor.proprioception.delay_action_obs else ""
) if A1FieldCfg.sensor.proprioception.latency_range[1] > 0. else ""
),
("_aScale{:d}{:d}{:d}".format(
int(A1FieldCfg.control.action_scale[0] * 10),
@@ -297,6 +298,6 @@ class runner( A1RoughCfgPPO.runner ):
),
])
resume = False
max_iterations = 10000
max_iterations = 5000
save_interval = 500
