[upgrade] unified interface and an easier example
* embed all functions and new concepts into legged_robot for the base implementation
* add the namedarraytuple concept borrowed from astooke/rlpyt
* use namedarraytuple in the rollout storage and introduce a minibatch object
* add the `rollout_file` concept to store rollout data and demonstrations in files
* add state estimator module/algo implementation
* completely rewrite the `play.py` example
* rename `actions_scaled_torque_clipped` to `actions_scaled_clipped`
* add example onboard code for deploying on Go2.
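
As a rough illustration of the namedarraytuple / minibatch idea above, here is a minimal sketch; the class and field names are assumptions for illustration, not the repository's actual code:

```python
# Illustrative sketch only: the repository borrows namedarraytuple from
# astooke/rlpyt; this hand-rolled stand-in just shows the core property.
from dataclasses import dataclass
import torch

@dataclass
class MiniBatch:
    """A bundle of rollout tensors that are always indexed together."""
    observation: torch.Tensor  # (T, N, obs_dim)
    action: torch.Tensor       # (T, N, act_dim)
    reward: torch.Tensor       # (T, N)

    def __getitem__(self, idx):
        # One index expression slices every field consistently, which is what
        # makes rollout storage and minibatch sampling convenient.
        return MiniBatch(self.observation[idx], self.action[idx], self.reward[idx])

# e.g. minibatch = MiniBatch(obs, act, rew)[:24]  # first 24 time steps of every env
```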
Ziwen Zhuang committed Jul 19, 2024
1 parent 96317ac commit 1ffd6d7
Showing 108 changed files with 47,651 additions and 3,067 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -16,27 +16,31 @@ Conference on Robot Learning (CoRL) 2023, **Oral**, **Best Systems Paper Award F

## Repository Structure ##
* `legged_gym`: contains the isaacgym environment and config files.
- `legged_gym/legged_gym/envs/a1/`: contains all the training config files.
- `legged_gym/legged_gym/envs/{robot}/`: contains all the training config files for a specific robot
- `legged_gym/legged_gym/envs/base/`: contains all the environment implementation.
- `legged_gym/legged_gym/utils/terrain/`: contains the terrain generation code.
* `rsl_rl`: contains the network module and algorithm implementation. You can copy this folder directly to your robot.
- `rsl_rl/rsl_rl/algorithms/`: contains the algorithm implementation.
- `rsl_rl/rsl_rl/modules/`: contains the network module implementation.

## Training in Simulation ##
To install and run the code for training A1 in simulation, please clone this repository and follow the instructions in [legged_gym/README.md](legged_gym/README.md).
To install and run the code for training A1/Go2 in simulation, please clone this repository and follow the instructions in [legged_gym/README.md](legged_gym/README.md).

## Hardware Deployment ##
To deploy the trained model on your real robot, please follow the instructions in [Deploy.md](Deploy.md).
To deploy the trained model on your Unitree Go1 robot, please follow the instructions in [Deploy-Go1.md](onboard_codes/Deploy-Go1.md).

To deploy the trained model on your Unitree Go2 robot, please follow the instructions in [Deploy-Go2.md](onboard_codes/Deploy-Go2.md).


## Troubleshooting ##
If you cannot run the distillation part, or all graphics computation goes to GPU 0 despite having multiple GPUs and setting `CUDA_VISIBLE_DEVICES`, please use Docker to isolate each GPU.
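
For example, one container per GPU keeps IsaacGym from touching the other devices. The sketch below is only an assumption of how such a setup could look; the image name and mount path are placeholders, not something this repository provides:

```bash
# Hypothetical sketch: expose only GPU 1 inside the container.
docker run --rm -it --gpus "device=1" \
    -v "$(pwd)":/workspace/parkour \
    <your-isaacgym-image> \
    bash -c "cd /workspace/parkour/legged_gym && python legged_gym/scripts/train.py --headless --task a1_distill"
```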

## To Do (will be done before Nov 2023) ##
- [x] Go1 training configuration (not from scratch)
## To Do ##
- [x] Go1 training configuration (does not guarantee the same performance as the paper)
- [ ] A1 deployment code
- [x] Go1 deployment code
- [x] Go2 training configuration example (does not guarantee the same performance as the paper)
- [x] Go2 deployment code example

## Citation ##
If you find this project helpful to your research, please consider citing us! This is really important to us.
55 changes: 52 additions & 3 deletions legged_gym/README.md
@@ -19,6 +19,53 @@ This is the tutorial for training the skill policy and distilling the parkour po
## Usage ##
***Always run your script in the root path of this legged_gym folder (which contains a `setup.py` file).***

### Go2 example (newer and simpler, does not guarantee performance) ###

1. Train a walking policy for planar locomotion
```bash
python legged_gym/scripts/train.py --headless --task go2
```
Training logs will be saved in `logs/rough_go2`.

2. Train the parkour policy. In this example, we use **scandot** for terrain perception, so we remove the **crawl** skill.

   - Update the `"{Your trained walking model directory}"` value in the config file `legged_gym/legged_gym/envs/go2/go2_field_config.py` with the trained walking policy folder name (a sketch of this edit is shown after this example).

- Run ```python legged_gym/scripts/train.py --headless --task go2_field```

The training logs will be saved in `logs/field_go2`.

3. Distill the parkour policy

- Update the following literals in the config file `legged_gym/legged_gym/envs/go2/go2_distill_config.py`

- `"{Your trained oracle parkour model folder}"`: The oracle parkour policy folder name in the last step.

- `"{The latest model filename in the directory}"`: The model file name in the oracle parkour policy folder in the last step.

- `"{A temporary directory to store collected trajectory}"`: A temporary directory to store the collected trajectory data.

   - Calibrate your depth camera's extrinsic pose and update the `position` and `rotation` fields in the `sensor.forward_camera` class.

   - Run the distillation process (choose one of the two options)

   1. Run the distillation process in a single process if you believe your GPU is more powerful than an Nvidia RTX 3090.

Set `multi_process_` to `False` in the config file.

Run ```python legged_gym/scripts/train.py --headless --task go2_distill```

2. Run the distillation process in multiple processes with multiple GPUs

Run ```python legged_gym/scripts/train.py --headless --task go2_distill```

   Find the log directory generated by the training process once it prompts that it is waiting (e.g. **Jul18_07-22-08_Go2_10skills_fromJul16_07-38-08**).

Run ```python legged_gym/scripts/collect.py --headless --task go2_distill --log --load_run {the log directory name}``` in another terminal on another GPU. (Can run multiple collectors in parallel)
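
As referenced in step 2, the edit to `go2_field_config.py` might look roughly like the sketch below; the parent classes and the example run-folder name are assumptions, so check the actual file for the exact location of `load_run`:

```python
# Hypothetical sketch of the step-2 edit in
# legged_gym/legged_gym/envs/go2/go2_field_config.py (class layout assumed).
from legged_gym.envs.go2.go2_config import Go2RoughCfgPPO

class Go2FieldCfgPPO( Go2RoughCfgPPO ):
    class runner( Go2RoughCfgPPO.runner ):
        resume = True
        # Replace the placeholder with the folder created under logs/rough_go2
        # by your own walking run; the name below is only an example.
        load_run = "Jul16_07-38-08_Go2Walk"
```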


### A1 example ###

1. The specialized skill policy is trained using `a1_field_config.py` as task `a1_field`

   Run the command `python legged_gym/scripts/train.py --headless --task a1_field`
@@ -31,11 +78,11 @@ This is the tutorial for training the skill policy and distilling the parkour po

Launch the collector with `python legged_gym/scripts/collect.py --headless --task a1_distill --load_run {your training run}`. The process will load the training policy and start collecting data. The collected data will be saved in the directory prompted by the trainer. Remove it after you finish the distillation.

### Train a walk policy ###
#### Train a walk policy ####

Launch the training by `python legged_gym/scripts/train.py --headless --task a1_field`. You will find the training log in `logs/a1_field`. The folder name is also the run name.

### Train each separate skill ###
#### Train each separate skill ####

- Launch the script with task `a1_climb`, `a1_leap`, `a1_crawl`, `a1_tilt`. The training log will also be saved in `logs/a1_field`.

@@ -47,7 +94,7 @@ Launch the training by `python legged_gym/scripts/train.py --headless --task a1_

- Do remember to update the `load_run` field to the corresponding log directory so that the policy from the previous stage is loaded.

### Distill the parkour policy ###
#### Distill the parkour policy ####

**You will need at least two GPUs that can render in IsaacGym, each with at least 24GB of memory (typically an RTX 3090).**

@@ -82,3 +129,5 @@ Launch the training by `python legged_gym/scripts/train.py --headless --task a1_
```bash
python legged_gym/scripts/play.py --task {task} --load_run {run_name}
```

Where `{run_name}` can be the absolute path of your log directory (which contains the `config.json` file).
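
For example (the task and run directory below are placeholders):

```bash
# Hypothetical usage: point --load_run at the absolute path of a log directory
# that contains config.json.
python legged_gym/scripts/play.py --task go2_field --load_run /path/to/parkour/legged_gym/logs/field_go2/<your_run_name>
```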
39 changes: 11 additions & 28 deletions legged_gym/legged_gym/envs/__init__.py
@@ -32,7 +32,7 @@
from legged_gym.envs.a1.a1_config import A1RoughCfg, A1RoughCfgPPO, A1PlaneCfg, A1RoughCfgTPPO
from .base.legged_robot import LeggedRobot
from .base.legged_robot_field import LeggedRobotField
from .base.legged_robot_noisy import LeggedRobotNoisy
from .base.robot_field_noisy import RobotFieldNoisy
from .anymal_c.anymal import Anymal
from .anymal_c.mixed_terrains.anymal_c_rough_config import AnymalCRoughCfg, AnymalCRoughCfgPPO
from .anymal_c.flat.anymal_c_flat_config import AnymalCFlatCfg, AnymalCFlatCfgPPO
@@ -44,6 +44,9 @@
from .a1.a1_field_distill_config import A1FieldDistillCfg, A1FieldDistillCfgPPO
from .go1.go1_field_config import Go1FieldCfg, Go1FieldCfgPPO
from .go1.go1_field_distill_config import Go1FieldDistillCfg, Go1FieldDistillCfgPPO
from .go2.go2_config import Go2RoughCfg, Go2RoughCfgPPO
from .go2.go2_field_config import Go2FieldCfg, Go2FieldCfgPPO
from .go2.go2_distill_config import Go2DistillCfg, Go2DistillCfgPPO


import os
@@ -54,35 +57,15 @@
task_registry.register( "anymal_c_flat", Anymal, AnymalCFlatCfg(), AnymalCFlatCfgPPO() )
task_registry.register( "anymal_b", Anymal, AnymalBRoughCfg(), AnymalBRoughCfgPPO() )
task_registry.register( "a1", LeggedRobot, A1RoughCfg(), A1RoughCfgPPO() )
task_registry.register( "a1_teacher", LeggedRobot, A1PlaneCfg(), A1RoughCfgTPPO() )
task_registry.register( "a1_field", LeggedRobotNoisy, A1FieldCfg(), A1FieldCfgPPO() )
task_registry.register( "a1_distill", LeggedRobotNoisy, A1FieldDistillCfg(), A1FieldDistillCfgPPO() )
task_registry.register( "cassie", Cassie, CassieRoughCfg(), CassieRoughCfgPPO() )
task_registry.register( "go1_field", LeggedRobotNoisy, Go1FieldCfg(), Go1FieldCfgPPO())
task_registry.register( "go1_distill", LeggedRobotNoisy, Go1FieldDistillCfg(), Go1FieldDistillCfgPPO())
task_registry.register( "go1_field", LeggedRobot, Go1FieldCfg(), Go1FieldCfgPPO())
task_registry.register( "go1_distill", LeggedRobot, Go1FieldDistillCfg(), Go1FieldDistillCfgPPO())
task_registry.register( "go2", LeggedRobot, Go2RoughCfg(), Go2RoughCfgPPO() )
task_registry.register( "go2_field", RobotFieldNoisy, Go2FieldCfg(), Go2FieldCfgPPO() )
task_registry.register( "go2_distill", RobotFieldNoisy, Go2DistillCfg(), Go2DistillCfgPPO() )

## The following tasks are for the convenience of the open-source release
from .a1.a1_remote_config import A1RemoteCfg, A1RemoteCfgPPO
task_registry.register( "a1_remote", LeggedRobotNoisy, A1RemoteCfg(), A1RemoteCfgPPO() )
from .a1.a1_jump_config import A1JumpCfg, A1JumpCfgPPO
task_registry.register( "a1_jump", LeggedRobotNoisy, A1JumpCfg(), A1JumpCfgPPO() )
from .a1.a1_down_config import A1DownCfg, A1DownCfgPPO
task_registry.register( "a1_down", LeggedRobotNoisy, A1DownCfg(), A1DownCfgPPO() )
from .a1.a1_leap_config import A1LeapCfg, A1LeapCfgPPO
task_registry.register( "a1_leap", LeggedRobotNoisy, A1LeapCfg(), A1LeapCfgPPO() )
from .a1.a1_crawl_config import A1CrawlCfg, A1CrawlCfgPPO
task_registry.register( "a1_crawl", LeggedRobotNoisy, A1CrawlCfg(), A1CrawlCfgPPO() )
from .a1.a1_tilt_config import A1TiltCfg, A1TiltCfgPPO
task_registry.register( "a1_tilt", LeggedRobotNoisy, A1TiltCfg(), A1TiltCfgPPO() )
task_registry.register( "a1_remote", LeggedRobot, A1RemoteCfg(), A1RemoteCfgPPO() )
from .go1.go1_remote_config import Go1RemoteCfg, Go1RemoteCfgPPO
task_registry.register( "go1_remote", LeggedRobotNoisy, Go1RemoteCfg(), Go1RemoteCfgPPO() )
from .go1.go1_jump_config import Go1JumpCfg, Go1JumpCfgPPO
task_registry.register( "go1_jump", LeggedRobotNoisy, Go1JumpCfg(), Go1JumpCfgPPO() )
from .go1.go1_down_config import Go1DownCfg, Go1DownCfgPPO
task_registry.register( "go1_down", LeggedRobotNoisy, Go1DownCfg(), Go1DownCfgPPO() )
from .go1.go1_leap_config import Go1LeapCfg, Go1LeapCfgPPO
task_registry.register( "go1_leap", LeggedRobotNoisy, Go1LeapCfg(), Go1LeapCfgPPO() )
from .go1.go1_crawl_config import Go1CrawlCfg, Go1CrawlCfgPPO
task_registry.register( "go1_crawl", LeggedRobotNoisy, Go1CrawlCfg(), Go1CrawlCfgPPO() )
from .go1.go1_tilt_config import Go1TiltCfg, Go1TiltCfgPPO
task_registry.register( "go1_tilt", LeggedRobotNoisy, Go1TiltCfg(), Go1TiltCfgPPO() )
task_registry.register( "go1_remote", LeggedRobot, Go1RemoteCfg(), Go1RemoteCfgPPO() )
66 changes: 52 additions & 14 deletions legged_gym/legged_gym/envs/a1/a1_crawl_config.py
@@ -1,4 +1,5 @@
import numpy as np
import os.path as osp
from legged_gym.envs.a1.a1_field_config import A1FieldCfg, A1FieldCfgPPO
from legged_gym.utils.helpers import merge_dict

@@ -7,7 +8,6 @@ class A1CrawlCfg( A1FieldCfg ):
#### uncomment this to train non-virtual terrain
class sensor( A1FieldCfg.sensor ):
class proprioception( A1FieldCfg.sensor.proprioception ):
delay_action_obs = True
latency_range = [0.04-0.0025, 0.04+0.0075]
#### uncomment the above to train non-virtual terrain

@@ -28,11 +28,11 @@ class terrain( A1FieldCfg.terrain ):
wall_height= 0.6,
no_perlin_at_obstacle= False,
),
virtual_terrain= True, # Change this to False for real terrain
virtual_terrain= False, # Change this to False for real terrain
))

TerrainPerlin_kwargs = merge_dict(A1FieldCfg.terrain.TerrainPerlin_kwargs, dict(
zScale= 0.1,
zScale= 0.12,
))

class commands( A1FieldCfg.commands ):
@@ -41,6 +41,9 @@ class ranges( A1FieldCfg.commands.ranges ):
lin_vel_y = [0.0, 0.0]
ang_vel_yaw = [0., 0.]

class asset( A1FieldCfg.asset ):
terminate_after_contacts_on = ["base"]

class termination( A1FieldCfg.termination ):
# additional factors that determines whether to terminates the episode
termination_terms = [
@@ -51,16 +54,29 @@ class termination( A1FieldCfg.termination ):
"out_of_track",
]

class domain_rand( A1FieldCfg.domain_rand ):
init_base_rot_range = dict(
roll= [-0.1, 0.1],
pitch= [-0.1, 0.1],
)

class rewards( A1FieldCfg.rewards ):
class scales:
tracking_ang_vel = 0.05
world_vel_l2norm = -1.
legs_energy_substeps = -2e-5
legs_energy_substeps = -1e-5
alive = 2.
penetrate_depth = -6e-2 # comment this out if training non-virtual terrain
penetrate_volume = -6e-2 # comment this out if training non-virtual terrain
exceed_dof_pos_limits = -1e-1
exceed_torque_limits_i = -2e-1
# penetrate_depth = -6e-2 # comment this out if training non-virtual terrain
# penetrate_volume = -6e-2 # comment this out if training non-virtual terrain
exceed_dof_pos_limits = -8e-1
# exceed_torque_limits_i = -2e-1
exceed_torque_limits_l1norm = -4e-1
# collision = -0.05
# tilt_cond = 0.1
torques = -1e-5
yaw_abs = -0.1
lin_pos_y = -0.1
soft_dof_pos_limit = 0.9

class curriculum( A1FieldCfg.curriculum ):
penetrate_volume_threshold_harder = 1500
@@ -69,27 +85,49 @@ class curriculum( A1FieldCfg.curriculum ):
penetrate_depth_threshold_easier = 400


logs_root = osp.join(osp.dirname(osp.dirname(osp.dirname(osp.dirname(osp.abspath(__file__))))), "logs")
class A1CrawlCfgPPO( A1FieldCfgPPO ):
class algorithm( A1FieldCfgPPO.algorithm ):
entropy_coef = 0.0
clip_min_std = 0.2
clip_min_std = 0.1

class runner( A1FieldCfgPPO.runner ):
policy_class_name = "ActorCriticRecurrent"
experiment_name = "field_a1"
run_name = "".join(["Skill",
resume = True
load_run = "{Your traind walking model directory}"
load_run = "{Your virtually trained crawling model directory}"
# load_run = "Aug21_06-12-58_Skillcrawl_propDelay0.00-0.05_virtual"
# load_run = osp.join(logs_root, "field_a1_oracle/May21_05-25-19_Skills_crawl_pEnergy2e-5_rAlive1_pPenV6e-2_pPenD6e-2_pPosY0.2_kp50_noContactTerminate_aScale0.5")
# load_run = osp.join(logs_root, "field_a1_oracle/Sep26_01-38-19_Skills_crawl_propDelay0.04-0.05_pEnergy-4e-5_pTorqueL13e-01_kp40_fromMay21_05-25-19")
# load_run = osp.join(logs_root, "field_a1_noTanh_oracle", "Sep26_14-30-24_Skills_crawl_propDelay0.04-0.05_pEnergy-2e-5_pDof8e-01_pTorqueL14e-01_rTilt5e-01_pCollision0.2_maxPushAng0.5_kp40_fromSep26_01-38-19")
# load_run = osp.join(logs_root, "field_a1_noTanh_oracle", "Oct09_09-58-26_Skills_crawl_propDelay0.04-0.05_pEnergy-1e-5_pDof8e-01_pTorqueL14e-01_maxPushAng0.0_kp40_fromSep26_14-30-24")
load_run = osp.join(logs_root, "field_a1_noTanh_oracle", "Oct11_12-19-00_Skills_crawl_propDelay0.04-0.05_pEnergy-1e-5_pDof8e-01_pTorqueL14e-01_pPosY0.1_maxPushAng0.3_kp40_fromOct09_09-58-26")

run_name = "".join(["Skills_",
("Multi" if len(A1CrawlCfg.terrain.BarrierTrack_kwargs["options"]) > 1 else (A1CrawlCfg.terrain.BarrierTrack_kwargs["options"][0] if A1CrawlCfg.terrain.BarrierTrack_kwargs["options"] else "PlaneWalking")),
("_comXRange{:.1f}-{:.1f}".format(A1CrawlCfg.domain_rand.com_range.x[0], A1CrawlCfg.domain_rand.com_range.x[1])),
("_noLinVel" if not A1CrawlCfg.env.use_lin_vel else ""),
("_propDelay{:.2f}-{:.2f}".format(
A1CrawlCfg.sensor.proprioception.latency_range[0],
A1CrawlCfg.sensor.proprioception.latency_range[1],
) if A1CrawlCfg.sensor.proprioception.delay_action_obs else ""
),
("_pEnergy" + np.format_float_scientific(A1CrawlCfg.rewards.scales.legs_energy_substeps, precision= 1, exp_digits= 1, trim= "-") if A1CrawlCfg.rewards.scales.legs_energy_substeps != 0. else ""),
# ("_pPenD{:.0e}".format(A1CrawlCfg.rewards.scales.penetrate_depth) if getattr(A1CrawlCfg.rewards.scales, "penetrate_depth", 0.) != 0. else ""),
("_pEnergySubsteps" + np.format_float_scientific(A1CrawlCfg.rewards.scales.legs_energy_substeps, precision= 1, exp_digits= 1, trim= "-") if getattr(A1CrawlCfg.rewards.scales, "legs_energy_substeps", 0.) != 0. else ""),
("_pDof{:.0e}".format(-A1CrawlCfg.rewards.scales.exceed_dof_pos_limits) if getattr(A1CrawlCfg.rewards.scales, "exceed_dof_pos_limits", 0.) != 0 else ""),
("_pTorque" + np.format_float_scientific(-A1CrawlCfg.rewards.scales.torques, precision= 1, exp_digits= 1, trim= "-") if getattr(A1CrawlCfg.rewards.scales, "torques", 0.) != 0 else ""),
("_pTorqueL1{:.0e}".format(-A1CrawlCfg.rewards.scales.exceed_torque_limits_l1norm) if getattr(A1CrawlCfg.rewards.scales, "exceed_torque_limits_l1norm", 0.) != 0 else ""),
# ("_rTilt{:.0e}".format(A1CrawlCfg.rewards.scales.tilt_cond) if getattr(A1CrawlCfg.rewards.scales, "tilt_cond", 0.) != 0 else ""),
# ("_pYaw{:.1f}".format(-A1CrawlCfg.rewards.scales.yaw_abs) if getattr(A1CrawlCfg.rewards.scales, "yaw_abs", 0.) != 0 else ""),
# ("_pPosY{:.1f}".format(-A1CrawlCfg.rewards.scales.lin_pos_y) if getattr(A1CrawlCfg.rewards.scales, "lin_pos_y", 0.) != 0 else ""),
# ("_pCollision{:.1f}".format(-A1CrawlCfg.rewards.scales.collision) if getattr(A1CrawlCfg.rewards.scales, "collision", 0.) != 0 else ""),
# ("_kp{:d}".format(int(A1CrawlCfg.control.stiffness["joint"])) if A1CrawlCfg.control.stiffness["joint"] != 50 else ""),
("_noDelayActObs" if not A1CrawlCfg.sensor.proprioception.delay_action_obs else ""),
("_noTanh"),
("_virtual" if A1CrawlCfg.terrain.BarrierTrack_kwargs["virtual_terrain"] else ""),
("_noResume" if not resume else "_from" + "_".join(load_run.split("/")[-1].split("_")[:2])),
])
resume = True
load_run = "{Your traind walking model directory}"
load_run = "{Your virtually trained crawling model directory}"
max_iterations = 20000
save_interval = 500

5 changes: 3 additions & 2 deletions legged_gym/legged_gym/envs/a1/a1_field_config.py
@@ -200,6 +200,7 @@ class scales:
exceed_dof_pos_limits = -1e-1
exceed_torque_limits_i = -2e-1
soft_dof_pos_limit = 0.01
only_positive_rewards = False

class normalization( A1RoughCfg.normalization ):
class obs_scales( A1RoughCfg.normalization.obs_scales ):
@@ -286,7 +287,7 @@ class runner( A1RoughCfgPPO.runner ):
("_propDelay{:.2f}-{:.2f}".format(
A1FieldCfg.sensor.proprioception.latency_range[0],
A1FieldCfg.sensor.proprioception.latency_range[1],
) if A1FieldCfg.sensor.proprioception.delay_action_obs else ""
) if A1FieldCfg.sensor.proprioception.latency_range[1] > 0. else ""
),
("_aScale{:d}{:d}{:d}".format(
int(A1FieldCfg.control.action_scale[0] * 10),
@@ -297,6 +298,6 @@ class runner( A1RoughCfgPPO.runner ):
),
])
resume = False
max_iterations = 10000
max_iterations = 5000
save_interval = 500
