A lightweight package to collect and interactively visualize trajectories while training MuJoCo Playground environments. rscope can visualize both local and remote (potentially headless) training runs.
> [!IMPORTANT]
> Requires Python 3.10 or later.

```bash
pip install rscope
```
> [!IMPORTANT]
> Mac users must run `mjpython` instead of `python`, e.g. `mjpython -m rscope`.
To visualize locally stored rollouts:

```bash
python -m rscope
```
In the commands below, replace `user@remote_host` with your own credentials, for example `alice@168.42.4.8`.
First, set up a password-free, key-based SSH connection to the remote machine:

```bash
ssh-keygen -t ed25519 -f ~/.ssh/rsync_key -N ""
ssh-copy-id -i ~/.ssh/rsync_key.pub user@remote_host
```
If this worked, you should be able to SSH in without a password:

```bash
ssh -i ~/.ssh/rsync_key user@remote_host
echo hello
exit
```
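Optionally, you can add a host alias to `~/.ssh/config` so the key is picked up automatically. The alias `trainbox` and the host/user values below are just examples matching the placeholders above:

```
# ~/.ssh/config -- optional convenience entry (alias and values are examples)
Host trainbox
    HostName 168.42.4.8
    User alice
    IdentityFile ~/.ssh/rsync_key
```

With this entry in place, `ssh trainbox` connects without extra flags.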
To visualize rollouts stored on a remote server via SSH:

```bash
# port defaults to 22
python -m rscope --ssh_to user@remote_host[:port] --ssh_key ~/.ssh/rsync_key --polling_interval 5
```
- Most features from the MuJoCo viewer.
- Trajectory browsing. Use the left/right arrow keys to switch between parallel environments and up/down to move between recent and past trajectories.
- Live plotting. Use `SHIFT+M` to plot trajectory rewards and the contents of `state.metrics`, up to the first 11 keys.
- Pixel observations. Use `SHIFT+O` to overlay pixel observations if available. To use this feature, the observation must be a `dict` and the pixel keys must be prefixed with `pixels/` (see the sketch below).
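For example, an observation shaped like the following minimal sketch would enable the overlay. The key names `state` and `pixels/view_0` are illustrative; only the `pixels/` prefix is required:

```python
import jax.numpy as jnp

# Illustrative observation layout for pixel overlays; the specific
# key names are examples, only the `pixels/` prefix matters.
obs = {
    "state": jnp.zeros(48),                   # proprioceptive features
    "pixels/view_0": jnp.zeros((64, 64, 3)),  # HxWxC camera render
}
```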
Some background on how rscope works: between policy updates, rscope unrolls multiple trajectories in parallel and then visualizes them on the CPU. While this is simpler to implement and less expensive than tracing training runs as in IsaacLab, this and other implementation details lead to some unexpected gotchas (a conceptual sketch of the unroll pattern follows the list):
- Typically, stochastic policies are used to evaluate training progress, while deterministic ones are deployed. While you can run rscope on stochastic policies to get a feel for the agent's exploration during training, we recommend deterministic evals.
- Renders incorrectly for domain-randomized training because the loaded assets are from the nominal model definition.
- Plots only the first 14 keys in the metrics without filtering for shaping rewards.
- Visualizes only the first 14 pixel observations.
- Cannot capture curriculum progression during training, as curricula depend on `state.info`, which is reset at the start of an evaluator run.
- Currently supports only PPO-based training.
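As a rough illustration of the unroll-then-visualize pattern described above, an evaluator might look like the sketch below. The `policy` and `env` interfaces here are hypothetical stand-ins, not rscope's actual API:

```python
# Conceptual sketch of unrolling eval rollouts between policy updates
# and dumping them to disk for a CPU-side viewer. `policy` and `env`
# are hypothetical stand-ins, NOT rscope's API.
import pickle
from pathlib import Path

import numpy as np

def dump_eval_rollouts(policy, env, num_envs: int, episode_len: int,
                       out_dir: str = "rollouts") -> None:
    """Roll out a batch of deterministic eval episodes and save the
    physics states so a viewer can replay them later."""
    obs = env.reset(num_envs)                     # batched reset across parallel envs
    qpos_history = []
    for _ in range(episode_len):
        action = policy(obs, deterministic=True)  # deterministic eval (see above)
        obs, state = env.step(action)
        qpos_history.append(np.asarray(state.qpos))
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    with (out / "latest.pkl").open("wb") as f:
        pickle.dump(np.stack(qpos_history), f)    # (episode_len, num_envs, nq)
```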
Please run the following before making a PR:

```bash
pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files
```