When running play.py with --real-time, the dt used for real-time pacing is incorrect. It currently uses env.physics_dt, which is sim_dt. However, when decimation > 1, each env.step() advances the simulation by env.step_dt, which is sim_dt * decimation. Since the loop calls env.step() once per iteration, the pacing dt should be env.step_dt.
This affects all reinforcement_learning/<rl_library>/play.py files.
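
To illustrate the difference, here is a minimal, self-contained sketch of a real-time paced loop. It is not the actual play.py code: `StubEnv`, `remaining_sleep`, and `play` are hypothetical names made up for this example, and the stub only mimics the two properties discussed above (physics_dt = sim_dt, step_dt = sim_dt * decimation).

```python
import time
from dataclasses import dataclass


@dataclass
class StubEnv:
    """Hypothetical stand-in for the environment; only the dt properties matter here."""

    sim_dt: float
    decimation: int

    @property
    def physics_dt(self) -> float:
        # Duration of a single physics step.
        return self.sim_dt

    @property
    def step_dt(self) -> float:
        # Simulated time covered by one env.step() call.
        return self.sim_dt * self.decimation

    def step(self) -> None:
        # Placeholder: a real environment would run `decimation` physics steps here.
        pass


def remaining_sleep(dt: float, elapsed: float) -> float:
    """Time left to sleep so that one loop iteration takes `dt` wall-clock seconds."""
    return max(0.0, dt - elapsed)


def play(env: StubEnv, num_steps: int, real_time: bool = True) -> None:
    # One env.step() per iteration, so pace the loop with step_dt, not physics_dt.
    dt = env.step_dt
    for _ in range(num_steps):
        start = time.perf_counter()
        env.step()
        if real_time:
            time.sleep(remaining_sleep(dt, time.perf_counter() - start))
```

With sim_dt = 0.005 and decimation = 4, pacing on physics_dt would target 5 ms per iteration while each iteration simulates 20 ms, so playback would run up to 4x faster than real time. Pacing on step_dt keeps the two in sync.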