Description
Describe the bug
Training results, such as the reward curve, differ between runs even when a random seed is set manually (for instance, seed=42).
Several similar issues have been submitted:
#489
#275
Steps to reproduce
Run a training script such as IsaacLab/source/standalone/workflows/rsl_rl/train.py several times with the same seed and compare the results,
or follow the issues linked above to reproduce the problem.
System Info
Describe the characteristics of your environment:
- Commit: [e.g. 8f3b9ca]
- Isaac Sim Version: 2024.4.1
- OS: Ubuntu 22.04
- GPU: GeForce RTX 3090
- CUDA: 11.2
- GPU Driver: 550
Resolution
Here is my resolution. With it, I no longer see any non-deterministic behavior, and the reward curves overlap exactly when I train the same code multiple times.
The problem comes from setting the seed after the environment is created.
The seed is set at line 118, which is after the env is created at line 90.
If you instead set the seed before the env is created (as in lines 92-102 in the image below), the behavior becomes deterministic.
I don't know why the point at which the seed is set matters, or how it could cause non-deterministic behavior.
It would be nice if you could provide an explanation and include this fix in your next pull request.
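One possible explanation, offered as an assumption and not confirmed for Isaac Lab, is that creating the environment already draws random numbers (for example, for initial states). A seed set afterwards cannot make those draws reproducible. The sketch below illustrates the effect with Python's standard `random` module only. `ToyEnv` and `train` are hypothetical names and are not part of Isaac Lab or rsl_rl.

```python
import random
from typing import List


class ToyEnv:
    """Hypothetical stand-in for an env that samples random values when built."""

    def __init__(self, num_envs: int = 4) -> None:
        # Initial states come from the global RNG at construction time.
        self.state = [random.uniform(-1.0, 1.0) for _ in range(num_envs)]

    def step(self) -> float:
        # Each step perturbs the state with noise from the same global RNG
        # and returns a toy "reward".
        self.state = [s + random.gauss(0.0, 0.1) for s in self.state]
        return -sum(abs(s) for s in self.state)


def train(seed: int, seed_before_env: bool, steps: int = 5) -> List[float]:
    """Return a toy reward curve; only the position of the seeding call changes."""
    if seed_before_env:
        random.seed(seed)  # seeding first also covers construction-time sampling
    env = ToyEnv()
    if not seed_before_env:
        random.seed(seed)  # too late: the initial states were already drawn
    return [env.step() for _ in range(steps)]


if __name__ == "__main__":
    early_a = train(42, seed_before_env=True)
    early_b = train(42, seed_before_env=True)
    print("seed before env, curves identical:", early_a == early_b)

    late_a = train(42, seed_before_env=False)
    late_b = train(42, seed_before_env=False)
    print("seed after env, curves identical:", late_a == late_b)
```

When the seed is set before `ToyEnv()` is built, both runs consume the same random sequence from the start, so the two curves match. When it is set afterwards, the initial states depend on whatever state the RNG was in before seeding, so the curves generally differ even though the per-step noise is the same.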
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Acceptance Criteria
- The non-determinism issue is resolved (potentially with the above fix)
- There are tests that verify training is deterministic for a fixed seed
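A test for the second criterion could follow the pattern sketched below: run the same seeded training function several times and require identical reward curves. This is only an illustration under stated assumptions. `assert_deterministic` and `toy_training_run` are hypothetical names, and a real test would call the actual training entry point in place of the toy function.

```python
import random
from typing import Callable, List, Sequence


def assert_deterministic(
    run: Callable[[int], Sequence[float]], seed: int, repeats: int = 3
) -> None:
    """Call `run(seed)` several times and require identical results each time."""
    reference = list(run(seed))
    for i in range(1, repeats):
        result = list(run(seed))
        assert result == reference, f"run {i} diverged from run 0 for seed={seed}"


def toy_training_run(seed: int) -> List[float]:
    """Hypothetical placeholder for a short training run returning a reward curve."""
    rng = random.Random(seed)  # local generator, seeded before anything is sampled
    return [rng.gauss(0.0, 1.0) for _ in range(10)]


if __name__ == "__main__":
    assert_deterministic(toy_training_run, seed=42)
    print("toy run is deterministic")
```

The check uses exact equality because the report describes curves that overlap exactly. If bitwise reproducibility turns out to be too strict for some configurations, a comparison with a small tolerance would be an alternative.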