v1.1.0
This version computes updates from returns rather than from immediate rewards.
Works with CoMLRL v1.1.0.
However, the cross-joint mode takes a very long time to train, so this version should be deprecated in favor of v1.1.1.