Rename the costs subdirectories #152
Conversation
Overall I believe this is a good refactor. I leave the following comments:

I think the `value` subpackage actually serves as a component of the `loss` subpackage, right? So from my point of view it is a bit odd for the two to sit at the same directory level.

In `loss`, different losses may have an inheritance relationship (e.g. pg -> a2c -> ppo), so do we just duplicate the corresponding parts of the code?

So the roadmap now changes to align with the multistep subpackage in rlax, and we will have a `loss` subpackage of our own.
Thanks for your review @Benjamin-eecs
What do you mean? How would you see it otherwise? Could you sketch an example? I'm not sure I understand whether you want more levels in the tree or fewer.
Is PPO really a subclass of a2c? In the original a2c paper the policy is deterministic, which is quite different from PPO. PPO is certainly a subclass of policy gradient, but as a general rule of thumb we prefer to keep inheritance to a minimum in pytorch libs. The reason is that a complicated inheritance scheme can be good from an engineering perspective but bad for hackability (if you want to change a small thing in the sub-sub-sub class, you need to understand the whole stack to know what to do). It's sometimes slightly better to copy parts of the code IMO. But maybe I got your comment wrong...
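To make the "copy rather than inherit" point concrete, here is a minimal sketch (hypothetical classes and argument names, not TorchRL's actual code): both losses are flat `nn.Module`s, and the couple of lines they share are simply duplicated so that each one can be read and modified in isolation.

```python
import torch
from torch import nn


class A2CLoss(nn.Module):
    """Flat loss module: the whole objective is readable in one place."""

    def forward(self, log_prob, advantage, value, value_target):
        policy_loss = -(log_prob * advantage.detach()).mean()
        value_loss = 0.5 * (value - value_target.detach()).pow(2).mean()
        return policy_loss + value_loss


class PPOLoss(nn.Module):
    """Deliberately NOT a subclass of A2CLoss: the few shared lines are
    copied so the class can be hacked without reading a parent stack."""

    def __init__(self, clip_eps: float = 0.2):
        super().__init__()
        self.clip_eps = clip_eps

    def forward(self, log_prob, old_log_prob, advantage, value, value_target):
        adv = advantage.detach()
        ratio = (log_prob - old_log_prob.detach()).exp()
        clipped = ratio.clamp(1 - self.clip_eps, 1 + self.clip_eps)
        policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
        # same two lines as in A2CLoss, duplicated on purpose
        value_loss = 0.5 * (value - value_target.detach()).pow(2).mean()
        return policy_loss + value_loss
```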
Not sure I'm following either: are you suggesting we rename it to something else? I hope I got your comments right. Don't hesitate to sketch the kind of git tree you'd like to see; it'd be awesome to understand your thoughts a little bit more.
Yes, my original intention is that:

My example here means that a2c and ppo both share the same actor-critic structure. This example might be more appropriate to show my concern: say the original version of SAC is tailored to continuous action spaces; I'm not sure whether the discrete-action-space version would open up a new file to present SACDiscrete. I agree with you that "a complicated inheritance scheme can be good from an engineering perspective but bad for hackability" and that "it's sometimes slightly better to copy parts of the code".
Yes, I agree with you, as long as the naming of ... P.S. I think it's better to use ...
The tree looks good to me, I will refactor the objectives accordingly. For the continuous / discrete distinction you can have a look at the PPO loss for instance: it will work in the discrete and the continuous case, whether it's from pixels or from states, and regardless of whether you share parameters between critic and actor or not. I think using a generic data carrier like tensordict allows us to make very general losses that just point to some basic operations. For ...
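As a rough illustration of that "generic data carrier" idea (a sketch under assumptions: a plain dict of tensors stands in for a tensordict here, and the key names and the `actor_log_prob` / `critic_value` callables are made up for the example), the loss only reads keys from the carrier, so whether the action space is discrete or continuous, or the observations are pixels or states, is entirely the actor's and critic's business.

```python
from typing import Callable, Dict

import torch

# stand-in for a TensorDict: any mapping from keys to tensors will do for the sketch
TensorDictLike = Dict[str, torch.Tensor]


def ppo_loss(
    batch: TensorDictLike,
    actor_log_prob: Callable[[TensorDictLike], torch.Tensor],
    critic_value: Callable[[TensorDictLike], torch.Tensor],
    clip_eps: float = 0.2,
) -> torch.Tensor:
    """PPO-style clipped loss written against the data carrier only."""
    adv = batch["advantage"].detach()
    ratio = (actor_log_prob(batch) - batch["old_log_prob"].detach()).exp()
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps)
    policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
    value_loss = 0.5 * (critic_value(batch) - batch["value_target"].detach()).pow(2).mean()
    return policy_loss + value_loss
```

The same signature covers a discrete actor returning Categorical log-probs or a continuous one returning Normal log-probs; the loss never has to know which one it is.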
# Conflicts:
#	test/smoke_test.py
#	torchrl/objectives/ddpg.py
#	torchrl/objectives/ppo.py
#	torchrl/objectives/redq.py
#	torchrl/objectives/value/advantages.py
#	torchrl/trainers/helpers/trainers.py
#	torchrl/trainers/trainers.py
This PR renames the subdirectories of `costs` into `loss` and `values`. In #117 it was suggested to use the same naming as RLax.

I agree that `values` makes a lot of sense. Personally, I try not to use long directory names, so I'd rather have `values` than `value_learning`; I don't think the longer name adds much.

For `loss`, this is essentially a good descriptor of what the content is: a bunch of loss modules. I'd rather not use `policy_optimization` because, for instance, `DQN` does not optimize a policy but just a value function. Besides, what those classes do is implement a loss function: they take data as input and output a loss value. Per se they don't do any optimization (which is the responsibility of the trainer / training script). Hope that makes sense.
I also cleaned up the files a bit; IMPALA is way too immature to be in the repo atm.
cc @waterhorse1