
Rename the costs subdirectories #152


Closed
wants to merge 11 commits into from

Conversation

@vmoens vmoens (Collaborator) commented May 20, 2022

This PR renames the subdirectories of cost to loss and values.
In #117 it was suggested to use the same naming as RLax.
I agree that values makes a lot of sense. Personally, I try to avoid long directory names, so I'd rather have values than value_learning; I don't think the longer name adds much.
For loss, this is essentially a good descriptor of the content: a bunch of loss modules. I'd rather not use policy_optimization because, for instance, DQN does not optimize a policy but just a value function. Besides, what those classes do is implement a loss function: they take data as input and output a loss value. Per se they don't do any optimization (which is the responsibility of the trainer / training script).
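Roughly, the division of labour looks like this (a minimal sketch with hypothetical names, not the actual torchrl API):

import torch
from torch import nn
import torch.nn.functional as F

class SomeLoss(nn.Module):  # hypothetical loss module, for illustration only
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, data: dict) -> torch.Tensor:
        # read tensors from the data carrier and return a scalar loss;
        # no optimizer logic lives here
        pred = self.model(data["observation"])
        return F.mse_loss(pred, data["target"])

# the trainer / training script owns the optimization:
# loss = loss_module(batch); loss.backward(); optimizer.step(); optimizer.zero_grad()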

Hope that makes sense.

I also cleaned up the files a bit; IMPALA is way too immature to be in the repo atm.

cc @waterhorse1

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 20, 2022
@vmoens vmoens requested a review from Benjamin-eecs May 20, 2022 08:57
@vmoens vmoens linked an issue May 20, 2022 that may be closed by this pull request
@vmoens vmoens added the naming Naming convention in the code label May 20, 2022
@Benjamin-eecs Benjamin-eecs (Contributor) left a comment


Overall I believe it is a good refactor. I'll leave the following comments:
I think the value subpackage actually serves as a component of the loss subpackage, right? So from my point of view it is a bit weird for them to sit at the same directory level.
In loss, different losses may have inheritance relationships (e.g. pg -> a2c -> ppo), so do we just duplicate the corresponding parts of the code?

So the roadmap now changes to aligning with the multistep subpackage in rlax, and we will have a loss subpackage of our own.

@vmoens vmoens (Collaborator, Author) commented May 22, 2022

Thanks for your review @Benjamin-eecs

Overall I believe it is a good refactor. I'll leave the following comments: I think the value subpackage actually serves as a component of the loss subpackage, right? So from my point of view it is a bit weird for them to sit at the same directory level.

What do you mean? How would you see it otherwise? Could you sketch an example? I'm not sure I understand whether you want more levels in the tree or fewer.

In loss, different losses may have inheritance relationships (e.g. pg -> a2c -> ppo), so do we just duplicate the corresponding parts of the code?

Is PPO really a sub-class of a2c? In the original a2c paper the policy is deterministic, which is quite different from PPO. PPO is certainly a subclass of policy gradient, but as a general rule of thumb we prefer to keep inheritance to a minimum in pytorch libs. The reason is that complicated inheritance schemes can be good from an engineering perspective but bad for hackability (if you want to change a small thing in the sub-sub-sub class, you need to understand the whole stack to know what to do). It's sometimes slightly better to copy parts of the code, IMO. But maybe I got your comment wrong...

So the roadmap now changes to aligning with the multistep subpackage in rlax, and we will have a loss subpackage of our own.

Not sure I'm following either: are you suggesting we rename value to multistep and place everything in the same file like in rlax?
The way value is designed is somewhat similar to torchvision transforms + functional and torch.nn + functional: we create functionals, and modules that are pre-parameterized versions of those functionals.
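As a rough sketch of that pattern (hypothetical names, not the actual torchrl signatures):

from torch import nn

def td0_advantage(reward, next_value, value, gamma):
    # stateless functional: one-step TD advantage estimate
    return reward + gamma * next_value - value

class TD0Estimate(nn.Module):
    # module that pre-parameterizes the functional with gamma
    def __init__(self, gamma: float = 0.99):
        super().__init__()
        self.gamma = gamma

    def forward(self, reward, next_value, value):
        return td0_advantage(reward, next_value, value, self.gamma)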
Besides, I'm not a big fan of the multistep name as -- to me -- multistep is something a bit different: in multistep DQN, you "squash" the upcoming states and sum the discounted rewards. We have a multistep class that does that at the collection level, before the data is placed in the replay buffer.
The value module just targets value computation, even when it's not multistep (TD(0)). Hence multistep is a confusing term: say you don't want to use multistep (e.g. you're working on bandit problems); I could still totally see a value function that would have some kind of baseline computation (for pg) or something similar, and that would belong in objectives/value.
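To make the distinction concrete (a toy sketch with assumed semantics, not torchrl code):

import torch

def nstep_reward(rewards: torch.Tensor, gamma: float, n: int) -> torch.Tensor:
    # "multistep": collapse the first n rewards into a single discounted sum,
    # which is what happens at collection time, before the replay buffer
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)
    return (rewards[..., :n] * discounts).sum(-1)

def td0_target(reward, next_value, done, gamma):
    # single-step bootstrap target: value computation with no multistep at all
    return reward + gamma * (1.0 - done.float()) * next_value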

I hope I got your comments right. Don't hesitate to sketch the kind of file tree you'd like to see; it'd be awesome to understand your thoughts a little bit more.

@Benjamin-eecs Benjamin-eecs (Contributor) commented May 23, 2022

@vmoens

What do you mean? How would you see it otherwise? Could you sketch an example? I'm not sure I understand whether you want more levels in the tree or fewer.

Not sure I'm following either: are you suggesting we rename value to multistep and place everything in the same file like in rlax?

Yes, my original intention was that the value sub-package serves as a component or set of utility functions for loss. Maybe the functionality in value belongs in one single file, or the file tree can look something like below (if no other subpackage is planned in objectives):

objectives/
├── common.py
├── ddpg.py
├── dqn.py
├── functional.py
├── __init__.py
├── ppo.py
├── redq.py
├── reinforce.py
├── sac.py
├── utils.py
└── value
    ├── advantages.py
    ├── functional.py
    ├── __init__.py
    ├── returns.py
    └── vtrace.py

1 directory, 15 files
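Under this layout, the imports would look something like the following (hypothetical paths, just for illustration):

from torchrl.objectives.ppo import PPOLoss              # a loss module
from torchrl.objectives.value.advantages import GAE     # a value/advantage helper it builds on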

Is PPO really a sub-class of a2c? In the original a2c paper the policy is deterministic, which is quite different from PPO.

My example here means that a2c and ppo both share the same actor-critic structure. This example might better show my concern: say the original version of SAC is tailored to continuous action spaces; I'm not sure whether the discrete-action-space version would open up a new file for a SACDiscrete. I agree with you that 'complicated inheritance schemes can be good from an engineering perspective but bad for hackability' and that 'it's sometimes slightly better to copy parts of the code'.

The value module just targets value computation, even when it's not multistep (TD(0)). Hence multistep is a confusing term

Yes, I agree with you. As long as the value naming doesn't cause any confusion or collide with terms used in future development, the current form of value looks good to me.

P.S. I think it's better to use __all__ in the __init__.py to expose all the functionality of the current sub-package, maybe in the stable version of torchrl :).
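For example (a rough sketch; module and class names assumed):

# objectives/__init__.py
__all__ = ["DQNLoss", "DDPGLoss", "PPOLoss", "SACLoss", "REDQLoss"]

from .dqn import DQNLoss
from .ddpg import DDPGLoss
from .ppo import PPOLoss
from .sac import SACLoss
from .redq import REDQLoss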

@vmoens vmoens (Collaborator, Author) commented May 23, 2022

The tree looks good to me, I will refactor the objectives accordingly.

For the continuous / discrete distinction, you can have a look at the PPO loss for instance: it works in both the discrete and continuous case, whether the inputs are pixels or states, and regardless of whether you share parameters between critic and actor. I think using a generic data carrier like tensordict allows us to write very general losses that just point to some basic operations coded as nn.Module objects wrapped in a TensorDictModule.
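As a rough illustration of that idea (using a plain dict as a stand-in for a tensordict; hypothetical names, not the actual PPOLoss implementation):

import torch
from torch import nn

class GenericClipLoss(nn.Module):  # hypothetical, for illustration only
    def __init__(self, actor: nn.Module, critic: nn.Module, clip_eps: float = 0.2):
        super().__init__()
        self.actor = actor      # maps observation -> a torch.distributions.Distribution
        self.critic = critic    # maps observation -> a state value
        self.clip_eps = clip_eps

    def forward(self, data: dict) -> torch.Tensor:
        # the loss only assumes a few keys; whether they come from pixels or
        # states, or from shared or separate networks, is irrelevant here
        log_prob = self.actor(data["observation"]).log_prob(data["action"])
        ratio = (log_prob - data["old_log_prob"]).exp()
        adv = data["advantage"]
        clipped = torch.clamp(ratio, 1 - self.clip_eps, 1 + self.clip_eps) * adv
        policy_loss = -torch.min(ratio * adv, clipped).mean()
        value_loss = (self.critic(data["observation"]).squeeze(-1) - data["value_target"]).pow(2).mean()
        return policy_loss + value_loss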

For __all__: do you mean that we should have an __all__ = [...] in __init__.py rather than in every single file? That's fine by me; we can do such a refactoring. It's a bit of a matter of opinion: in vision they have one __all__ per file, as here, which helps keep the private/public distinction clear within each file, but I'm OK refactoring that if you think it's clearer. It's a bit of work though, so it might take me some time ;)
If you think it's worth it, don't hesitate to open an issue; I will upvote it and perhaps someone from the (fast-growing!) community will be happy to take it over :D

@vmoens vmoens added the bc breaking backward compatibility breaking change label May 30, 2022
vmoens added 6 commits May 30, 2022 18:12
# Conflicts:
#	test/smoke_test.py
#	torchrl/objectives/ddpg.py
#	torchrl/objectives/ppo.py
#	torchrl/objectives/redq.py
#	torchrl/objectives/value/advantages.py
#	torchrl/trainers/helpers/trainers.py
#	torchrl/trainers/trainers.py

Successfully merging this pull request may close these issues.

[Feature Request] Objectives modules refactoring