Rename the costs subdirectories #152
Conversation
Overall I believe this is a good refactor. I leave the following comments:

I think the `value` subpackage actually serves as a component of the `loss` subpackage, right? So from my point of view it is a bit odd for the two to sit at the same directory level.

In `loss`, different losses may have an inheritance relationship (e.g. pg -> a2c -> ppo), so do we just duplicate the corresponding parts of the code?

So the roadmap now changes to align with the multistep subpackage in rlax, and we will have a `loss` subpackage of our own.
Thanks for your review @Benjamin-eecs
What do you mean? How would you see it otherwise? Could you sketch an example? I'm not sure I understand whether you want more levels in the tree or fewer.
Is PPO really a subclass of a2c? In the original a2c paper the policy is deterministic, which is quite different from PPO. PPO is certainly a subclass of policy gradient, but as a general rule of thumb we prefer to keep inheritance to a minimum in pytorch libs. The reason is that a complicated inheritance scheme can be good from an engineering perspective but bad for hackability (if you want to change a small thing in the sub-sub-sub class, you need to understand the whole stack to know what to do). It's sometimes slightly better to copy parts of the code IMO. But maybe I got your comment wrong...
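To make the "copy rather than inherit" point concrete, here is a minimal sketch (hypothetical classes and argument names, not TorchRL's actual code): both losses are flat `nn.Module`s, and the couple of lines they share are simply duplicated so that each one can be read and modified in isolation.

```python
import torch
from torch import nn


class A2CLoss(nn.Module):
    """Flat loss module: the whole objective is readable in one place."""

    def forward(self, log_prob, advantage, value, value_target):
        policy_loss = -(log_prob * advantage.detach()).mean()
        value_loss = 0.5 * (value - value_target.detach()).pow(2).mean()
        return policy_loss + value_loss


class PPOLoss(nn.Module):
    """Deliberately NOT a subclass of A2CLoss: the few shared lines are
    copied so the class can be hacked without reading a parent stack."""

    def __init__(self, clip_eps: float = 0.2):
        super().__init__()
        self.clip_eps = clip_eps

    def forward(self, log_prob, old_log_prob, advantage, value, value_target):
        adv = advantage.detach()
        ratio = (log_prob - old_log_prob.detach()).exp()
        clipped = ratio.clamp(1 - self.clip_eps, 1 + self.clip_eps)
        policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
        # same two lines as in A2CLoss, duplicated on purpose
        value_loss = 0.5 * (value - value_target.detach()).pow(2).mean()
        return policy_loss + value_loss
```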
Not sure I'm following either: are you suggesting we rename it to something else? I hope I got your comments right. Don't hesitate to sketch the kind of git tree you'd like to see; it'd be awesome to understand your thoughts a little bit more.
Yes, my original intention is that:

My example here means that a2c and ppo both share the same actor-critic structure. This example might be more appropriate to show my concern: say the original version of SAC is tailored to continuous action spaces; I'm not sure whether the discrete-action-space version would open up a new file to present SACDiscrete. I agree with you that "a complicated inheritance scheme can be good from an engineering perspective but bad for hackability" and that "it's sometimes slightly better to copy parts of the code".
Yes, I agree with you, as long as the naming of ... P.S. I think it's better to use ...
The tree looks good to me, I will refactor the objectives accordingly. For the continuous / discrete distinction you can have a look at the PPO loss for instance: it will work in the discrete and the continuous case, whether it's from pixels or from states, and regardless of whether you share parameters between critic and actor or not. I think using a generic data carrier like tensordict allows us to make very general losses that just point to some basic operations. For ...
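As a rough illustration of that "generic data carrier" idea (a sketch under assumptions: a plain dict of tensors stands in for a tensordict here, and the key names and the `actor_log_prob` / `critic_value` callables are made up for the example), the loss only reads keys from the carrier, so whether the action space is discrete or continuous, or the observations are pixels or states, is entirely the actor's and critic's business.

```python
from typing import Callable, Dict

import torch

# stand-in for a TensorDict: any mapping from keys to tensors will do for the sketch
TensorDictLike = Dict[str, torch.Tensor]


def ppo_loss(
    batch: TensorDictLike,
    actor_log_prob: Callable[[TensorDictLike], torch.Tensor],
    critic_value: Callable[[TensorDictLike], torch.Tensor],
    clip_eps: float = 0.2,
) -> torch.Tensor:
    """PPO-style clipped loss written against the data carrier only."""
    adv = batch["advantage"].detach()
    ratio = (actor_log_prob(batch) - batch["old_log_prob"].detach()).exp()
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps)
    policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
    value_loss = 0.5 * (critic_value(batch) - batch["value_target"].detach()).pow(2).mean()
    return policy_loss + value_loss
```

The same signature covers a discrete actor returning Categorical log-probs or a continuous one returning Normal log-probs; the loss never has to know which one it is.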
# Conflicts:
#	test/smoke_test.py
#	torchrl/objectives/ddpg.py
#	torchrl/objectives/ppo.py
#	torchrl/objectives/redq.py
#	torchrl/objectives/value/advantages.py
#	torchrl/trainers/helpers/trainers.py
#	torchrl/trainers/trainers.py
This PR renames the subdirectories of `costs` into `loss` and `values`. In #117 it was suggested to use the same naming as RLax.

I agree that `values` makes a lot of sense. Personally, I try not to use long directory names, so I'd rather have `values` than `value_learning`; I don't think the longer name adds much.

For `loss`, this is essentially a good descriptor of what the content is: a bunch of loss modules. I'd rather not use `policy_optimization` because, for instance, `DQN` does not optimize a policy but just a value function. Besides, what those classes do is implement a loss function: they take data as input and output a loss value. Per se they don't do any optimization (which is the responsibility of the trainer / training script). Hope that makes sense.
I also cleaned up the files a bit; IMPALA is way too immature to be in the repo atm.
cc @waterhorse1