
Split out the rules.jl as a sub-package (or a separate package)? #108

Open
chengchingwen opened this issue Aug 29, 2022 · 9 comments

@chengchingwen
Member

There are many possible designs for how the update function/API could be implemented, but the application rule of each optimizer instance is the same. I wonder if we could separate those rules into another package (e.g. OptimiserRules.jl?), so that we can test other update implementations easily, or enable the possibility of multiple optimizer packages each defining a different update API.
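For concreteness, the part that would move is roughly the `init`/`apply!` pair that each rule defines against Optimisers.jl's rule interface today. A minimal sketch of one such rule (`MyDescent` is a hypothetical example, and the exact signatures here are an assumption that may drift between versions):

```julia
using Optimisers

# A plain gradient-descent rule written against the shared interface.
struct MyDescent <: Optimisers.AbstractRule
    eta::Float64
end

Optimisers.init(o::MyDescent, x::AbstractArray) = nothing  # this rule carries no state

function Optimisers.apply!(o::MyDescent, state, x, dx)
    # Return the (unchanged) state and the increment to subtract from x.
    return state, o.eta .* dx
end
```

A rule like this says nothing about how the surrounding update loop walks the model, which is why the same rules could in principle back several different update APIs.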

@mcabbott
Member

This package still likes to think of itself as the new little thing moved outside Flux... it could be splintered further but it's a bit of a pain to manage too many small packages.

I'm curious, though: what designs for an update API do you have in mind? It's possible that what's here could still be simplified, and certain that it could allow more features.

@chengchingwen
Member Author

I don't really have any concrete idea right now, but it seems worthwhile to split out the rule part. For example, if someone prefers the older IdDict-based update implementation, that could easily be done without including the fmap-based implementation.

@mcabbott
Member

Ok. I guess my vote is that such a thing should just depend on this package, until it's clear that the burden of loading more code than it needs outweighs the burden of maintaining two packages here. This package isn't very long, and doesn't depend on many things.

FWIW, there is an IdDict implementation here: https://github.com/FluxML/Flux.jl/pull/2029/files#diff-0ebc416cc227b7a58c66c0f81e8119576f68219437353510efc2822dcbf849dfR56-R87. It's a lot simpler than all this tree-walking stuff, I have to say...
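For readers without the link handy, the idea is roughly this: optimizer state lives in an `IdDict` keyed on parameter identity, with no tree walking at all. A loose sketch of that style (not the linked PR's actual code; `ps` and `gs` are illustrative names for a parameter collection and a per-parameter gradient lookup):

```julia
using Optimisers

# State is keyed by the identity of each parameter array,
# not by its position in a model tree.
function update!(rule, state::IdDict, ps, gs)
    for p in ps                           # ps: iterable of parameter arrays
        g = get(gs, p, nothing)           # gs: gradients looked up per parameter
        g === nothing && continue
        s = get!(() -> Optimisers.init(rule, p), state, p)
        s, dx = Optimisers.apply!(rule, s, p, g)
        state[p] = s
        p .-= dx                          # mutate the parameter in place
    end
    return state
end
```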

@chengchingwen
Member Author

> FWIW, there is an IdDict implementation ... It's a lot simpler than all this tree-walking stuff, I have to say...

I would say that's the reason why I think it's worth splitting out. The current design supports many features: in-place and out-of-place accumulation, higher-order derivatives, etc. It's hard to design a good API that supports all of them, but by restricting the number of supported features, it's possible to find a better design for a specific use.
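To illustrate the two accumulation styles with the public API as it stands (a sketch; the NamedTuple here stands in for any Functors-compatible model):

```julia
using Optimisers

model = (W = rand(3, 3), b = zeros(3))
grad  = (W = ones(3, 3), b = ones(3))

st = Optimisers.setup(Optimisers.Descent(0.1), model)

# Out-of-place: returns fresh state and model trees.
st, model = Optimisers.update(st, model, grad)

# In-place: mutates the arrays inside model where possible.
st, model = Optimisers.update!(st, model, grad)
```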

> ... until it's clear that the burden of loading more code than it needs outweighs the burden of maintaining two packages here

Just out of curiosity, what increased maintenance effort would it need if we just made it a sub-package here? If I understand correctly, everything stays the same except that some files are put into another folder in this repo. And those rules don't seem to need much development/maintenance effort.

@mcabbott
Member

> what increased maintenance effort would it need

More juggling of which packages are dev'd locally. More different version numbers, more ways for conflicts to occur. More waiting for registration of updates, when you need one to finish before another can be tagged. More careful thought about what function is in which package, and what the version bounds have to be when you move something from one to the other.

> The current design supports many features: in-place and out-of-place accumulation, higher-order derivatives, etc.

Sure. I'm not sure how we got higher derivatives; IIRC they aren't used by any rules here and aren't even tested, so I have no idea if the design is right. Old-style FluxML, I guess...

But if you want to write a package exploring something more restricted, loading Optimisers.jl just as a source of rules does not imply any promise that you support all the same things.

I also think exploring better / alternative APIs isn't out-of-scope for this package. It's still quite young.

@chengchingwen
Member Author

Ok, that's fair.

> I'm not sure how we got higher derivatives; IIRC they aren't used by any rules here and aren't even tested, so I have no idea if the design is right. Old-style FluxML, I guess...

I'm not sure either. I thought that was what #16 was about.

@mcabbott
Member

Perhaps too snarky, sorry; the higher-order case was from before I paid this package any attention.

If that AdaHessian works, then someone should at least add it to the tests. I guess it wants only the diagonal of the Hessian, which is good, because that's all that can make sense with this package's tree-like view of the parameters.

@ToucheSir
Member

A big reason why AdaHessian and other optimizers that use the HVP/Hessian diagonal (or an approximation thereof) haven't been ported is that they rely on nested differentiation. If we can get that working for non-trivial models, it should be much easier to test out those methods.
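For what that entails, here is a toy forward-over-reverse sketch of the Hessian-vector product AdaHessian needs, plugged into Hutchinson's estimator of the Hessian diagonal. This works for a trivial function like the one below; the hard part the comment refers to is making the nested derivative survive real models:

```julia
using Zygote, ForwardDiff

f(x) = sum(abs2, x) / 2          # toy loss; its true Hessian is I

# H(x) * v as the directional derivative of the gradient (forward over reverse).
hvp(f, x, v) = ForwardDiff.derivative(t -> Zygote.gradient(f, x .+ t .* v)[1], 0.0)

x = randn(4)
v = rand((-1.0, 1.0), 4)         # Rademacher probe vector
diag_est = v .* hvp(f, x, v)     # Hutchinson estimate of diag(H); here ≈ ones(4)
```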

@darsnack
Member

darsnack commented Aug 30, 2022

Yeah, that PR really should have added the example as a test. But yes, the higher-order stuff was only put in place before releasing the package, to potentially support such rules in the future (at a time when Optimisers.jl had fewer types to dispatch on, making deprecations harder). We have no higher-order rules, so we don't have it as a feature (yet).
