-
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split out the rules.jl
as a sub-package (or a separate package) ?
#108
Comments
This package still likes to think of itself as the new little thing moved outside Flux... it could be splintered further but it's a bit of a pain to manage too many small packages. Am curious though what designs for an update API you have in mind? It's possible that what's here could still be simplified, and certain that it could allow more features. |
I don't really have any concrete idea right now, but it seems worth to split out that rule part. For example, if someone prefer the older |
Ok. I guess my vote is that such a thing should just depend on this package, until it's clear that the burden of loading more code than it needs outweighs the burden of maintaining two packages here. This package isn't very long, and doesn't depend on many things. FWIW, there is an IdDict implementation here: https://github.com/FluxML/Flux.jl/pull/2029/files#diff-0ebc416cc227b7a58c66c0f81e8119576f68219437353510efc2822dcbf849dfR56-R87 . It's a lot simpler than all this tree-walking stuff, I have to say... |
I would say that's the reason why I think it worth splitting out. The current design support many features, inplace and out-of-place accumulation, higher order derivative, etc.. It's hard to design a good API to support all of them. But by restricting the number of support features, it's possible to find some better design for a specific use.
Just out of curious, what are the increased maintenance effort it would need if we just make it a sub-package here? If I understand correctly, Everything is the same except some files are put into another folder in this repo. And those rules don't seems to need lots of development/maintenance effort. |
More juggling of which packages are
Sure. Not sure how we got higher derivatives, IIRC they aren't used by any rules here & aren't even tested, no idea if the design is right. Old-style FluxML I guess... But if you want to write a package exploring something more restricted, loading Optimisers.jl just as a source of rules does not imply any promise that you support all the same things. I also think exploring better / alternative APIs isn't out-of-scope for this package. It's still quite young. |
Ok, that's fair.
I'm not sure either. I thought that's what #16 is about. |
Perhaps too snarky, sorry; the higher order case was before I paid this package any attention. If that |
A big reason why AdaHessian and other optimizers which use the HVP/Hessian diagonal or an approximation thereof haven't been ported is because they rely on nested differentiation. If we can get that working for non-trivial models, then it should be much easier to test out those methods. |
Yeah that PR really should have added the example as a test, but yes, the higher-order stuff was only put in place to potentially support them in the future prior to releasing the package (at a time when Optimisers.jl had fewer types to dispatch on making deprecations harder). We have no higher-order rules, so we don't have it as a feature (yet). |
There are many different design on how the update function/api should be implemented, but the application rules of each optimizer instances are the same. I wonder if we can separate those rules as another package (e.g.
OptimiserRules.jl
?), so that we can test on other update implementation easily or enable the possibility of having multiple optimizer packages with different update api defined.The text was updated successfully, but these errors were encountered: