Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NotImplemented and NotImplementedRule #28

Closed
wants to merge 1 commit into from

Conversation

ararslan
Copy link
Member

@ararslan ararslan commented May 1, 2019

Sometimes you just don't implement something. Maybe it's hard, maybe you're lazy like me, whatever. For such cases, there is the differential NotImplemented and its associated rule NotImplementedRule.

This was born of not being able to figure out the rule for the scalar multiple parameter in BLAS.gemm (#25), but having implemented rules for the matrix parameters. As indicated by the previous paragraph, I am lazy; it is easier to add a rule that admits that to the world than it is to actually figure out how to differentiate wrt alpha.

Sometimes you just don't implement something. Maybe it's hard, maybe
you're lazy like me, whatever. For such cases, there is the differential
`NotImplemented` and its associated rule `NotImplementedRule`.

This was born of not being able to figure out the rule for the scalar
multiple parameter in `BLAS.gemm`, but having implemented rules for the
matrix parameters.
NotImplemented <: AbstractDifferential

A differential type which behaves similar to [`DNE`](@ref) but instead signifies that
the actual differential is not implemented in ChainRules, not that it does not exist.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be a "not yet" added here?
Maybe even a "PRs to remove not implemented rules are welcome"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

@ararslan
Copy link
Member Author

ararslan commented May 2, 2019

The more I think about this, the less I like it.

@nickrobinson251
Copy link
Contributor

Seems fine. Seems like it might be useful. Shall we do this?

@ararslan
Copy link
Member Author

ararslan commented Aug 2, 2019

Up to you/whoever maintains this now. I won't likely have time to revisit this but feel free to do what you will with the branch.

@oxinabox
Copy link
Member

oxinabox commented Aug 2, 2019

Yes, I want to return this instead of Nothing as the default fallback

"""
NotImplementedRule <: AbstractRule

Rule indicating that a particular derivative is not implemented by ChainRules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with @oxinabox 's comment above, I would like to see this expanded slightly to say more about what a NotImplementedRule does mean. Some context would be helpful e.g.

"""
Rule indicating that a particular derivative is not yet implemented by ChainRules. Different from [`DNERule`](@ref) in that this rule does not imply non-differentiability, but rather that no one has implemented it yet.
"""

@willtebbutt
Copy link
Member

I'm very happy for this to be merged provided that someone addresses the docstrings to provide a bit more context for the confused rule-implementer.

@nickrobinson251
Copy link
Contributor

Cool, I can take a look at finishing this up next week, unless someone else is interested in doing it :)

@oxinabox
Copy link
Member

oxinabox commented Aug 3, 2019

Yes, I want to return this instead of Nothing as the default fallback

For context currently we do:
https://github.com/JuliaDiff/ChainRulesCore.jl/blob/ccead4ffa606cd9e83f5551e3767dac212ebddaa/src/rules.jl#L352
rrule(::Any, ::Vararg{Any}; kwargs...) = nothing

The AD system when it gets that back, know that it has to actually go and do AD.

I was proposing we change that to:
rrule(f::Any, args::Vararg{Any}; kwargs...) = (f(args...; kwargs...), NotImplementedRule())

Advantage of this is internal consistancy, and less magic Nothing floating around.
Disadvantage is we would be calculating the forward result (f(args...; kwargs...)),
and the AD system is going to want to calculate the forward result itself also when finding the sensitivity/gradient.
Which is pretty bad -- makes it take twice as long. So lets not do that.

This case does still show up for case where 1 of the derivatives is listed as NotImplementedRule() though.
Which would require the same thing.
However, hopefully that is rare because the NotImplementedRule would be used rarely,
and only for things where it was unlikely to matter.

@oxinabox
Copy link
Member

oxinabox commented Aug 7, 2019

I had a discussion with @willtebbutt about this.
In particular, that last point about doing the forward pass twice.

And we concluded that for now we don't need it,
and it complicates things and causes problems,
so lets not do it.

I am closing this PR, but we can reopen it later when/if we need it.

@oxinabox oxinabox closed this Aug 7, 2019
@oxinabox oxinabox deleted the aa/not-implemented branch August 17, 2020 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants