Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aliased differentials and Inplace Accumulation #350

Open
oxinabox opened this issue May 11, 2021 · 2 comments
Open

Aliased differentials and Inplace Accumulation #350

oxinabox opened this issue May 11, 2021 · 2 comments
Labels
bug Something isn't working design Requires some desgin before changes are made inplace accumulation for things relating to inplace accumulation of gradients

Comments

@oxinabox
Copy link
Member

If a pullback is something like dx->(dx, dx) for dx being a reference type, e.g. like the pullback for +(::Array, ::Array), then the two outputs are the same object.
They are aliased.
Even though they are actually differentials for (usually) distinct (nonaliased) primal values.
This causes inplace accumulation to act wrong.

See discussion here FluxML/Zygote.jl#962 (comment)
where @mcabbott was just adding inplace accumulation for getindex.

I am wondering if we need to require that if the primals are not aliased the differentials also need to not be aliased.
This extra copies is slow if not inplace accumulation is not actually done though.
we might want to make it configurable depending on the AD system.
(this related to the config needs for #68)

I think the ideal solution would be copy-on-write objects.
But doing that without language support is suffering.

@oxinabox oxinabox added bug Something isn't working design Requires some desgin before changes are made inplace accumulation for things relating to inplace accumulation of gradients labels May 11, 2021
@oxinabox
Copy link
Member Author

oxinabox commented May 11, 2021

@willtebbutt and @mcabbott have both proposed doing something that looks somewhat like a thunk.
Which delays the copy for the aliased differential until it is used.
(e.g. in a call to add!!)
Potentially this could even be @thunk(copy(dx)) I guess?
Though maybe we only wanty that if inplacing.
In which case it might be able to do it with an InplaceableThunk? Not sure. More thinking is needed.

The other option is something that gathers up a long chain of all attempts to accumulate against it, and then at the last minute when it has no choice calls copy and then does them all at once.
Which is still thunk-like, and might be implemented as a wrapper around nested of InplaceableThunks?

@oxinabox oxinabox added this to the v1 milestone May 11, 2021
@mcabbott
Copy link
Member

mcabbott commented May 11, 2021

Yes, I think it's not quite @thunk(copy(dy)) since you want to be able to read the array freely, only when writing must you be careful.

But it's similar in that I think it could be handled by the code interfacing to rrules. If the rule for + is something like dy -> (Xerox(dy),Xerox(dy)), most rules could just receive the un-wrapped dy.original. Only those which return the same array dy -> (dy,) would need to leave it wrapped -- or perhaps the interface function tests what they return, and re-wraps if necessary.

The simplest version would just copy on the first attempt to write. My guess is that would get you most of the benefit. The repeated scalar indexing cases of FluxML/Zygote.jl#962 still wouldn't involve any extra copies, as they didn't call the adjoint of +, they just need to know it's safe.

Delaying this as you suggest might mean that all other views of the same data have been discarded by the time you want to write. If the un-wrapping (and re-wrapping if ===) is all handled by one rrule interface function, then perhaps it would not be too hard to track the number of views in existence. But still some complexity. And holding onto the stack of things you intend to write there may also have a cost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working design Requires some desgin before changes are made inplace accumulation for things relating to inplace accumulation of gradients
Projects
None yet
Development

No branches or pull requests

2 participants