Handling broadcasts #31
This is an unusually annoying depwarn, because it seems to be triggered upon every invocation of a Flux call. :)
Yup. I've just been using …
You should just stop overloading …
That's not a reasonable requirement in any case where we can't broadcast arbitrary Julia functions. As it turns out there are quite a lot of cases like that, including TensorFlow, MXNet, and many of the GPU libraries. This is a real use case and it's unfortunate that the discussions in Base didn't take it into account whatsoever.
Why can't you broadcast arbitrary Julia functions?
Many libraries that provide an array abstraction do so in a numpy-like fashion – you get a set of "vectorised" operations like …
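To make the constraint concrete, here is a toy sketch of a numpy-like backend array (the type `NumpyLikeArray` and its kernel table are illustrative inventions, not any real library's API): only a fixed set of named operations can be dispatched to a fast kernel, so an arbitrary anonymous function has nothing to dispatch to.

```julia
# Hypothetical numpy-like backend: only named ops have vectorised kernels.
struct NumpyLikeArray{T}
    data::Vector{T}
end

# The backend ships a closed set of fast elementwise kernels.
const KERNELS = Dict{Function,Function}(
    + => (a, b) -> a .+ b,
    * => (a, b) -> a .* b,
)

# Broadcasting works only for functions the backend knows by name.
function Base.broadcast(f, x::NumpyLikeArray, y::NumpyLikeArray)
    haskey(KERNELS, f) || error("no vectorised kernel for $f")
    NumpyLikeArray(KERNELS[f](x.data, y.data))
end

@assert broadcast(+, NumpyLikeArray([1, 2]), NumpyLikeArray([3, 4])).data == [4, 6]
```

An anonymous function such as `(a, b) -> a + b` is a fresh function object, absent from the kernel table, so a fused broadcast over this type has no fast path.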
If you only support a small set of operators on your data, there are plenty of binary operators to choose from that you can define. You don't have to use …

In the longer term, the whole Matlab/numpy-like style, where only certain vectorized operations are fast (at the cost of lots of temporary arrays), kind of defeats the point of Julia.
I also don't see how that applies to Flux and DataFlow, which are pure-Julia packages as far as I can tell.
In the long term, yes, I'd love to have this stuff all implemented in Julia and compile GPU code on the fly, etc. But that isn't going to happen immediately, so interop with existing libraries is the only reasonable option right now. Can you elaborate on how, say, broadcasting …?

I expect it would be possible to implement broadcasting syntax in a trait-like way, in which the container can choose whether to fuse, which would solve the problem for us.
Flux has a lazy dependency on TensorFlow and/or MXNet.
TensorFlow allows you to define efficient custom operations in C++, and it's also possible in MXNet; why couldn't you do that from Julia? Anyway, basically …
It's technically possible, it's just a big project, given that we need robust GPU compilation among other things. The right solution to this is not "wait until 2025".

The thing is, I do want broadcast. The semantics are all the same, and changing the user API (especially for something so common) for an implementation detail is not reasonable. Not being able to write generic code that works over a range of implementation strategies kind of defeats the point of Julia.
Not having fusion for user-defined container types and operations in Julia would be a much bigger sacrifice than saying that you need to rename if you explicitly want non-fusing operations.
I'm not arguing we should trade one for the other, but I'm repeating myself now.
Nope, because fusion happens at a syntactic level (at lowering time), before types are known. Changing fusion to a compile-time optimization that depends on inference is a complete redesign (and would also result in semantics that depend on inference). It's something that's been tried many times in many languages and has always failed to achieve genericity for user-defined types and functions. That is a "wait until 2050" solution.
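To make the "fusion at lowering time" point concrete, here is a sketch of what the parser does before any type information exists (schematic; the exact lowered form varies across Julia versions):

```julia
# The dotted expression `sin.(x) .+ cos.(x)` is rewritten syntactically
# into a single fused broadcast call over one anonymous function:
x = [0.0, 1.0, 2.0]

fused = broadcast(a -> sin(a) + cos(a), x)   # roughly what lowering emits

# versus the unfused version, which allocates two temporary arrays:
unfused = broadcast(+, broadcast(sin, x), broadcast(cos, x))

@assert fused == unfused
```

Since the rewrite happens in the parser, the container type of `x` cannot influence it, which is exactly the tension in this thread.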
```julia
if should_fuse(x, y)
    broadcast((x, y) -> x + y, x, y)
else
    broadcast(+, x, y)
end
```

This is still a syntactic transformation that doesn't depend on inference.
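The branch above could be driven by a container trait. A minimal runnable sketch, where the names `should_fuse`, `LazyArray`, and `dotplus` are illustrative inventions rather than an actual Julia or Flux API:

```julia
# Stand-in for a symbolic backend array (TensorFlow/MXNet-style).
struct LazyArray{T}
    data::Vector{T}
end

# A real backend would record the named function `f` in its graph;
# here we just apply it eagerly to keep the sketch self-contained.
Base.broadcast(f, x::LazyArray, y::LazyArray) =
    LazyArray(broadcast(f, x.data, y.data))

should_fuse(x, y) = true                         # plain arrays: fuse freely
should_fuse(x::LazyArray, y::LazyArray) = false  # backend wants the named `+`

# What `x .+ y` could expand to under this scheme:
dotplus(x, y) = should_fuse(x, y) ?
    broadcast((a, b) -> a + b, x, y) :  # fused anonymous function
    broadcast(+, x, y)                  # named function the backend can intercept

@assert dotplus([1, 2], [3, 4]) == [4, 6]
@assert dotplus(LazyArray([1, 2]), LazyArray([3, 4])).data == [4, 6]
```

The key property is that the `should_fuse` check is inserted syntactically, so the decision is made by ordinary multiple dispatch at run time, not by inference.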
I see, yes, that would be possible.
Using broadcasting operators in 0.6 gives deprecation warnings, and soon won't work at all as the `.+` etc. function objects are removed. We also need a more generic way to handle generic `f.(xs)` applications.

I suggest that DataFlow lowers any broadcast `f.(xs...)` to `Broadcast(f)(xs...)`, where `Broadcast(f)` is simply a wrapper around `f`. Calls to `Broadcast` can be appropriately overloaded, both in Julia code and in conversions to backends, as well as made to generate `.` calls again when lowered back to syntax.

DataFlow now just creates explicit `broadcast` calls as part of desugaring.
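A minimal sketch of what such a lowering target could look like (this is an assumption about the design, not DataFlow's actual code; it is wrapped in a module so the name cannot clash with `Base.Broadcast`):

```julia
module BroadcastSketch

# Hypothetical wrapper: `f.(xs...)` would lower to `Broadcast(f)(xs...)`.
struct Broadcast{F}
    f::F
end

# Default behaviour in plain Julia code: just broadcast the wrapped function.
(b::Broadcast)(xs...) = broadcast(b.f, xs...)

# A backend could instead overload calls on its own tensor type, e.g.
#   (b::Broadcast{typeof(+)})(xs::BackendTensor...) = <emit a vectorised add>
# and a pretty-printer could turn `Broadcast(f)(xs...)` back into `f.(xs...)`.

end # module

@assert BroadcastSketch.Broadcast(+)([1, 2], [3, 4]) == [4, 6]
@assert BroadcastSketch.Broadcast(sin)([0.0]) == [0.0]
```

Because the wrapper is an ordinary callable struct, both Julia code and backend converters can intercept it with standard multiple dispatch.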