Should we define mixed-scale binary arithmetic functions? #261

ianmcook · 2022-07-25T18:27:44Z

ianmcook
Jul 25, 2022

I am opening this discussion to follow up on my comment here and Jacques's reply to it: #241 (comment)

In the functions YAML, binary arithmetic functions only have implementations in which their value arguments have the same type and scale (such as both args i32 or both args i64). For example, see the add function in functions_arithmetic.yaml.

Should we add implementations of these binary arithmetic functions that accept pairs of arguments that have the same basic type (integer or floating point) but different fixed scales, such as (i32, i64)?

Because of the large number of permutations, this will bloat the YAML files somewhat. However, I believe there are two main benefits to doing it:

1. It allows producers to avoid wrapping an explicit upcast around the smaller of the two arguments in each instance of a binary function in which the arguments have different scale, which is relatively common.
2. It could allow some consumers to more efficiently execute the plan.
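
To make the first benefit concrete, here is a minimal sketch of what a producer has to do today when the two integer arguments differ in width. The `Field`, `Cast`, and `Call` classes and the `make_add` helper are hypothetical stand-ins for a producer's expression builder, not part of Substrait or any library.

```python
from dataclasses import dataclass

# Bit widths of Substrait's signed integer types.
INT_WIDTHS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}


@dataclass(frozen=True)
class Field:
    """Hypothetical column reference with a known type."""
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    """Hypothetical explicit cast node."""
    arg: object
    type: str


@dataclass(frozen=True)
class Call:
    """Hypothetical scalar function call node."""
    function: str
    args: tuple
    type: str


def make_add(lhs, rhs):
    """Build add(lhs, rhs), wrapping the narrower integer argument in a cast."""
    wide = max(lhs.type, rhs.type, key=INT_WIDTHS.__getitem__)
    args = tuple(e if e.type == wide else Cast(e, wide) for e in (lhs, rhs))
    return Call("add", args, wide)


plan = make_add(Field("a", "i32"), Field("b", "i64"))
# plan is add(cast<i64>(a), b) -> i64
```

With mixed-scale implementations in the YAML, the `Cast` wrapping step would go away and the producer could emit the call on the original arguments.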

I am looking for more information about whether benefit 2 actually affords a performance benefit in real-world applications.

julianhyde · 2022-07-25T20:58:07Z

julianhyde
Jul 25, 2022

In my opinion, no. It would create a proliferation of functions that is quadratic in the number of types. There are a lot of potential numeric types already once you throw in signed vs. unsigned, decimal vs. binary fixed point, and floating point. There is very little efficiency to be gained by supporting mixed arguments.
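
A rough way to see the quadratic growth is to count the signatures for one binary function. This is only an illustrative sketch; `signature_counts` is a made-up helper, and it counts ordered pairs over Substrait's four signed integer types.

```python
from itertools import product


def signature_counts(types):
    """Return (same-type signatures, all ordered pairs) for one binary function."""
    same = [(t, t) for t in types]
    mixed = list(product(types, repeat=2))
    return len(same), len(mixed)


ints = ["i8", "i16", "i32", "i64"]
print(signature_counts(ints))  # → (4, 16)
```

Every additional numeric type adds one same-type signature but 2n + 1 ordered pairs, and that is per binary function.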

jvanstraten · 2022-07-25T22:06:06Z

jvanstraten
Jul 25, 2022
Collaborator

This is duplicating #251. Not to be all like "I was here first," but we should restrict this discussion to one place.

felipeblazing · 2022-07-26T15:39:35Z

felipeblazing
Jul 26, 2022

I would love to get more understanding of why there is little efficiency to be gained from removing upcasts whose only purpose is to make data conform to the methods that are available. JIT-compiled systems probably wouldn't mind this, and I am guessing that something that interprets a low-level plan could easily do the upcast in a register before invoking add. Systems that are optimizing for CPU caches might also not mind paying the cost of the upcast.

But I imagine batch-based systems will have to materialize the upcast. This feels quite performance-sensitive to me. It uses up a resource that is often scarce, memory, and requires an extra operation that wasn't strictly necessary. This is particularly true for people using hardware accelerators with a limited amount of HBM. Take an Nvidia GPU, for example. Scaling up would take practically the same time as the add kernel, doubling the execution time of the add operation.
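
As a rough illustration of the materialization cost in a batch-based engine, the sketch below uses Python's standard `array` module as a stand-in for column buffers. The two function names are invented for this example, and nothing here models a real GPU kernel; it only shows the extra pass and the temporary buffer.

```python
from array import array


def add_with_materialized_upcast(a32, b64):
    """Widen the whole i32 column first, then add: two passes plus a temp buffer."""
    widened = array("q", a32.tolist())  # temporary 64-bit copy of the column
    out = array("q", (x + y for x, y in zip(widened, b64)))
    temp_bytes = widened.itemsize * len(widened)
    return out, temp_bytes


def add_fused(a32, b64):
    """Widen each element on the fly: one pass, no temporary column."""
    out = array("q", (x + y for x, y in zip(a32, b64)))
    return out, 0


a = array("i", [1, 2, 3])     # narrower column (typically 4 bytes per value)
b = array("q", [10, 20, 30])  # 64-bit column
materialized, temp_bytes = add_with_materialized_upcast(a, b)
fused, _ = add_fused(a, b)
```

Both paths produce the same result; the difference is the temporary column, which is as large as the 64-bit input itself.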

jvanstraten · 2022-07-26T16:07:05Z

jvanstraten
Jul 26, 2022
Collaborator

The problem here is that, like you said, it's very implementation-dependent whether this will give you any performance or not. So much so that people with different execution engine backgrounds aren't even properly understanding each other. All the while, Substrait core is supposed to be implementation-agnostic. Maybe in another implementation the increase in code size from implementing all these functions will make the whole system less performant with this suggestion (unlikely, sure, but who knows?), but the bigger question is: where do you draw the line? Maybe some other system can do a particular type of projection much more efficiently than a generic projection. Do we then add a relation for that? Maybe there's some system that really wants a posit data type instead of floating point, because posits would simply be better than floats if there were hardware support for them. Does that mean we need to add posits now? And so on.

The solution to this is to limit the features of the core part of Substrait to the minimal subset that can reasonably be expected to be supported by an execution engine. Not every engine needs to support add(i32, i64) -> i64 directly, because it can also be represented exactly with add(cast<i64>(i32), i64) -> i64. So, boom, you've saved everyone for whom this function does not provide any performance benefit from having to implement it to be compatible with everyone else. And we're not talking about a single function; we're talking about a polynomial-to-linear complexity reduction. Please also keep in mind that not every implementation has the luxury of throwing C++ templates at the problem. In fact, Acero won't have this luxury either if we ever want to hand-optimize all these kernels at the assembly level, but that's future-Acero's problem and out of scope for this discussion. I think everyone from Acero can agree that Acero will likely be much more performant when using compound operations, so this is worth doing in that engine either way.

I emphasized "core" at the start of that paragraph because whatever we decide here has absolutely no influence on how Acero or any other engine should implement something that looks like add(cast<i64>(i32), i64) -> i64. Acero absolutely should reduce that to its internal add(i32, i64) -> i64 kernel; this being separate operations in Substrait doesn't mean they also have to be separate in every execution engine. If Acero then wants to represent the optimized plan as Substrait again, it can use its own extensions for that, at the cost of losing generality (which is a good thing, because if not for this, Acero's consumer would also have to understand all the optimizations of all the other engines out there). That's exactly why Substrait bothered with extensions at all.
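
A consumer-side rewrite of this kind could look roughly like the sketch below. Everything in it is hypothetical: the `Field`, `Cast`, and `Call` node classes stand in for a consumer's internal expression tree, and `add_i32_i64` is an invented name for an engine-internal fused kernel, not an actual Acero or Substrait function. For brevity it only matches a cast on the left argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical column reference with a known type."""
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    """Hypothetical explicit cast node."""
    arg: object
    type: str


@dataclass(frozen=True)
class Call:
    """Hypothetical scalar function call node."""
    function: str
    args: tuple
    type: str


def fuse_widening_add(expr):
    """Rewrite add(cast<i64>(x: i32), y: i64) into one engine-internal call."""
    if isinstance(expr, Cast):
        return Cast(fuse_widening_add(expr.arg), expr.type)
    if not isinstance(expr, Call):
        return expr
    args = tuple(fuse_widening_add(a) for a in expr.args)
    if expr.function == "add" and expr.type == "i64" and len(args) == 2:
        lhs, rhs = args
        if (isinstance(lhs, Cast) and lhs.type == "i64"
                and lhs.arg.type == "i32" and rhs.type == "i64"):
            # Assumed engine-specific kernel name, not a Substrait core function.
            return Call("add_i32_i64", (lhs.arg, rhs), "i64")
    return Call(expr.function, args, expr.type)


plan = Call("add", (Cast(Field("a", "i32"), "i64"), Field("b", "i64")), "i64")
optimized = fuse_widening_add(plan)
```

The incoming plan stays in core Substrait terms, and the fused form only exists inside the engine (or in its own extension, if it is serialized again).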

It'd be great if we could eventually make a generic pattern-based optimizer for this, but for now, this will just have to be something that the Acero Substrait consumer does. I'm personally of the opinion that this shouldn't be too difficult, but it will require a rewrite of the consumer, and IMO that's also a good thing, because the current consumer makes far too many assumptions about Substrait's abstractions mapping one-to-one to Acero's, and we've found all over the place that this assumption just does not hold. That is, if Acero indeed intends to support all of Substrait core and (in this case) support it efficiently, rather than just intending to use Substrait to communicate Acero-specific plans between things that know about Acero and will optimize specifically for it. Note that there's a discussion related to this goal vs. non-goal question at https://issues.apache.org/jira/browse/ARROW-17183.
