-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implicit casts/promotions in functions within Substrait's core extensions #251
Comments
+1 |
Something I'm less sure about is how to deal with awkward promotions from Especially for The same applies to On the other hand... we can also decide to be strict and require casts up front everywhere. That would significantly reduce the complexity of the YAMLs: we could then probably get away with throwing type derivation expressions out the window entirely and limit the syntax to pattern matching and substitution only. |
- coin the term "core extensions" and describe why they (and extensions in general) exist - document guidelines from substrait-io#251 and substrait-io#307 - document common options in core extensions (substrait-io#254) - better integrate the function documentation generator output into the website
Since no one seems to have any problems with this, I've documented this in #301 (mostly by just referring to this issue, though). |
- coin the term "core extensions" and describe why they (and extensions in general) exist - document guidelines from substrait-io#251 and substrait-io#307 - document common options in core extensions (substrait-io#254) - better integrate the function documentation generator output into the website
- coin the term "core extensions" and describe why they (and extensions in general) exist - document guidelines from substrait-io#251 and substrait-io#307 - document common options in core extensions (substrait-io#254) - better integrate the function documentation generator output into the website
It appears this discussion has reached its conclusion. I am closing this. |
This PR removes some implicit casts that exist in the trig functions. More discussion here: #251 Most of the trig functions already follow this behavior. This should handle the remaining ones.
This has come up in a number of PRs now, and it's becoming a bit of a mess. I feel like we need to come to some collective decision about how to approach this, and then document clearly what Substrait's stance on this is:
Given for example something like
multiply(i32, i32)
, should it A) returni32
for consistency, or B) promote toi64
becausei32
can overflow?Note that most examples are far more subtle, such as a
sum(i8) -> i64
aggregate orpower(fp32, fp32) -> fp64
, because they can still overflow or lose precision but are just less likely to do so with the larger return type.I personally feel quite strongly that functions should never promote, i.e. option A, no exceptions. This is how virtually all programming languages behave: you define only
(i8, i8) -> i8
,(i16, i16) -> i16
, ...,(i64, i64) -> i64
, and then have a separate mechanism to promote or convert between representations. In most languages that promotion is implicit (in C in particular integers are usually i32 by default, so i8 + literal -> i32 and thus you might not notice), in Substrait all such promotions must be explicit via a cast operation (which is much easier on consumers). The net result is the same. Just as it is up to the producer to make sure that there are no implicit casts, it should also be up to the producer (or user) to promote if needed.Here's a non-exhaustive list of problems and inconsistencies I see with option B:
add(i8, i8)
? The correct result type would bei9
.An argument in favor of option B that I've seen is that first upcasting and then performing the operation is somehow less performant. I agree that might be the case if implemented naively, due to cache locality and memory-boundedness if the intermediate results are actually materialized in memory or transferred over network before being passed to the operation. However, to be blunt: that's the consumer's problem. Nothing in Substrait is mandating that everything must map to kernels in the consumer one-to-one; in fact it is highly unlikely for this to be the case in general, because that would mean that every consumer out there is already implementing things in exactly the same way. A Substrait plan intended to be compatible between consumers defines only the functionality of the plan. It's then up to each consumer to implement that functionality as efficiently as possible with their internal representation, insofar as that differs from Substrait. So, if a consumer that has a
sum_i64(i8, i8) -> i64
kernel internally encounters asum(cast<i64>(i8), cast<i64>(i8)) -> i64
pattern in an incoming plan, it should use it rather than naively doing the casts first just because Substrait writes it that way. That adds complexity to consumers, sure, but you can't have both interoperability between consumers and one-to-one mapping for each consumer.As a stopgap solution until such optimization logic exists, and in order for Substrait to also be able to represent consumer-specific optimized plans (or hand-optimized plans a la inline assembly), extensions exist: if a consumer supports that
sum_i64(i8, i8) -> i64
operation, that consumer is free to and IMO should define that function in their own extension set, and a producer that knows about the consumer it's talking to is free to use it. But such compound functions do not belong in Substrait's core function set, if only to make it very clear that a plan with such "inline assembly" operations is no longer expected to consumer-agnostic.Again, this is just my opinion. I'm opening this issue to have a centralized place to discuss this topic in.
The text was updated successfully, but these errors were encountered: