-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make log
function to be in sync with PostgresSql
#5259
Comments
Other edgecases
|
In short, this behavior doesn't seem unreasonable to me. IIRC, it also agrees with IEEE 754. Still, let's dig a little deeper and think about this in a larger context. What should we do when such values occur in the middle of a large table? What does PG do in that case? What should we do in case of an unbounded table/stream? Theoretically, this is a local out-of-domain issue and the output value indicates this clearly. Therefore, erroring out does not seem reasonable to me. Having worked on numerics for a long time in a wide variety of contexts, Datafusion's current behavior seems quite reasonable 🙂 On the other hand, we want PG compatibility in general. Maybe we can add a configuration flag that controls what we do in such cases. |
@ozankabak Its probably point of long disputes what is better: failfast or return null/nan etc. From personal point of view it seems failfast is better solution anyway as it doesn't silently corrupt the data and give the user responsibility how to deal with bad data on his side. Math function has to be simple and support only supportable data for given function. Other day old spark versions turned the value into null if it cannot be casted to timestamp and this was sooooo frustrating, that spark went away and now the |
As I said, I see the value in failing fast but doing that unconditionally is simply not an option in certain applications when you are executing a long-running query on a stream. That's why I suggested to have a configuration option to control this. Having both behaviors available allows us to cater to all use cases without impacting anyone badly, and has a nice side-effect of obviating the need for long philosophical discussions about failing fast vs. special values. |
@ozankabak thanks for feedback, the discussion is already much broader than log function itself. I just noticed, since we use rust math functions, it returns
whereas PG bursts as
I'll leave it for @alamb @andygrove to decide |
Yes, Rust is following the usual numerics convention (which is unsurprising) and the behavior naturally propagates as we rely on these functions. I agree with the emphasis on consistency. If we want to support PG-like behavior through a general configuration option, we will need to write wrappers around these functions that consult the config. |
I agree the only way everyone will be satisfied is with a config that controls it. I suspect this is a non trivial amount of effort to implement this consistently - so we should be aware of that as we start down the road |
Agreed |
Thanks @alamb so we can close this ticket then and leave all math scalar functions to work in sync with rust math functions and out of sync with PG. Probably its worth to document somewhere |
Agree -- the best place would be https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/expressions.md perhaps. Do either of you have some time to write one? Otherwise I will do so |
i'll update the doc. Filed #5312 |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The
log
function has some discrepancy with Postgres for edgecases e.g.The query executes ok, but in PostgresSql the query fail fast
Describe the solution you'd like
Investigate and implement log function to be the same behavior as for PostgresSql for edgecases above and perhaps for other like negative numbers, etc
Describe alternatives you've considered
Not doing this
Additional context
Found when working on #5245
The text was updated successfully, but these errors were encountered: