Add support for some operations with decimals #988
Conversation
2d16ef0 to 83afad1
Instead of relying on the calculated dtype from the backend, we take the maximum scale from the numbers passed to `from_list/2`, or we cast to the scale given at creation.
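A minimal sketch of that scale-inference idea, assuming decimals are represented as `{coef, scale}` tuples (the module and function names here are illustrative, not the PR's actual helpers):

```elixir
defmodule DecimalScale do
  # Hypothetical helper mirroring the idea above: the series scale is the
  # maximum scale among the values passed in. Not Explorer's actual code.
  def infer_scale(decimals) do
    decimals
    |> Enum.map(fn {_coef, scale} -> scale end)
    |> Enum.max()
  end

  # Casting a value up to a target scale scales the unscaled integer
  # (coef) by a power of ten.
  def rescale({coef, scale}, target) when target >= scale do
    {coef * Integer.pow(10, target - scale), target}
  end
end

# 1.5 (scale 1) together with 2.25 (scale 2) yields a series scale of 2.
IO.inspect(DecimalScale.infer_scale([{15, 1}, {225, 2}]))
```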
Great job! This looks like it was a ton of work. And I think the tradeoffs (like the not implemented errors) are very reasonable.
test/explorer/data_frame_test.exs
Outdated
# Casting works, but the df will be different if we don't pass the precision,
# because it will take the default precision for decimals, which is "38".
# The second problem of not passing precision is that scale will be handled differently.
# This example would divide the integers by 10^2 without the precision.
This comment seems important. Would it be feasible to write tests to demonstrate the behavior it references?
I will try to add some test cases for that. The only problem is that we have a constraint that raises when the "out DF" is different from what we calculate. I will check if it's possible to ignore that.
I think I'm covering most of what I said in the comment, especially after removing the support for nil precision and scale. There is still an issue when we do an arithmetic operation and the backend returns a dtype without a precision, but we are casting at the end.
@@ -2577,6 +2601,7 @@ defmodule Explorer.Series do

  * floats: #{Shared.inspect_dtypes(@float_dtypes, backsticks: true)}
  * integers: #{Shared.inspect_dtypes(@integer_types, backsticks: true)}
Btw, there is a typo here: it should be `backticks`, but we can fix it later. :D
@@ -3395,8 +3427,18 @@ defmodule Explorer.Series do
  defp cast_to_add({:datetime, p, tz}, {:duration, p}), do: {:datetime, p, tz}
  defp cast_to_add({:duration, p}, {:datetime, p, tz}), do: {:datetime, p, tz}
  defp cast_to_add({:duration, p}, {:duration, p}), do: {:duration, p}

  defp cast_to_add({:decimal, p1, s1}, {:decimal, p2, s2}),
    do: {:decimal, maybe_max(p1, p2), maybe_max(s1, s2)}
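Judging from the name, `maybe_max/2` presumably tolerates a `nil` on either side (the backend sometimes returns a dtype without a precision, as noted above). A self-contained sketch of what such a helper and the cast rule could look like; this is an assumption from the diff, not the PR's actual implementation:

```elixir
defmodule DecimalCast do
  # Sketch of a nil-tolerant max, guessed from the name `maybe_max/2`:
  # when either side is unknown (nil), fall back to the known one.
  defp maybe_max(nil, b), do: b
  defp maybe_max(a, nil), do: a
  defp maybe_max(a, b), do: max(a, b)

  # The addition cast rule from the diff, using the helper above.
  def cast_to_add({:decimal, p1, s1}, {:decimal, p2, s2}),
    do: {:decimal, maybe_max(p1, p2), maybe_max(s1, s2)}
end
```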
Is there some rule we need to follow here? For example, is there a maximum value for precision and scale?
According to ChatGPT:
Yes, in Apache Arrow's decimal128 type, the precision and scale are constrained as follows:
Precision: This represents the total number of digits that can be stored, both before and after the decimal point. For decimal128, the maximum precision is 38 digits. This means that it can store up to 38 significant digits.
Scale: This defines how many of the digits are allocated to the fractional part (i.e., after the decimal point). The scale can be any value between 0 and the precision value. For example, if you have a precision of 38 and set a scale of 10, then 28 digits can be used before the decimal point and 10 digits after.
Thus, the maximum precision is 38, and the scale can be anywhere from 0 to 38, depending on the application needs.
So I think we are good, but I'd encapsulate this logic in a function. :)
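Encapsulating those constraints could look something like this sketch (the module and function names are illustrative, not part of Explorer):

```elixir
defmodule DecimalDtype do
  # Arrow decimal128 bounds: precision in 1..38, scale in 0..precision.
  @max_precision 38

  def valid?({:decimal, precision, scale})
      when is_integer(precision) and is_integer(scale) do
    precision in 1..@max_precision and scale in 0..precision
  end

  def valid?(_), do: false
end
```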
Co-authored-by: José Valim <jose.valim@dashbit.co>
Just some minor nits and ship it!!!
This adds support for decimals in some of the functions that work with integers and floats, as well as support for creating decimal series with `Explorer.Series.from_list/2`. Decimal support is still experimental in Polars, so we won't be able to support all the operations for now. Some of the operations return `f64` series or float results, and some of the functions raise exceptions from Polars because they are not implemented in the backend.