Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Series.from_list/2 bugs #826

Closed
wants to merge 13 commits into from

Conversation

lkarthee
Copy link
Member

@lkarthee lkarthee commented Jan 18, 2024

Read this comment for information. #826 (comment)

Added some tests as i could not create a series with u64 value

s = Series.from_list([9_223_372_036_854_775_807, 9_223_372_036_854_775_806, 9_223_372_036_854_775_808], dtype: {:s, 32}) # error

s = Series.from_list([9_223_372_036_854_775_808]) # inferred as s64 and fails

s = Series.from_list([1, 2, 3, 9_223_372_036_854_775_808]) # mismatch s64 and u64

s = Series.from_list([9_223_372_036_854_775_808, 1, 2, 3]) # mismatch u64 and s64 

s = Series.from_list([1, 2, 3], dtype: {:u, 64}) # as u64 fails

s = Series.from_list([-1, 2, 3], dtype: {:u, 64}) 

@lkarthee
Copy link
Member Author

Finished in 6.2 seconds (6.2s async, 0.00s sync)
577 doctests, 1389 tests, 17 failures, 18 excluded, 1 skipped

https://github.com/elixir-explorer/explorer/actions/runs/7566406181/job/20603730541?pr=826

@josevalim
Copy link
Member

In my opinion, we should continue inferring those cases as s64 and continue raising if we overflow or underflow the dtype. The improvement we could do is to improve the error message.

The conversion to u64, s64, etc happens inside Rustler and I am not sure if Rustler let's us customize the error message.

@philss, do you know if we can customize the "argument error" bits here?

iex(1)> series = Explorer.Series.from_list([[1, 9_223_372_036_854_775_808]])
** (ArgumentError) argument error
    (explorer 0.8.0-dev) Explorer.PolarsBackend.Native.s_from_list_s64("", [1, 9223372036854775808])

In case we cannot, I would wrap "Explorer.PolarsBackend.Shared.from_list" into a try/rescue ArgumentError and raise a better message there.

@josevalim
Copy link
Member

Btw, s = Series.from_list([1, 2, 3], dtype: {:u, 64}) # as u64 fails works for me (and it seems it works on CI too).

@lkarthee
Copy link
Member Author

lkarthee commented Jan 18, 2024

https://hexdocs.pm/explorer/Explorer.Series.html#from_list/2

  • :dtype - Cast the series to a given :dtype. By default this is nil, which means that Explorer will infer the type from the values in the list.

I have established the base line of what is not working in https://github.com/elixir-explorer/explorer/actions/runs/7570277474/job/20615312348

from_list/2 does not handle u64 and dtypes {:s, _} and {:u, _} well.

In this PR, I have attempted to address the below:

  • dtype is inferred properly especially {:u, 64}
  • descriptive error is thrown in case it fails.

Please review - I am pretty sure there is room for improvement . Also take a stab at it improving the PR. I can learn more if you can attempt to improvise thing by seeing how others do it in a better way.

I have tested these scenarios in Polars and it handles well - either it throws descriptive error, or creates series as expected.

@lkarthee lkarthee changed the title fix from_list bugs fix Series.from_list/2 bugs Jan 18, 2024
@josevalim
Copy link
Member

As per above, I don’t think we should infer u64. Operations such as subtract can be confusing, especially coming from Elixir, and it should always be opt-in. For the cases listed here, we should improve the error message. I would not change the inference code.

@josevalim
Copy link
Member

I also think doing additional traversals on the list will be expensive. Hence my suggestion of wrapping around the calls to Rust. :)

@billylanchantin
Copy link
Member

@lkarthee I had the same temptation but I think @josevalim is right: we should keep the inference code as is.

We could follow the pattern you've established where we infer the "minimal" type that can represent the integers given to from_list. But the problem is that many operations are not closed under those types. Unsigned subtraction (as mentioned), but even addition: U8 + U8 is very likely to result in an overflow.

Also I got the following in Polars:

let s1 = Series::new("s1", [1, 2]);
// shape: (2,)
// Series: 's1' [i32]
// [
// 	1
// 	2
// ]

So it looks like they too infer signed integers by default (their i32 is our s32).

@philss
Copy link
Contributor

philss commented Jan 18, 2024

do you know if we can customize the "argument error" bits here?

@josevalim no, I don't think we can. The only way I know to personalize this message is if we implement the interactor over Term, like we do for the float version of this function:

macro_rules! from_list_float {

It's a more verbose version, but it's doable.

And just to add my 2 cents: I'm also in the team of "not try to infer". I believe that function is already too complex, and users can always pass the desired dtype. So I prefer to keep as it is today.

@lkarthee
Copy link
Member Author

lkarthee commented Jan 18, 2024

Multiple parts to the story:

  • majority of new tests which are failing should pass (even with modifications) in explorer.
  • descriptive error which does n't look like crashes.
  • from_lists/2 anything relating to {:u, 64} data type exclusive values results in error.
  • Inference of types - when dtype is nil.
  • Are we going to change existing behaviour which is documented about inference when dtype is nil, even though it might be broken . This might be a deviation from polars, we have deviated before.
  • Preferred type merging with inference type.

All code snippets are using explorer/main SHA - 62e7716

from_list/2 bug in depth

1. Loss of information

Loss of information without any warning - -1 to nil. I suspect it has something to do with rustler (might be wrong). One thing to note is the numbers are {:s , 64} not valid or exclusive {:u 64} numbers.

alias Explorer.Series
Series.from_list([-1, 9_223_372_036_854_775_801], dtype: {:u, 64}) 
#Explorer.Series<
  Polars[2]
  u64 [nil, 9223372036854775801]
>

2. Cannot create {:u, 64}, list of {:u, 64} and struct of {:u, 64}

Now let's pass atleast the values in range and attempt to create {:u, 64} series. It is invoking rustler func which accepts only s64 hence the error.

Series.from_list([1, 9_223_372_036_854_775_809], dtype: {:u, 64}).
** (ArgumentError) argument error
    (explorer 0.8.0-dev) Explorer.PolarsBackend.Native.s_from_list_s64("", [1, 9223372036854775809])
    (explorer 0.8.0-dev) lib/explorer/polars_backend/series.ex:24: Explorer.PolarsBackend.Series.from_list/2
    (explorer 0.8.0-dev) lib/explorer/series.ex:387: Explorer.Series.from_list/2
    #cell:phle4en4dbmtgjco:1: (file)

Let's attempt to create a {:list, {:u, 64}}.

series = Series.from_list([[1], [9_223_372_036_854_775_809]], dtype: {:list, {:u, 64}})
** (ArgumentError) argument error
    (explorer 0.8.0-dev) Explorer.PolarsBackend.Native.s_from_list_s64("", [9223372036854775809])
    (elixir 1.15.7) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (elixir 1.15.7) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (explorer 0.8.0-dev) lib/explorer/polars_backend/shared.ex:114: Explorer.PolarsBackend.Shared.from_list/3
    (explorer 0.8.0-dev) lib/explorer/polars_backend/series.ex:24: Explorer.PolarsBackend.Series.from_list/2
    (explorer 0.8.0-dev) lib/explorer/series.ex:387: Explorer.Series.from_list/2
    #cell:aqfn6tqdn2be45yv:1: (file)

Lets attempt to create a struct with field of u64 using dtype {:struct, %{"a" => {:list, {:u, 64}}}}.

Series.from_list([%{"a" => [1]}, %{"a" => [9_223_372_036_854_775_809]}], dtype: {:struct, %{"a" => {:list, {:u, 64}}}})
** (ArgumentError) argument error
    (explorer 0.8.0-dev) Explorer.PolarsBackend.Native.s_from_list_s64("a", [9223372036854775809])
    (elixir 1.15.7) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (elixir 1.15.7) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (explorer 0.8.0-dev) lib/explorer/polars_backend/shared.ex:114: Explorer.PolarsBackend.Shared.from_list/3
    (explorer 0.8.0-dev) lib/explorer/polars_backend/shared.ex:126: anonymous fn/3 in Explorer.PolarsBackend.Shared.from_list/3
    (stdlib 5.1.1) maps.erl:416: :maps.fold_1/4
    (explorer 0.8.0-dev) lib/explorer/polars_backend/shared.ex:123: Explorer.PolarsBackend.Shared.from_list/3
    #cell:qcjmwzxpdlnvdsd4:1: (file)

3. Descriptive Error messages

Lets try to pass a wrong data type as dtype - instead of {:list, {:s, 8}}, pass {:s , 8}. Cryptic Polars Error.

series = Series.from_list([[1], [2]], dtype: {:s, 8})
** (RuntimeError) Polars Error: cannot cast List type (inner: 'Int64', to: 'Int8')
    (explorer 0.8.0-dev) lib/explorer/polars_backend/shared.ex:26: Explorer.PolarsBackend.Shared.apply_series/3
    #cell:7kssor3tml27to2y:1: (file)

Py Polars I get below

pl.Series([[1], [2]], dtype = pl.Int8)
... long stack trace omitted  but  has descriptive error message relating to problem.
TypeError: 'list' object cannot be interpreted as an integer

4. Inference

This is a documented behaviour in docs. If we are changing it - documentation should be updated about divergence.
I feel the docs are correct and divergence is not needed. Let's look at the fix in PR w.r.to inference. We are already iterating the list and if there was an integer we assume it is {:s, 64}. I have changed it to return 3 tuples:

  • {:s, 64} for range 0 to 2**63-1
  • {:s, -64} if list contains negative integers
  • {:u, 64} if list has values greater than 2 ** 63 to 2 ** 64

Then normalized {:s, -64} to {:s, 64}. This is all I did w.r.to Inference. We can may be quantify overhead (using benchmarks) approx 10 additional comparisons. These kick in only w.r.to integers.

5. Merge Preferred type

Even with explicit dtype: {:u, 64}, from_list/2 is failing or we are losing information. We are using previous information [{:s, 64}, {:s, -64}, {:u, 64}] to check valid ranges of various integers. Only for {:struct, } we are creating additional data - I am sure you can find some solution here. Rest all cases we are iterating list again. In some cases it just raises immediately like {:u, 64} to any thing other than {:U, 64}. This overhead only for cases of integers.

6. Additional tests

Current tests are inadequate as CI is passing. I have thoroughly tested add new test cases. What ever solution/fix may be majority of the test cases are valid tests. There may be modifications to error messages, but rest should pass. We can address what needs to be merged after a resolution or fix is finalised.

TLDR:

My take on it:

  • descriptive error messages are important. It will be very frustrating for users to figure out things and wasting time. How we are going to achieve it is an open question. This PR is not the only way.
  • merge preferred dtype with inferred dtype has some overhead especially w.r.to structs.
  • we can have a flag to disable merge preferred dtype validations.
  • we can defer structs validations to polars errors
  • inference changes don't add much overhead and enable us to infer {:u, 64}
  • may be we should run benchmarks on how much overhead we are adding
  • majority test cases are very much valid
  • docs should be updated if we are diverging from documented behaviour
  • Py polars are doing it - should we look into how they are doing it ?

@billylanchantin
Copy link
Member

@lkarthee I'm totally with you when it comes to docs, error messages, and tests. The only part of this that's tripping me up is the change to the inference code. It's not about overhead, but ergonomics.

The stance we're taking is that :s64 is the most ergonomic integer dtype to infer. If you infer a list [1, 2, 3] as having that dtype, you can do most addition and subtraction operations without worrying about overflow. And because you can supply a more specific dtype if you need to, this seems like a reasonable trade-off. Without it, users would regularly experience overflow gotchas.

For the specifics:

  1. Loss of information

    [-1, 1] |> Series.from_list(dtype: :u8) |> Series.to_list() #=> [nil, 1]

    I think this is the expected behavior. Is it documented as being different elsewhere?

  2. Cannot create {:u, 64}, list of {:u, 64} and struct of {:u, 64}

    Series.from_list([2**63], dtype: {:u, 64}) #=> raises

    etc. ought to work. This does seem like a bug!

  3. Descriptive Error messages
    I agree we should have better error messages.

  4. Inference
    See above.

  5. Merge Preferred type
    No opinion yet. Will need to see the code after reverting the inference changes.

  6. Additional tests
    We should definitely add tests for the bugs. Though any which test the change to the inference code will need to be dropped.

@lkarthee
Copy link
Member Author

lkarthee commented Jan 18, 2024

@lkarthee I'm totally with you when it comes to docs, error messages, and tests.

Thank you @billylanchantin sharing your view and for reading such a long post - I made a mistake of pushing tests before writing cover. Perhaps that caused confusion. Even this PR is complex. I tend not mix multiple things, but I could not scope it any lesser.

The only part of this that's tripping me up is the change to the inference code. It's not about overhead, but ergonomics.

Just to be clear - We are still defaulting to {:s, 64} unless we have a {:u, 64} value in list. We are not going into subtypes of :s or :u. There won't be inference of {s, 8} when dtype is not passed.

  1. Loss of information
[-1, 1] |> Series.from_list(dtype: :u8) |> Series.to_list() #=> [nil, 1]

I think this is the expected behavior. Is it documented as being different elsewhere?

Docs - https://hexdocs.pm/explorer/Explorer.Series.html#module-creating-series

The list must consist of a single data type and nils.

Option :dtype documentation.

:dtype - Cast the series to a given :dtype. By default this is nil, which means that Explorer will infer the type from the values in the list.

There is no word of defaulting to s64. It says it will infer - gives example of integers, floats, etc.

At the bottom of section it says:

Mixing non-numeric data types will raise an ArgumentError:
Explorer.Series.from_list([1, "a"])
** (ArgumentError) the value "a" does not match the inferred series dtype :integer

The precedent here is error w.r.to incompatible values not silent cast to nil.

Integer split to subtypes is not yet released (if I am not wrong).

We can look at Py Polars behaviour.

pl.Series([-1,1], dtype=pl.UInt8) 
# throws incompatible types error. 

I had personally faced a scenario where dates were not parsed - #566 (comment) .

It's very frustrating when things fail silently.

If data is wrong or I am using wrong type, I want it to fail with a big error. Else what is the point having types?

@josevalim
Copy link
Member

Since there are many bugs here, we should probably try to fix them individually instead of all at once.

As a starter, I would look into:

[-1, 1] |> Series.from_list(dtype: :u8) |> Series.to_list() #=> [nil, 1]

The above should raise. It doesn't need to raise something good for now but it should raise.

Series.from_list([2**63], dtype: {:u, 64})

This should not raise.

There is no word of defaulting to s64. It says it will infer - gives example of integers, floats, etc.

Yeah, that's because s64 is new. We should say a list of integers are always inferred as s64. A list of floats (and integers) is always f64 by default.

@josevalim
Copy link
Member

Series.from_list([2**63], dtype: {:u, 64})

I have pushed a fix for this. Now it works accordingly. :)

@josevalim
Copy link
Member

Yeah, that's because s64 is new. We should say a list of integers are always inferred as s64. A list of floats (and integers) is always f64 by default.

Fixed in main too.

@lkarthee
Copy link
Member Author

lkarthee commented Jan 20, 2024

I have changed the pr to add strict flag to raise if there is an overflow or underflow. Py Polars has similar flag.

  • Default integer is still {:s, 64}.
  • If strict: false (default) - there is no performance impact.
  • clear error messages
  • overflow and underflow check

Just to be clear - earlier implementation is scrapped. There is no merge logic. Iterate list only once which was already being done in main. No copy, no multiple iterations.

Currently overflow and underflow can occur.

Series.from_list([0, 256], dtype: {:u, 8})
#Explorer.Series<
  Polars[2]
  u8 [0, 0]
>
Benchmark script & results
datetime = DateTime.utc_now()
seed = DateTime.to_unix(datetime)
key = Nx.Random.key(seed)

{randints, _new_key} = Nx.Random.randint(key, -2**63, 2**63-1, shape: {100, 1000}, type: :s64)
list_1k = Nx.to_list(randints)


Benchee.run(
  %{
    "1_b_100x1000" => fn -> Enum.map(list_1k, fn x -> Explorer.Series.from_list(x) end) end,
    "3_d_100x1000_s64" => fn -> Enum.map(list_1k, fn x -> Explorer.Series.from_list(x, dtype: {:s, 64}) end) end,

    "2_b_100x1000_main" => fn -> Enum.map(list_1k, fn x -> Explorer.Series.from_list(x, old: true) end) end,
    "4_d_100x1000_s64_main" => fn -> Enum.map(list_1k, fn x -> Explorer.Series.from_list(x, dtype: {:s, 64}, old: true) end) end,

    "5_b_100x1000_strict" => fn -> Enum.map(list_1k, fn x -> Explorer.Series.from_list(x, strict: true) end) end,
    "6_d_100x1000_s64_strict" => fn -> Enum.map(list_1k, fn x -> Explorer.Series.from_list(x, dtype: {:s, 64}, strict: true) end) end,
  }
)

Here are the results. Don't see any abnormal increase in time.

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.15.7
Erlang 26.1.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s

Benchmarking 1_b_100x1000 ...
Benchmarking 2_b_100x1000_main ...
Benchmarking 3_d_100x1000_s64 ...
Benchmarking 4_d_100x1000_s64_main ...
Benchmarking 5_b_100x1000_strict ...
Benchmarking 6_d_100x1000_s64_strict ...
Calculating statistics...
Formatting results...

Name                              ips        average  deviation         median         99th %
4_d_100x1000_s64_main           71.18       14.05 ms     ±3.29%       13.94 ms       16.04 ms
3_d_100x1000_s64                71.02       14.08 ms     ±3.44%       13.95 ms       16.24 ms
1_b_100x1000                    70.98       14.09 ms     ±3.84%       13.95 ms       16.50 ms
2_b_100x1000_main               70.85       14.11 ms     ±4.42%       13.95 ms       16.83 ms
5_b_100x1000_strict             64.73       15.45 ms     ±2.87%       15.33 ms       17.41 ms
6_d_100x1000_s64_strict         61.77       16.19 ms     ±2.76%       16.07 ms       18.07 ms

Comparison:
4_d_100x1000_s64_main           71.18
3_d_100x1000_s64                71.02 - 1.00x slower +0.0305 ms
1_b_100x1000                    70.98 - 1.00x slower +0.0401 ms
2_b_100x1000_main               70.85 - 1.00x slower +0.0643 ms
5_b_100x1000_strict             64.73 - 1.10x slower +1.40 ms
6_d_100x1000_s64_strict         61.77 - 1.15x slower +2.14 ms

@lkarthee
Copy link
Member Author

lkarthee commented Jan 20, 2024

Feel free to close this PR, if you feel we don't need strict flag.

end

defp infer_type(item, _preferred, _strict) when is_float(item) or item in @non_finite,
do: {:f, 64}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, we also have precision issues with floats, since f32 cannot represent as large numbers as f64. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that should probably be another pull request.

lib/explorer/shared.ex Outdated Show resolved Hide resolved
lib/explorer/shared.ex Outdated Show resolved Hide resolved
lib/explorer/shared.ex Outdated Show resolved Hide resolved
lib/explorer/shared.ex Outdated Show resolved Hide resolved
lib/explorer/shared.ex Outdated Show resolved Hide resolved
lib/explorer/shared.ex Outdated Show resolved Hide resolved
@@ -251,6 +251,7 @@ defmodule Explorer.Series do
* `:dtype` - Cast the series to a given `:dtype`. By default this is `nil`, which means
that Explorer will infer the type from the values in the list.
See the module docs for the list of valid dtypes and aliases.
* `:strict` - when `true` raises on overflow and underflow - defaults to `false`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it is false, what happens? The value is simply discarded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused because it seems Rustler will raise on overflow-underflow if strict: false.

Copy link
Member Author

@lkarthee lkarthee Jan 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When false overflow/underflow can happen. Discard only happens when a series is created as {:s, 64} and cast it to {:s, 8}. And cast does not happen with existing code with and without strict.

Series.from_list([-129, 256], dtype: {:s, 16}) |> Series.cast({:s, 8})
#Explorer.Series<
  Polars[2]
  s8 [nil, nil]
>

More details about rustler behaviour - #826 (comment)

@josevalim
Copy link
Member

Hi @lkarthee, I have dropped some comments, but I am still not ahead we should go with this. My understanding is that Rustler will raise anyway, so will strict: false even work as we expect? Or Rustler will convert them to nil on underflow/overflow?

@lkarthee
Copy link
Member Author

@josevalim

My understanding is that Rustler will raise anyway, so will strict: false even work as we expect?

I am a bit confused because it seems Rustler will raise on overflow-underflow if strict: false.

I was confused about rustler's behaviour - figured out that argument error occurs only when there is mismatch between iX(i8, i16, i32, i64) and uX(u8, u16, u32, u64). My assumption is some kind of typecasting is done by Rustler.

iex(2)> Series.from_list([-129, 256], dtype: {:s, 8})
19:49:17.265 [info] dtype - {:s, 8} # <- Just to make sure we are passing before rustler is called 
type_name of val is alloc::vec::Vec<core::option::Option<i8>> # <- log from rust that it is using i8 
#Explorer.Series<
  Polars[2]
  s8 [127, 0]
>
iex(3)> Series.from_list([256], dtype: {:u, 8})
#Explorer.Series<
  Polars[1]
  u8 [0]
>
iex(4)> Series.from_list([-129, 256], dtype: {:u, 8})
** (ArgumentError) argument error
    (explorer 0.9.0-dev) Explorer.PolarsBackend.Native.s_from_list_u8("", [-129, 256])
    (explorer 0.9.0-dev) lib/explorer/polars_backend/series.ex:24: Explorer.PolarsBackend.Series.from_list/2
    (explorer 0.9.0-dev) lib/explorer/series.ex:417: Explorer.Series.from_list/2

Overflow of float 32 - results in crash of VM:

iex(3)> Series.from_list([3.402823E+39])
#Explorer.Series<
  Polars[1]
  f64 [3.402823e39]
>
iex(4)> Series.from_list([3.402823E+39], dtype: {:f, 32})
thread '<unnamed>' panicked at src/series.rs:86:1:
    called `Result::unwrap()` on an `Err` value: {error, badarg} 
stack backtrace:iex(4)> Series.from_list([3.402823E+39], dtype: {:f, 32})
libunwind: stepWithCompactEncoding - invalid compact unwind encoding
Abort trap: 6

@josevalim
Copy link
Member

josevalim commented Jan 21, 2024

Right, that’s why I think this pull request is a bit too early. We need to make sure that the default behavior can be not strict, nor crash the Rust side, instead we need to propagate the error correctly.

From the looks of it, strict: true may be the safest choice, specially as it happens by default. so my suggestion is to improve the messages and the crashes before we add any new code or large change. What is the minimum work we can do to provide a good user experience with the current behavior?

@lkarthee
Copy link
Member Author

lkarthee commented Jan 22, 2024

Alright lets into look into f32 crashes and overflow/underflow first before proceeding any further.

We can figure out rest as we go.

f32 crash happening due to below code ?
https://github.com/rusterlium/rustler/blob/6e803c00e67b9df9bbc73375662c4f27981dd0f2/rustler/src/types/primitive.rs#L68-L79

impl<'a> Decoder<'a> for f32 {
    fn decode(term: Term) -> NifResult<f32> {
        let res: f64 = term.decode()?;
        let res = res as f32;
        // Values bigger than f32 are coerced as infinity
        if res.is_finite() {
            Ok(res)
        } else {
            Err(Error::BadArg)
        }
    }
}

@lkarthee
Copy link
Member Author

lkarthee commented Jan 22, 2024

float 32 crash is happening due to Some(item.decode::<$type>().unwrap()), , it should be match item.decode::<$type>() {...}, ?

Also what should we do on overflow or underflow ? I just returned None on Err().

This solves the crash - but overflow I don't know how to handle it? Any suggestions @philss @josevalim @billylanchantin

macro_rules! from_list_float {
    ($name:ident, $type:ty, $module:ident) => {
        #[rustler::nif(schedule = "DirtyCpu")]
        pub fn $name(name: &str, val: Term) -> ExSeries {
            let nan = atoms::nan();
            let infinity = atoms::infinity();
            let neg_infinity = atoms::neg_infinity();
            ExSeries::new(Series::new(
                name,
                val.decode::<ListIterator>()
                    .unwrap()
                    .map(|item| { 
                        match item.get_type() {
                        //TermType::Number =>  Some(item.decode::<$type>().unwrap()), 
                        TermType::Number => match item.decode::<$type>() {Ok(f) => Some(f), Err(_) => None}, 
                        TermType::Atom => {
                            if nan.eq(&item) {
                                Some($module::NAN)
                            } else if infinity.eq(&item) {
                                Some($module::INFINITY)
                            } else if neg_infinity.eq(&item) {
                                Some($module::NEG_INFINITY)
                            } else {
                                None
                            }
                        }
                        term_type => panic!("from_list/2 not implemented for {term_type:?}"),
                    }})
                    .collect::<Vec<Option<$type>>>(),
            ))
        }
    };
}

@josevalim
Copy link
Member

I think crashing is fine, but we need to crash exactly in the same way as the other ones, so we can rescue it accordingly in the elixir layer

@lkarthee
Copy link
Member Author

#837 handles float issue.

What to do with integer overflow ? https://doc.rust-lang.org/book/ch03-02-data-types.html#integer-overflow

When you’re compiling in release mode with the --release flag, Rust does not include checks for integer overflow that cause panics. Instead, if overflow occurs, Rust performs two’s complement wrapping. In short, values greater than the maximum value the type can hold “wrap around” to the minimum of the values the type can hold. In the case of a u8, the value 256 becomes 0, the value 257 becomes 1, and so on. The program won’t panic, but the variable will have a value that probably isn’t what you were expecting it to have. Relying on integer overflow’s wrapping behavior is considered an error.
To explicitly handle the possibility of overflow, you can use these families of methods provided by the standard library for primitive numeric types:

  • Wrap in all modes with the wrapping_* methods, such as wrapping_add.
  • Return the None value if there is overflow with the checked_* methods.
  • Return the value and a boolean indicating whether there was overflow with the overflowing_* methods.
  • Saturate at the value’s minimum or maximum values with the saturating_* methods.

@josevalim
Copy link
Member

Instead, if overflow occurs, Rust performs two’s complement wrapping.

This is exactly how Nx behaves. So I am glad to call it a day.

@josevalim
Copy link
Member

To be clear, Nx doesn't deal with overflows/underflows, they always happen across all operations, and I'd say it is fine to have the same behaviour here too. One of the main points of those libraries is to enable high performance, so the trade-off makes sense?

@lkarthee
Copy link
Member Author

lkarthee commented Jan 23, 2024

Should I

  • document overflow/underflow behaviour
  • wrap "Explorer.PolarsBackend.Shared.from_list" into a try/rescue ArgumentError and raise a better message there
  • close this pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants