Fix Series.from_tensor/2 to use new integer types #799

Merged
merged 1 commit into from
Jan 4, 2024
Fix Series.from_tensor/2 to use new integer types
This makes the function stop inferring the target series dtype and makes
it support all the new signed and unsigned integer dtypes.

If people want to read a given tensor in a different type, they must
pass the `:dtype` option.
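
As a rough illustration of the behaviour described above, here is a minimal sketch. It assumes an Explorer release that includes this commit and that Nx is installed; the unpinned `Mix.install/1` call and the variable names are illustrative choices, not part of the change.

```elixir
# Assumption: the installed Explorer version includes this change.
# Mix.install/1 without version pins only keeps the sketch self-contained.
Mix.install([:explorer, :nx])

tensor = Nx.tensor([1, 0, 1], type: :u8)

# Without :dtype, the series keeps the tensor's own integer type.
inferred = Explorer.Series.from_tensor(tensor)
IO.inspect(Explorer.Series.dtype(inferred))

# With an explicit, compatible :dtype, the same tensor is read as booleans.
as_bool = Explorer.Series.from_tensor(tensor, dtype: :boolean)
IO.inspect(Explorer.Series.to_list(as_bool))

# An incompatible :dtype is rejected: :date expects an {:s, 32} tensor,
# so the type check in from_tensor/2 raises ArgumentError.
result =
  try do
    Explorer.Series.from_tensor(tensor, dtype: :date)
  rescue
    e in ArgumentError -> {:error, Exception.message(e)}
  end

IO.inspect(result)
```

According to the updated doctests in this commit, the first call yields a `u8 [1, 0, 1]` series and the second a `boolean [true, false, true]` series.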
philss committed Jan 4, 2024
commit 371edf56cd0a1cde84ba963188d1cf8f93ce362d
31 changes: 24 additions & 7 deletions lib/explorer/series.ex
@@ -496,7 +496,17 @@ defmodule Explorer.Series do
## Options

* `:backend` - The backend to allocate the series on.
* `:dtype` - The dtype of the series, it must match the underlying tensor type.
* `:dtype` - The dtype of the series that must match the underlying tensor type.

The series can have a different dtype if the tensor is compatible with it.
For example, a tensor of `{:u, 8}` can represent a series of `:boolean` dtype.

Here are the list of compatible tensor types and dtypes:

* `{:u, 8}` tensor as a `:boolean` series.
* `{:s, 32}` tensor as a `:date` series.
* `{:s, 64}` tensor as a `:time` series.
* `{:s, 64}` tensor as a `{:datetime, unit}` or `{:duration, unit}` series.

## Examples

@@ -516,22 +526,27 @@ defmodule Explorer.Series do
f64 [1.0, 2.0, 3.0]
>

Unsigned 8-bit tensors are assumed to be booleans:

iex> tensor = Nx.tensor([1, 0, 1], type: :u8)
iex> Explorer.Series.from_tensor(tensor)
#Explorer.Series<
Polars[3]
boolean [true, false, true]
u8 [1, 0, 1]
>

Signed 32-bit tensors are assumed to be dates:

iex> tensor = Nx.tensor([-719162, 0, 6129], type: :s32)
iex> Explorer.Series.from_tensor(tensor)
#Explorer.Series<
Polars[3]
date [0001-01-01, 1970-01-01, 1986-10-13]
s32 [-719162, 0, 6129]
>

Booleans can be read from a tensor of `{:u, 8}` type if the dtype is explicitly given:

iex> tensor = Nx.tensor([1, 0, 1], type: :u8)
iex> Explorer.Series.from_tensor(tensor, dtype: :boolean)
#Explorer.Series<
Polars[3]
boolean [true, false, true]
>

Times are signed 64-bit representing nanoseconds from midnight and
@@ -560,6 +575,8 @@ defmodule Explorer.Series do
type = Nx.type(tensor)
{dtype, opts} = Keyword.pop_lazy(opts, :dtype, fn -> Shared.iotype_to_dtype!(type) end)

dtype = Shared.normalise_dtype!(dtype)

if Shared.dtype_to_iotype!(dtype) != type do
raise ArgumentError,
"dtype #{inspect(dtype)} expects a tensor of type #{inspect(Shared.dtype_to_iotype!(dtype))} " <>
7 changes: 3 additions & 4 deletions lib/explorer/shared.ex
@@ -510,10 +510,9 @@ defmodule Explorer.Shared do
"""
def iotype_to_dtype!(type) do
case type do
{:f, _} -> type
{:s, 64} -> {:s, 64}
{:u, 8} -> :boolean
{:s, 32} -> :date
{:f, n} when n in [32, 64] -> type
{:s, n} when n in [8, 16, 32, 64] -> type
{:u, n} when n in [8, 16, 32, 64] -> type
_ -> raise ArgumentError, "cannot convert binary/tensor type #{inspect(type)} into dtype"
end
end
4 changes: 0 additions & 4 deletions test/explorer/data_frame_test.exs
@@ -3310,10 +3310,6 @@ defmodule Explorer.DataFrameTest do
assert_raise ArgumentError,
"cannot convert dtype string into a binary/tensor type",
fn -> DF.put(df, :c, i) end

assert_raise ArgumentError,
"cannot convert binary/tensor type {:u, 32} into dtype",
fn -> DF.put(df, :e, Nx.tensor([1, 2, 3], type: {:u, 32})) end
end
end
