DataFrame construction from numpy arrays and polars.datatypes.Array schema #15745
Comments
The Array dtype (FixedSizeList in Arrow terms) doesn't yet have all the methods and functions that List does (hopefully "yet" is the operative word). It was added later, so between the Array dtype lacking some of List's methods and List already being the default (changing the default would be a breaking change), that is the answer to "why List and not Array".
I frankly don't think this makes sense in the context of a 1D np array. A 1D np array should become a column of whatever numeric type it holds, not a nested type. For passing 2D np arrays to the DataFrame constructor, it makes more sense, to me at least, to return a df of the same shape. I think the issue is that you wouldn't naturally build a list of numpy arrays; instead, you'd have a 2D np array. If for some reason you do have a list of np arrays, then wrap the list in

That said, you can get (nearly) what you want by passing a 2D np array to the Series constructor. For example:
which can then be passed to the DataFrame constructor.
As an aside, notice that you don't need to go through
Confirmed bug. To drill down, since this works:
then so should
but it doesn't. I think addressing that would make your example work.
This would work if there's a single column, but it would not work if, e.g., you have an array column and a string column.

Update in polars 1.2.1

Single nested column

This now works:

df = pl.DataFrame({"A": [np.array([1])]}, {"A": pl.Array(pl.Int64, 1)})

But it's not possible to put the data back into a DataFrame through the exported numpy array:

pl.DataFrame(df.to_numpy(), {"A": pl.Array(pl.Int64, 1)}, orient="row")
ComputeError: cannot cast 'Object' type unless the input is a python list

Converting to Python lists first does work:

pl.DataFrame(df.to_numpy().tolist(), {"A": pl.Array(pl.Int64, 1)}, orient="row")
┌───────────────┐
│ A │
│ --- │
│ array[i64, 1] │
╞═══════════════╡
│ [1] │
└───────────────┘

1 nested column and a scalar one

In this case, the nested column is filled with nulls, even with strict=True:

pl.DataFrame([[np.array([1]), 3], [np.array([4]), 6]], schema={"A": pl.List(pl.Int64), "B": pl.Int64}, nan_to_null=True, orient="row", strict=True)
Out[33]:
shape: (2, 2)
┌───────────┬─────┐
│ A ┆ B │
│ --- ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ null ┆ 3 │
│ null ┆ 6 │
└───────────┴─────┘

Series from numpy

This now works:

pl.Series("A", [np.array([1], dtype=np.int64)], dtype=pl.Array(pl.Int64, 1))

but this doesn't:

pl.Series("A", [np.array([1], dtype=np.object_)], dtype=pl.Array(pl.Int64, 1))

It would be nice to have this working, since when I slice an np.object_ numpy array by columns, I always get np.object_ dtype.
Issue description
It's not possible to create columns of type Array by passing numpy arrays.
What works
With polars List:
pl.DataFrame({"A": [np.array([1], dtype=np.int64)]}, {"A": pl.List(pl.Int64)})
With a Python list and polars Array:
pl.DataFrame([[[1]]], {"A": pl.Array(pl.Int64, 1)}, orient="row")
What doesn't work
Not specifying the type:
pl.DataFrame([[np.array([1], dtype=np.int64)]])
It assigns polars List instead of Array. Why?
Row orientation
pl.DataFrame([[np.array([1], dtype=np.int64)]], {"A": pl.List(pl.Int64)}, orient="row")
You get nulls (the same happens with pl.Array).
Expected behavior
1. Without a schema, IMHO it should infer the polars Array dtype:
pl.DataFrame({"A": [np.array([1], dtype=np.int64)]})
2. With a schema, it should produce the same result as passing Python lists:
pl.DataFrame({"A": [np.array([1], dtype=np.int64)]}, {"A": pl.datatypes.Array(pl.datatypes.Int64, 1)}).equals(pl.DataFrame({"A": [[1]]}, {"A": pl.datatypes.Array(pl.datatypes.Int64, 1)}))