Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
It looks like I'm able to cast int and string columns:
> import pyarrow as pa
> n_legs = pa.array([2, 2, 4, 4, 5, 100])
> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
> names = ["n_legs", "animals"]
> batch = pa.RecordBatch.from_arrays([n_legs, animals], names=names)
> batch
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
> schema = pa.schema([
> ('n_legs', pa.int64()),
> ('animals', pa.string()),
> ])
> pa.RecordBatchReader.from_batches(
> schema,
> [batch]
> ).cast(schema).read_all()
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,2,4,4,5,100]]
animals: [["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]]
But it seems to fail with a date32:
> import pyarrow as pa
> from datetime import date
> birthday = [date(1990, 3, 1)]
> names = ["Fokko"]
> batch = pa.RecordBatch.from_arrays([birthday, names], names=['birthday', 'name'])
> batch
pyarrow.RecordBatch
birthday: date32[day]
name: string
----
birthday: [1990-03-01]
name: ["Fokko"]
> schema = pa.schema([
> ('birthday', pa.date32()),
> ('name', pa.string()),
> ])
> pa.RecordBatchReader.from_batches(
> schema,
> [batch]
> ).cast(schema).read_all()
---------------------------------------------------------------------------
ArrowTypeError Traceback (most recent call last)
Cell In[6], line 9
1 schema = pa.schema([
2 ('birthday', pa.date32()),
3 ('name', pa.string()),
4 ])
6 pa.RecordBatchReader.from_batches(
7 schema,
8 [batch]
----> 9 ).cast(schema).read_all()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/ipc.pxi:800, in pyarrow.lib.RecordBatchReader.cast()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()
ArrowTypeError: Field 0 cannot be cast from date32[day] to date32[day]
Same for date64:
---------------------------------------------------------------------------
ArrowTypeError Traceback (most recent call last)
Cell In[42], line 15
4 schema = pa.schema([
5 # ('date32', pa.date32()),
6 ('date64', pa.date64()),
7 ])
9 batch = pa.RecordBatch.from_arrays(data, schema=schema)
12 table = pa.RecordBatchReader.from_batches(
13 schema,
14 [batch]
---> 15 ).cast(schema).read_all()
17 assert table['date32'][0].as_py() == dt
18 assert table['date64'][0].as_py() == dt
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/ipc.pxi:800, in pyarrow.lib.RecordBatchReader.cast()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()
ArrowTypeError: Field 0 cannot be cast from date64[ms] to date64[ms]
This looks like a valid cast operation to me, since the source and target types are identical. Please advise. I'm happy to create a PR; since I'm not familiar with the codebase, it would be very helpful if someone could point out where I should add the test :)
> pa.__version__
'16.1.0'
Component(s)
C++