ARROW-2689: [Python] Remove parameter timestamps_to_ms #2129

xhochy · 2018-06-10T08:41:24Z

This parameter no longer exists. For the Parquet path it was replaced by coerce_timestamps; other cases should use Column.cast().

codecov-io · 2018-06-10T09:21:29Z

Codecov Report

Merging #2129 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2129      +/-   ##
==========================================
- Coverage   86.39%   86.37%   -0.03%     
==========================================
  Files         242      230      -12     
  Lines       41481    40589     -892     
==========================================
- Hits        35838    35059     -779     
+ Misses       5643     5530     -113

Impacted Files	Coverage Δ
rust/src/list.rs
rust/src/error.rs
rust/src/array.rs
rust/src/builder.rs
rust/src/memory.rs
rust/src/list_builder.rs
rust/src/datatypes.rs
rust/src/bitmap.rs
rust/src/record_batch.rs
rust/src/memory_pool.rs
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6df28d3...38c5c48. Read the comment docs.

domoritz · 2018-06-10T20:15:44Z

So we should run https://arrow.apache.org/docs/python/generated/pyarrow.Column.html#pyarrow.Column.cast after converting a data frame to arrow. 👍

wesm · 2018-06-11T19:03:40Z

@domoritz could you elaborate on your use case a bit more?

wesm

+1

domoritz · 2018-06-11T22:22:37Z

I'm trying to convert some data from pandas to arrow but pandas' timestamps are in ns. I want to reduce the data size and use lower precision.

My code looks roughly like this:

import pandas as pd
import pyarrow as pa

df = pd.read_csv('flights.csv', encoding='utf-8', dtype={'FL_DATE': 'str', 'ARR_TIME': 'str', 'DEP_TIME': 'str'})

arr_time = df.FL_DATE + df.ARR_TIME.replace('2400', '0000')
df['ARRIVAL'] = pd.to_datetime(arr_time, format='%Y%m%d%H%M')

dep_time = df.FL_DATE + df.DEP_TIME.replace('2400', '0000')
df['DEPARTURE'] = pd.to_datetime(dep_time, format='%Y%m%d%H%M')

df = df.astype({'DEP_DELAY': 'int16', 'ARR_DELAY': 'int16', 'AIR_TIME': 'int16', 'DISTANCE': 'int16'})

table = pa.Table.from_pandas(df)

table.column('ARRIVAL').cast(pa.TimestampValue, True)

writer = pa.RecordBatchFileWriter(f'{name}.arrow', table.schema)
writer.write(table)
writer.close()

wesm · 2018-06-11T23:25:41Z

Okay. In this line:

table.column('ARRIVAL').cast(pa.TimestampValue, True)

Are you trying to cast that column to a different timestamp unit? This line of code leaves table unmodified (data structures from the pyarrow library are immutable). All timestamps use the same amount of space (8 bytes per value).

It would be a good idea to add a documentation section about type casting and how to change the column type of a table; I don't think we have that right now. We could also add some convenience APIs to help with common workflows (e.g. replacing a single column)

domoritz · 2018-06-11T23:59:00Z

Are you trying to cast that column to a different timestamp unit?

Yes, I am trying to switch from ns to ms precision.

I guess I have to write something like table = table.column('ARRIVAL').cast(pa.TimestampValue, True) instead. Or does this return a column and so I need table.setColumn('ARRIVAL', table.column('ARRIVAL').cast(pa.TimestampValue, True))?

wesm · 2018-06-12T04:13:31Z

Okay, let's create a JIRA about this and discuss there.

Firstly, the statement cast(pa.TimestampValue, True) will not do what you want. You either want

column.cast(pa.timestamp('ms'))

or

column.cast(pa.timestamp('ms'), safe=False)

depending on whether you want to allow unsafe casts (see http://arrow.apache.org/docs/python/generated/pyarrow.lib.Array.html#pyarrow.lib.Array.cast). I think the docstring could be improved to make it clearer that a DataType instance is expected rather than a class object.

Secondly, we don't have a convenient function for replacing a column in a table to create a new table. So I would want to write:

new_column = table.column(name).cast(pa.timestamp('ms'))
new_table = table.set_column(name, new_column)

I opened https://issues.apache.org/jira/browse/ARROW-2699

wesm · 2018-06-12T04:15:20Z

See also https://issues.apache.org/jira/browse/ARROW-2700

domoritz · 2018-06-12T04:28:53Z

Thank you @wesm! I hope my comments are helpful.

ARROW-2689: [Python] Remove parameter timestamps_to_ms

38c5c48

wesm approved these changes Jun 11, 2018

wesm closed this in 34890cc Jun 11, 2018

wesm deleted the ARROW-2689 branch June 11, 2018 21:05
