Add to_arrow to get a pyarrow.Table from query results. #8609

Merged (9 commits) on Jul 10, 2019


@tswast tswast commented Jul 3, 2019

An Arrow Table supports a richer set of types than a pandas DataFrame,
and is the basis of many data analysis systems. It can be used in
conjunction with pandas through the Table.to_pandas() method or the
pandas extension types provided by the fletcher package.

Towards #5204.

@googlebot googlebot added the cla: yes label Jul 3, 2019
@tswast tswast marked this pull request as ready for review July 8, 2019 22:13
@tswast tswast requested review from a team, shollyman and plamut July 8, 2019 22:13

@shollyman shollyman left a comment


None of the tests seem to deal with NULL interactions. Might be good to add something? From a brief read of the pyarrow docs, it looks like you don't have to deal with Avro-style type unioning at schema time, but nulls in arrays and nulls in scalar values appear to be handled differently.

pyarrow.field("field05", pyarrow.float64()),
pyarrow.field("field06", pyarrow.float64()),
pyarrow.field("field07", module_under_test.pyarrow_numeric()),
pyarrow.field("field08", pyarrow.bool_()),

Does the trailing underscore signify anything, or is it just an Arrow oddity in type representation?


@tswast tswast Jul 9, 2019


The trailing underscore is the Python convention for avoiding name collisions with built-in names such as bool.
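
To make the convention concrete, here is a minimal sketch using only the standard library. The `bool_` function below is a hypothetical stand-in that merely borrows its name from `pyarrow.bool_()`; it is not pyarrow's implementation.

```python
import builtins


def bool_():
    """Hypothetical type factory, named in the style of pyarrow.bool_()."""
    return "bool type descriptor"


# Because the factory is called bool_ rather than bool, the built-in
# bool is not shadowed in this module and still behaves as usual.
assert bool is builtins.bool
assert bool(0) is False

print(bool_())
```

Had the function been named `bool`, every later call to `bool(...)` in the same module would have hit the factory instead of the built-in.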


@shollyman shollyman left a comment


Approving based on an in-person conversation. There's a possible follow-up around null handling, but Arrow is nullable by default rather than needing explicit care. BigQuery doesn't allow null elements in an array in results or tables, so the difference in Arrow's handling isn't relevant here.

@tswast tswast merged commit d5f5d24 into googleapis:master Jul 10, 2019
@tswast tswast deleted the issue5204-bq-tabledata.list-to_arrow branch July 10, 2019 01:31