Any plans to support Apache arrow bindings? #507

Open
chitralverma opened this issue Dec 11, 2022 · 4 comments

Comments

@chitralverma

No description provided.

@csringhofer
Collaborator

You mean extending the cursor to return fetch results in Arrow format instead of the current row oriented way?

I don't know of any plans but it sounds like a good addition to Impyla.

@chitralverma
Author

> You mean extending the cursor to return fetch results in Arrow format instead of the current row oriented way?
>
> I don't know of any plans but it sounds like a good addition to Impyla.

Yes, it would be great to have as_pyarrow_table and as_pyarrow_dataset options available somewhere that return the results as a PyArrow Table (eagerly) or a PyArrow Dataset (lazily), ideally with zero-copy.

@Khalid-Nowaf

I would second this strongly. +1

I'm not a Python guy, but I'm using this because it is the only client/driver for Impala I know of that is stable and feature-complete. Adding Arrow data format support would let us wrap it in different languages/systems at minimal cost.

@csringhofer
Collaborator

Adding a basic implementation similar to as_pandas (def as_pandas(cursor, coerce_float=False):) seems quite simple. I see two things that could make this more complicated:

  1. Performance - calling fetchall() converts the results from HS2's columnar format to a row-based format. Converting this back to a columnar format like Arrow would mean two unnecessary transpositions of the result set. Avoiding this overhead is possible but needs more work.
  2. Type conversions (e.g. timestamps, which HS2 returns as strings). These add both complexity and potential performance issues.
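
For illustration only, the "basic implementation" route could look roughly like the sketch below. Nothing here exists in Impyla: as_pyarrow_table, rows_to_columns, parse_hs2_timestamp and the timestamp_columns parameter are all hypothetical names, and the sketch assumes timestamps arrive as strings like "2022-12-11 10:30:00.123456789". It goes through fetchall(), so it pays exactly the double transposition described in point 1, and it drops anything past microsecond precision.

```python
from datetime import datetime


def parse_hs2_timestamp(value):
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' (assumed format) into a datetime."""
    if value is None:
        return None
    head, _, frac = value.partition(".")
    parsed = datetime.strptime(head, "%Y-%m-%d %H:%M:%S")
    if frac:
        # datetime only holds microseconds, so extra digits are truncated.
        parsed = parsed.replace(microsecond=int(frac[:6].ljust(6, "0")))
    return parsed


def rows_to_columns(description, rows, timestamp_columns=()):
    """Transpose DB-API rows into {column name: list of values}."""
    names = [col[0] for col in description]
    # zip(*rows) is the row-to-column transposition a native path would avoid.
    transposed = list(zip(*rows)) if rows else [()] * len(names)
    columns = {}
    for name, values in zip(names, transposed):
        if name in timestamp_columns:
            values = [parse_hs2_timestamp(v) for v in values]
        columns[name] = list(values)
    return columns


def as_pyarrow_table(cursor, timestamp_columns=()):
    """Hypothetical helper: fetch everything and build a pyarrow.Table."""
    import pyarrow as pa  # third-party; imported lazily so the helpers above work without it

    columns = rows_to_columns(cursor.description, cursor.fetchall(), timestamp_columns)
    return pa.table(columns)
```

Because the columns are collected in a dict, duplicate column names in the result set would collapse into one; a real implementation would need to handle that, and would ideally build Arrow arrays straight from the HS2 columnar batches instead of from fetchall().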
