EPIC: Contribute Dask-SQL codebase to Apache Arrow DataFusion Python #1082

Open
16 of 48 tasks
jdye64 opened this issue Mar 13, 2023 · 0 comments
Labels: enhancement (New feature or request), needs triage (Awaiting triage by a dask-sql maintainer)

Comments

jdye64 (Collaborator) commented Mar 13, 2023

Is your feature request related to a problem? Please describe.
Dask-SQL currently maintains its own custom set of Rust PyO3 bindings for Apache Arrow DataFusion. Since we started that effort, interest has grown in the DataFusion community in offering its own set of Python bindings. It seems sensible to me to contribute the bindings that we have, gain development support from that community, and free up our developer time for features and enhancements.

This EPIC is set up to track the effort of moving code to Arrow DataFusion Python and then refactoring our codebase to use it.

While the PRs will mostly be simple in nature, there are likely to be several. We chose to do several small PRs instead of a single large PR so that reviewing is quicker and easier, and so that any regressions can be traced to a specific, isolated change.

I will attempt to keep this list up to date with the PRs relevant to this effort and their status.

Arrow DataFusion Python - Worklog

Dask-SQL - Worklog

  • [ENH] Add Arrow DataFusion Python as Cargo dependency #1084
  • Get conda build working with new dependencies
  • Passing test_analyze.py
  • Passing test_cmd.py
  • Passing test_compatibility.py
  • Passing test_complex.py
  • Passing test_create.py
  • Passing test_distributeby.py
  • Passing test_explain.py
  • Passing test_filter.py
  • Passing test_fugue.py
  • Passing test_function.py
  • Passing test_groupby.py
  • Passing test_hive.py
  • Passing test_intake.py
  • Passing test_jdbc.py
  • Passing test_join.py
  • Passing test_model.py
  • Passing test_over.py
  • Passing test_postgres.py
  • Passing test_rex.py
  • Passing test_sample.py
  • Passing test_schema.py
  • Passing test_select.py
  • Passing test_server.py
  • Passing test_show.py
  • Passing test_sort.py
  • Passing test_sqlite.py
  • Passing test_union.py

jdye64 added the enhancement and needs triage labels Mar 13, 2023
jdye64 self-assigned this Mar 13, 2023
jdye64 changed the title from "EPIC: Refactor Dask-SQL codebase to use Apache Arrow DataFusion Python" to "EPIC: Contribute Dask-SQL codebase to Apache Arrow DataFusion Python" Mar 13, 2023