
Redshift Target #552

Closed
wrobstory opened this issue Aug 18, 2015 · 10 comments
Labels
feature Features or general enhancements

Comments

@wrobstory

We have a handful of analysts that would be interested in using Ibis against Redshift, and I can carve out some time to work on it. A few questions:

  1. Is the codebase ready for external contributors to start targeting another db?
  2. Do you have any thoughts about how you'd like the contributor process to go? A Redshift target could either be a very long running branch, or we could get the simplest start working "I can create a table in Redshift with Ibis", merge that, and incrementally merge along the way.
  3. There's clearly going to be overlap between a Postgres impl and Redshift, but perhaps not as much as one might hope. ibis.sql.ddl already handles a lot of the "standard" SQL statements, and I think there will be a good amount of divergence when it comes to PG vs. Redshift. One option is starting with both ibis.sql.redshift and ibis.sql.postgres. Any statements that work for both live in ibis.sql.postgres.ddl, and ibis.sql.redshift.ddl classes inherit from postgres. For example, INSERT and DELETE behavior is more or less 1:1 between the two.
  4. I have no idea how to handle UDFs in Redshift right now without native support (which is supposedly coming). I'll have to put some thought into this.
  5. Where to start? After a little reading, the most obvious place to start seems to be an ibis.sql.redshift.ddl/ibis.sql.postgres.ddl pair that largely mirrors https://github.com/cloudera/ibis/blob/master/ibis/impala/ddl.py for consistency. Get a test suite working that creates, drops, and deletes from tables, then move on to Redshift-specific implementations of the trickier bits like date handling. I think ibis.sql.ddl should largely work as-is for select/where/group-by, but I'll need to confirm that with some testing first.
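The inheritance layout proposed in point 3 could look roughly like the minimal Python sketch below. The class names (`PostgresCreateTable`, `RedshiftCreateTable`, `PostgresDelete`, `RedshiftDelete`) are hypothetical and not part of ibis. The only Redshift-specific detail assumed is that `CREATE TABLE` accepts `DISTKEY(...)` and `SORTKEY(...)` table attributes after the column list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PostgresCreateTable:
    """Hypothetical CREATE TABLE statement shared by Postgres and Redshift."""

    name: str
    columns: List[Tuple[str, str]]  # (column name, SQL type) pairs

    def table_attributes(self) -> str:
        # Hook for dialect-specific clauses after the column list; Postgres adds none.
        return ""

    def compile(self) -> str:
        cols = ", ".join(f"{col} {typ}" for col, typ in self.columns)
        return f"CREATE TABLE {self.name} ({cols})" + self.table_attributes()


@dataclass
class RedshiftCreateTable(PostgresCreateTable):
    """Adds Redshift's DISTKEY / SORTKEY table attributes to the shared statement."""

    distkey: Optional[str] = None
    sortkey: List[str] = field(default_factory=list)

    def table_attributes(self) -> str:
        parts = []
        if self.distkey:
            parts.append(f"DISTKEY({self.distkey})")
        if self.sortkey:
            parts.append(f"SORTKEY({', '.join(self.sortkey)})")
        return " " + " ".join(parts) if parts else ""


@dataclass
class PostgresDelete:
    """Hypothetical DELETE statement; behaves the same on both databases."""

    name: str
    where: Optional[str] = None

    def compile(self) -> str:
        sql = f"DELETE FROM {self.name}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql


class RedshiftDelete(PostgresDelete):
    """DELETE is close to 1:1, so Redshift can reuse the Postgres class unchanged."""
```

The Postgres class owns everything the two dialects share and the Redshift subclass overrides a single hook, so a statement like DELETE needs no Redshift-specific code at all.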
@wesm wesm added the redshift label Aug 18, 2015
@wesm
Member

wesm commented Aug 18, 2015

hey Rob, short answers for now

  1. Yes, but there will be a good amount of refactoring to do (which I can help with to speed things along).
  2. Happy to have incremental merges where all the unit tests pass. The main question will be testing -- while for many tests it will be sufficient to test against a local postgres, for other Redshift-specific DDL you'll need a real instance to run the tests against.
  3. Main question here: can we use SQLAlchemy to make the select statements? One main reason I didn't do this from the get-go is that I need to support Impala complex types in the very near future. Or is there too much divergence in redshift? As an orthogonal issue, we'll need to wrap all the redshift built-in functions. Either way having an inheritance pattern makes sense.
  4. We can figure this out later, but it seems there is no UDF support at all at the moment: http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html
  5. I might suggest starting with ibis/sql/redshift/client.py and mirroring ibis/impala/client.py (we can refactor some code into ibis/client.py to foster API consistency where appropriate), which will be a prerequisite for running any expressions to begin with. If using SQLAlchemy to interact with the DB via psycopg2 makes more sense, then let's try to do that.
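As a rough illustration of point 5, a first client only needs to run SQL strings and hand back rows. The sketch below assumes nothing beyond the Python DB-API 2.0 interface (PEP 249), which both psycopg2 and the standard-library sqlite3 module implement, so it can be exercised locally with sqlite3 before a Redshift instance is available. `MinimalSQLClient` is a hypothetical name and does not reproduce the actual ibis/impala/client.py API.

```python
import sqlite3
from typing import Any, Callable, List, Tuple


class MinimalSQLClient:
    """Hypothetical client wrapping any DB-API 2.0 connection factory."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        # For Redshift this might be a psycopg2.connect(...) call; assumed, not tested here.
        self._con = connect()

    def raw_sql(self, sql: str) -> List[Tuple[Any, ...]]:
        cur = self._con.cursor()
        try:
            cur.execute(sql)
            # description is None for statements that return no result set
            rows = cur.fetchall() if cur.description is not None else []
        finally:
            cur.close()
        self._con.commit()
        return [tuple(r) for r in rows]

    def close(self) -> None:
        self._con.close()


# Local smoke test with an in-memory SQLite database standing in for Redshift.
client = MinimalSQLClient(lambda: sqlite3.connect(":memory:"))
client.raw_sql("CREATE TABLE t (a INTEGER)")
client.raw_sql("INSERT INTO t VALUES (1), (2)")
rows = client.raw_sql("SELECT a FROM t ORDER BY a")
client.close()
```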

It would be a good idea to start some documents to help organize the process, especially to make a catalog / table of redshift built-ins so that we can begin mapping those onto ibis/expr/operations.py and existing code-generation functions in ibis/sql/exprs.py (which will have quite a few differences with postgres built-ins, I suspect). I haven't explored how to plug new functions into SQLAlchemy, so that's another thing to look into.
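A catalog like the one described above could start as plain data before any translation code exists. In this sketch the Redshift function names are real, and `LEN`, `NVL`, `GETDATE`, and `DATEDIFF` have no identically named Postgres built-in; the ibis operation names, however, are placeholders, the Postgres column only records the closest counterpart, and `None` marks a function that has not been mapped yet.

```python
from typing import Dict, List, NamedTuple, Optional


class BuiltinEntry(NamedTuple):
    ibis_op: Optional[str]    # placeholder ibis operation name; None = not mapped yet
    postgres_equivalent: str  # closest Postgres built-in, for comparison


# Illustrative subset only, not a complete or verified catalog.
REDSHIFT_BUILTINS: Dict[str, BuiltinEntry] = {
    "LEN": BuiltinEntry("StringLength", "LENGTH"),
    "NVL": BuiltinEntry("IfNull", "COALESCE"),
    "GETDATE": BuiltinEntry("Now", "NOW"),
    "DATEDIFF": BuiltinEntry(None, "AGE"),
    "LOWER": BuiltinEntry("Lowercase", "LOWER"),
}


def unmapped(catalog: Dict[str, BuiltinEntry]) -> List[str]:
    """Redshift built-ins that still need an ibis operation."""
    return sorted(name for name, e in catalog.items() if e.ibis_op is None)


def diverges_from_postgres(catalog: Dict[str, BuiltinEntry]) -> List[str]:
    """Built-ins whose Redshift spelling differs from the closest Postgres one."""
    return sorted(name for name, e in catalog.items() if name != e.postgres_equivalent)
```

Keeping the catalog as data makes it easy to report coverage gaps and Postgres divergences as the mapping grows.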

I'm interested in having postgres support anyway, so hopefully this will kill two birds with one stone.

@wesm
Member

wesm commented Aug 18, 2015

We should definitely start a redshift/postgres branch until we get the bare essentials bootstrapped. After that we can incrementally merge new functionality.

@wesm
Member

wesm commented Aug 18, 2015

there's this https://pypi.python.org/pypi/redshift-sqlalchemy which may help also

@wesm
Member

wesm commented Aug 29, 2015

@wrobstory I'm about to drop a generic sqlalchemy backend in #585. I'm going to wrap as much of SQLite as I can, and you'll want to follow the same recipe to provide redshift support. The main work required will be wrapping all the redshift built-in functions and exposing them internally to the SQLAlchemy translator (I'll try to give clear guidance on how to go about this incrementally).
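One way to make the built-in wrapping incremental is a per-backend registry from operation to SQL formatter, where the Redshift translator starts from a copy of the generic one and overrides entries as they are verified. This is a plain-Python sketch of that pattern under assumed names: `GenericTranslator`, `RedshiftTranslator`, and the op keys are hypothetical, the `nvl` override is only for illustration, and the code neither uses SQLAlchemy nor reproduces the translator in #585.

```python
from typing import Callable, Dict, List

Formatter = Callable[[List[str]], str]


class GenericTranslator:
    """Hypothetical registry mapping an operation name to a SQL formatter."""

    registry: Dict[str, Formatter] = {
        "Lowercase": lambda args: f"lower({args[0]})",
        "IfNull": lambda args: f"coalesce({args[0]}, {args[1]})",
    }

    @classmethod
    def translate(cls, op: str, args: List[str]) -> str:
        try:
            formatter = cls.registry[op]
        except KeyError:
            # Unwrapped operations fail loudly instead of emitting guessed SQL.
            raise NotImplementedError(f"{op} is not supported by {cls.__name__}") from None
        return formatter(args)


class RedshiftTranslator(GenericTranslator):
    # Copy the generic entries, then override or add Redshift built-ins one at a time.
    registry: Dict[str, Formatter] = {
        **GenericTranslator.registry,
        "IfNull": lambda args: f"nvl({args[0]}, {args[1]})",
        "Now": lambda args: "getdate()",
    }
```

Because anything not yet wrapped raises `NotImplementedError`, each incremental merge states exactly which operations the backend supports.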

@wrobstory
Author

Wow, nice work.

An hour after I said "I've got some time", a new project came up that pulled me away. That being said, we've got folks working on some redshift_sqlalchemy stuff: sqlalchemy-redshift/sqlalchemy-redshift#17

That includes Simple sponsoring a Redshift instance to test against. I think we might be able to support Ibis tests against it as well, if that's of interest.
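If a sponsored instance is available, one common arrangement is to gate the Redshift-only tests on a connection string supplied by CI and skip them everywhere else, so local runs against Postgres still pass. The environment variable name `IBIS_TEST_REDSHIFT_URI` and the test class are hypothetical; `unittest.skipUnless` is standard library.

```python
import os
import unittest

# Hypothetical variable; a CI job with access to the test cluster would set it.
REDSHIFT_URI = os.environ.get("IBIS_TEST_REDSHIFT_URI")

requires_redshift = unittest.skipUnless(
    REDSHIFT_URI, "set IBIS_TEST_REDSHIFT_URI to run Redshift-specific tests"
)


class RedshiftDDLTests(unittest.TestCase):
    @requires_redshift
    def test_create_table_with_distkey(self) -> None:
        # Placeholder: a real test would connect to REDSHIFT_URI, run
        # CREATE TABLE ... DISTKEY(...), and inspect the resulting table.
        self.assertTrue(REDSHIFT_URI)
```

Run it with `python -m unittest` or pytest; without the variable set, the test is reported as skipped rather than failed.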

@wesm
Member

wesm commented Sep 3, 2015

Cool, now that SQLAlchemy translation support has dropped, adding a preliminary Redshift client would be relatively easy. Do you have a test instance you can give me access to so I can get this bootstrapped sometime in the next few weeks?

@cpcloud
Member

cpcloud commented Jan 12, 2017

@wrobstory Curious if you'd be willing to throw ibis at redshift with the postgres backend :)

@cpcloud cpcloud added this to the Future milestone Jun 26, 2017
@cpcloud cpcloud added the feature Features or general enhancements label Jun 26, 2017
@datapythonista datapythonista removed this from the Future milestone Nov 13, 2020
@cpcloud
Member

cpcloud commented Dec 7, 2021

This needs a champion. Perhaps the folks at Amazon would be interested in this!

@cpcloud cpcloud closed this as completed Dec 7, 2021
@ccjeff

ccjeff commented Sep 24, 2024

Newcomer to the project here, and I'm wondering if people are still interested in this. I did a little digging, and it seems Redshift is not a supported backend. I'd love to have it supported, and I can contribute some time to test a POC. Thanks!

@lostmygithubaccount
Member

@ccjeff would you mind opening a new issue for discussion? Redshift is something we're definitely interested in supporting. One of the main considerations is the cloud infrastructure we'd need to run it in CI, though if there's any work and testing you'd like to do, we'd certainly be interested.
