
[Draft] Initial Redshift Driver Implementation #4


Draft · wants to merge 12 commits into base: main
Conversation

@eliasdefaria (Collaborator) commented Apr 17, 2025

Putting up a draft for what I was able to get done so far at the Seattle on-site!

Supported Types

(Redshift Type -> Arrow Type)
int2 -> int16
int/int4 -> int32
int8 -> int64
float4 -> float32
float/float8 -> float64
bool -> boolean
char/varchar/text -> string
date -> date32
time/timetz -> time64[us]
timestamp/timestamptz -> timestamp[us]
decimal/numeric -> decimal128

Things I would like to do but didn't get a chance to:

  • Additional testing to verify that all the data types are handled correctly
  • More robust casting of data types in the record reader when they're passed to the builders
  • Adjusting credentials handling to be easier to understand and set. This part is a challenge to get right, since different auth methods require different inputs and therefore different options.
  • Optimizing performance of the record reader, and cleaning up the code a bit as part of that effort. Because the schema comes with the first batch of records, we can use the lock-step style of synchronization that Felippe implemented really nicely for us in Databricks.
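The lock-step idea in the last bullet can be sketched with stdlib primitives: because the schema only arrives with the first batch, the reader blocks schema consumers until the fetch side has seen that batch. All names here are hypothetical, not the driver's or the Databricks implementation's actual API:

```python
import queue
import threading

class RecordReader:
    """Minimal sketch: schema() blocks until the first batch has arrived."""

    def __init__(self):
        self._batches = queue.Queue()
        self._schema = None
        self._schema_ready = threading.Event()

    def on_batch(self, schema, batch):
        # Called from the fetch thread; the first batch carries the schema.
        if not self._schema_ready.is_set():
            self._schema = schema
            self._schema_ready.set()
        self._batches.put(batch)

    def schema(self):
        # Consumers wait here until the fetch thread has published the schema.
        self._schema_ready.wait()
        return self._schema
```

The Event gives the one-shot "schema is known" signal while the queue carries batches independently, so readers never poll and never race the first fetch.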

Some interesting tidbits that were discovered:

Cheers,
-Jason :)
