

@maggesssss
Contributor

closes: #23666

This PR adds a new transfer operator that reads a CSV file from S3 and loads it into an existing table of a generic SQL database.

I used csv.reader to read the file and the insert_rows method of the existing DbApiHook to write the rows.
Because csv.reader reads the file line by line instead of loading it completely into memory, large files can also be loaded reasonably efficiently.
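A minimal, stdlib-only sketch of the streaming idea described above, not the operator's actual code. It assumes `sqlite3` as a stand-in for the target database, and the helpers `batched` and `load_csv_streaming` are hypothetical; the batching only mimics the `commit_every` behaviour of `DbApiHook.insert_rows`.

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence[str]], size: int) -> Iterator[List[Sequence[str]]]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def load_csv_streaming(csv_file, conn: sqlite3.Connection, table: str, commit_every: int = 1000) -> int:
    """Hypothetical helper: insert CSV rows in batches, committing after each batch."""
    reader = csv.reader(csv_file)  # lazy: pulls one line at a time from csv_file
    total = 0
    for batch in batched(reader, commit_every):
        placeholders = ", ".join("?" * len(batch[0]))  # assumes every row has the same width
        # `table` is interpolated directly, so it must come from trusted configuration.
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)
        conn.commit()
        total += len(batch)
    return total


# Usage: three rows loaded in batches of two (two commits).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
loaded = load_csv_streaming(io.StringIO("alice,30\nbob,25\ncarol,41\n"), demo_conn, "people", commit_every=2)
print(loaded)  # 3
```

Only one batch of rows is held in memory at a time, which is what keeps memory use flat for large files.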

I'd be happy to get any feedback.

This PR replaces #28964

@potiuk
Member

potiuk commented Jan 22, 2023

Static checks are failing. I recommend installing pre-commit.

@maggesssss
Contributor Author


Fixed now

root and others added 20 commits January 23, 2023 21:55
parameter which allows the user to add a custom parser.
Example parser added to docs
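The custom parser parameter mentioned above could be satisfied by something like the following sketch. It assumes the operator calls the parser with the path of the locally downloaded file and accepts any iterable of rows; `parse_csv` is a hypothetical name, not necessarily the example that was added to the docs.

```python
import csv
import os
import tempfile
from typing import Iterator, List


def parse_csv(filepath: str) -> Iterator[List[str]]:
    """Hypothetical parser: yield rows from a local CSV file one at a time."""
    with open(filepath, newline="") as f:
        yield from csv.reader(f)


# Usage: write a small file, then read it back through the parser.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("alice,30\nbob,25\n")
rows = list(parse_csv(tmp.name))
os.remove(tmp.name)
print(rows)  # [['alice', '30'], ['bob', '25']]
```

Using a generator keeps the parser lazy, so the streaming behaviour of csv.reader is preserved.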

Removed the following args:
- csv_reader_kwargs
- skip_first_row
- column_list "infer" option
These arguments do not work with a custom parser at the moment.
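Since csv_reader_kwargs and skip_first_row were removed from the operator, the same behaviour could be folded into a custom parser instead. This is a sketch under the assumption that the parser is a one-argument callable taking a file path; `parse_csv_with_options` and `semicolon_parser` are hypothetical names.

```python
import csv
import os
import tempfile
from functools import partial
from typing import Any, Iterator, List


def parse_csv_with_options(
    filepath: str, skip_first_row: bool = False, **csv_reader_kwargs: Any
) -> Iterator[List[str]]:
    """Hypothetical parser: optionally drop a header row and forward options to csv.reader."""
    with open(filepath, newline="") as f:
        reader = csv.reader(f, **csv_reader_kwargs)
        if skip_first_row:
            next(reader, None)  # discard the header row, if the file is not empty
        yield from reader


# Bind the options up front so the result is a one-argument callable again.
semicolon_parser = partial(parse_csv_with_options, skip_first_row=True, delimiter=";")

# Usage: a semicolon-delimited file with a header row.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("name;age\nalice;30\nbob;25\n")
rows = list(semicolon_parser(tmp.name))
os.remove(tmp.name)
print(rows)  # [['alice', '30'], ['bob', '25']]
```

functools.partial keeps the options out of the operator's signature, which matches the decision to leave CSV-specific settings to the parser.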

Changed to NamedTemporaryFile

Added s3_hook.get_key before downloading to check whether the file exists
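The two steps above (check that the key exists, then download into a NamedTemporaryFile and hand its path to the parser) can be sketched with the standard library alone. The in-memory `FAKE_BUCKET`, `get_key`, `read_rows` and `load_via_tempfile` are hypothetical stand-ins, not the S3Hook or boto3 API.

```python
import csv
import io
import shutil
from tempfile import NamedTemporaryFile
from typing import BinaryIO, Callable, Dict, Iterable, List

# Hypothetical in-memory "bucket" standing in for S3.
FAKE_BUCKET: Dict[str, bytes] = {"data/people.csv": b"alice,30\nbob,25\n"}


def get_key(key: str) -> BinaryIO:
    """Hypothetical existence check: fail before anything is downloaded."""
    if key not in FAKE_BUCKET:
        raise FileNotFoundError(f"key not found: {key}")
    return io.BytesIO(FAKE_BUCKET[key])


def read_rows(path: str) -> List[List[str]]:
    """Simple parser that reads the whole local file with csv.reader."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def load_via_tempfile(key: str, parser: Callable[[str], Iterable[List[str]]]) -> List[List[str]]:
    source = get_key(key)  # raises before a temp file is created
    with NamedTemporaryFile(suffix=".csv") as tmp:
        shutil.copyfileobj(source, tmp)
        tmp.flush()  # ensure the bytes are written before the parser reopens the file by name
        # Consume the rows inside the block: the temp file is deleted on exit.
        return list(parser(tmp.name))


rows = load_via_tempfile("data/people.csv", read_rows)
print(rows)  # [['alice', '30'], ['bob', '25']]
```

Reopening a NamedTemporaryFile by name while it is still open works on POSIX systems but not on Windows, so this sketch assumes a POSIX worker.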

Updated test and docs
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Niko Oliveira <onikolas@amazon.com>
to cached property db_hook
use imported watcher task
for SqlExecuteQueryOperators
string and added

SQLTableCheckOperator to check if
lines have been successfully imported
(without insert_rows method)
and optimized db_hook property
removed type hint from db_hook cached_property
to the return value of BaseHook.get_hook
import of get_test_run should be done at the bottom according to AIP-47
@potiuk potiuk force-pushed the feat_s3_to_sql_transfer_NEW branch from a0b2730 to 02ee9a6 January 23, 2023 20:55
@o-nikolas o-nikolas merged commit efaed34 into apache:main Jan 23, 2023
@maggesssss maggesssss deleted the feat_s3_to_sql_transfer_NEW branch February 10, 2023 17:21


Development

Successfully merging this pull request may close these issues.

Add transfers operator S3 to SQL / SQL to SQL

4 participants