Add transfer operator S3 to (generic) SQL #29085
Merged
o-nikolas merged 23 commits into apache:main from maggesssss:feat_s3_to_sql_transfer_NEW on Jan 23, 2023
Member
Static checks are failing - I recommend installing pre-commit.
vincbeck
approved these changes
Jan 23, 2023
Contributor
Author
Fixed now.
parameter which allows the user to add a custom parser. Example parser added to docs.
Removed the following args:
- csv_reader_kwargs
- skip_first_row
- column_list "infer" option
These arguments do not work with a custom parser at the moment.
Changed to NamedTemporaryFile.
Added s3_hook.get_key before downloading to check if the file exists.
Updated test and docs.
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Niko Oliveira <onikolas@amazon.com>
to cached property db_hook
use imported watcher task
for SqlExecuteQueryOperators
downloaded
string and added SQLTableCheckOperator to check if lines have been successfully imported
(without insert_rows method) and optimized db_hook property
removed type hint from db_hook cached_property
to the return value of BaseHook.get_hook
import of get_test_run should be done at the bottom according to AIP-47
BaseHook instead
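To illustrate the custom parser idea from the commit notes above, here is a minimal sketch using only the standard library. It assumes the parser is a callable that receives the path of the downloaded file and returns an iterable of rows. The names `parse_csv_rows` and `load_with_parser` are hypothetical and are not taken from the PR; the temporary file only stands in for the file downloaded from S3.

```python
import csv
from tempfile import NamedTemporaryFile
from typing import Callable, Iterable, Iterator, List


def parse_csv_rows(filepath: str) -> Iterator[List[str]]:
    """Hypothetical parser: lazily yield one row at a time from a local CSV file."""
    with open(filepath, newline="") as handle:
        yield from csv.reader(handle)


def load_with_parser(
    payload: bytes, parser: Callable[[str], Iterable[List[str]]]
) -> List[List[str]]:
    """Stand-in for the download step: write bytes to a temp file, then parse it.

    Note: reopening a NamedTemporaryFile by name while it is still open
    works on POSIX systems but not on Windows.
    """
    with NamedTemporaryFile(suffix=".csv") as tmp:
        tmp.write(payload)
        tmp.flush()  # make sure the parser sees the full content on disk
        # Consume the rows before the temp file is deleted on exit.
        return list(parser(tmp.name))
```

Because the parser is just a callable, a JSON or fixed-width parser could presumably be swapped in without touching the download step.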
a0b2730 to
02ee9a6
Compare
o-nikolas
approved these changes
Jan 23, 2023
closes: #23666
This PR adds a new transfer operator that reads a CSV file from S3 storage and loads it into an existing table of a generic SQL database.
I used csv.reader to read the file and the insert_rows method of the existing DbApiHook.
Because csv.reader does not read the complete file into memory, large files can also be loaded reasonably efficiently.
I am happy to receive any feedback.
This PR replaces #28964
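
The streaming approach described above can be sketched roughly as follows. This is not the operator's code: `sqlite3` stands in for the generic SQL database, `executemany` stands in for the hook's insert_rows method, and the helper names `batched` and `load_csv` as well as the `batch_size` parameter are assumptions made for illustration.

```python
import csv
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence[str]], size: int) -> Iterator[List[Sequence[str]]]:
    """Group a lazy row iterator into lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load_csv(
    conn: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    lines: Iterable[str],
    batch_size: int = 1000,
) -> int:
    """Insert CSV rows batch by batch and return the number of rows inserted.

    csv.reader pulls lines on demand, so only one batch is held in memory.
    `table` and `columns` are interpolated into the SQL text and must come
    from trusted input; the row values are bound as parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    total = 0
    for chunk in batched(csv.reader(lines), batch_size):
        conn.executemany(sql, chunk)
        conn.commit()  # commit per batch to keep transactions small
        total += len(chunk)
    return total
```

Any iterable of text lines works as `lines`, for example a file opened with `newline=""`, so the whole file never has to be materialized as a list.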