Add direct GCS export to DatabricksSqlOperator with Parquet/Avro support #60543
jason810496 left a comment:
Nice! Thanks for the PR, LGTM overall.
Three review threads on providers/databricks/src/airflow/providers/databricks/operators/databricks_sql.py (outdated, resolved).
After this PR, the Databricks provider will depend on the GCP provider. Eventually, the Databricks provider will depend on all three cloud providers (AWS, Azure, and GCP), right?
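One common way to keep a cloud SDK from becoming a hard dependency is to import it lazily, only on the code path that needs it, and raise a clear error when the optional extra is missing. The sketch below is a generic, stdlib-only illustration; the helper name require_optional and its error message are hypothetical and not part of this PR.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import module_name on demand, or explain which optional extra to install.

    Hypothetical helper: illustrates a lazy-import pattern, not the PR's code.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with an actionable message while keeping the original cause.
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install the optional '{extra}' extra to enable it."
        ) from err
```

For example, a GCS export path could call require_optional("airflow.providers.google.cloud.hooks.gcs", "gcs") only when output_path starts with gs://, so users writing to local files would not need the Google provider installed.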
I'm wondering whether we could move this kind of common serialization logic to a shared location. For reference, similar logic already exists in:
- airflow/providers/google/src/airflow/providers/google/cloud/transfers/sql_to_gcs.py, lines 377 to 385 (at 26a9d3b)
- airflow/providers/amazon/src/airflow/providers/amazon/aws/transfers/sql_to_s3.py, lines 234 to 255 (at 26a9d3b)
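As a hypothetical illustration of what a shared, provider-agnostic helper could look like, the sketch below normalizes DB-API row values into types that CSV, JSON, Parquet, or Avro writers handle easily. The names normalize_value and normalize_row and the conversion rules are assumptions for illustration; they are not taken from the linked sql_to_gcs.py or sql_to_s3.py code.

```python
import datetime
import decimal
import uuid
from typing import Any


def normalize_value(value: Any) -> Any:
    """Convert one DB-API value into a portable, serializer-friendly type.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    if isinstance(value, decimal.Decimal):
        # Emit decimals as strings to preserve exact precision.
        return str(value)
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        # ISO 8601 strings are understood by most downstream readers.
        return value.isoformat()
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, (bytes, bytearray)):
        # Keep binary data as immutable bytes.
        return bytes(value)
    return value


def normalize_row(row: tuple) -> tuple:
    """Apply normalize_value to every column of a result row."""
    return tuple(normalize_value(v) for v in row)
```

Keeping a helper like this free of any cloud SDK import is what would let the AWS, GCP, and Databricks transfer code share it without pulling in each other's providers.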
Yes, I agree. I think this could possibly be opened as a separate issue?
Yep, that's right.
I'm not quite sure why the Docker build test is failing. The errors show Microsoft's apt repository returning 403 Forbidden during apt-get update, which appears unrelated to the Databricks provider changes.
Restarted; it was a problem in the backend. I assume CI will turn green in a moment.
jason810496 left a comment:
> Yes, I agree. I think this could possibly be opened as a separate issue?
Yes, it's non-blocking. We could just create an issue to track it as a follow-up.
I'm getting a build failure on #60719: https://github.com/apache/airflow/actions/runs/21098174150/job/60678759100?pr=60719#step:8:1265
@jason810496 Regarding the build issue Elad mentioned, the error shows that fastavro 1.9.4 uses deprecated C APIs that were removed in Python 3.13. Would bumping fastavro to >=1.10.0 work?
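To make the proposed >=1.10.0 lower bound concrete, here is a small stdlib-only sketch of the comparison it implies. It assumes plain numeric release strings such as "1.9.4"; pre-release or local suffixes are not handled, and real resolution should rely on a full PEP 440 implementation (for example the packaging library). The function names are hypothetical.

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric release like '1.9.4' into (1, 9, 4).

    Assumption: no pre-release, post-release, or local version suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(version: str, minimum: str = "1.10.0") -> bool:
    """Return True if version >= minimum, comparing release segments numerically."""
    have, need = parse_release(version), parse_release(minimum)
    # Pad the shorter tuple with zeros so "1.10" compares equal to "1.10.0".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    # Element-wise tuple comparison gives (1, 9, 4) < (1, 10, 0), whereas a
    # plain string comparison would wrongly rank "1.9.4" above "1.10.0".
    return have >= need
```

The numeric comparison matters here because 1.9.4 and 1.10.0 sort in the wrong order as strings.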
Attempted fix PR: #60732 |
Adds direct GCS export capability to DatabricksSqlOperator with Parquet and Avro format support.

closes: #55128

Changes
- Add parquet and avro to the supported output_format values
- Support GCS URIs (gs://bucket/path) in the output_path parameter
- New parameters: gcp_conn_id, gcs_impersonation_chain
- Add [gcs] dependency for the Google provider
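A minimal sketch of how an output_path value might be routed, assuming GCS destinations are recognized by the gs:// scheme and split into a bucket and an object name. The helper names split_gcs_uri and validate_output_format, and the SUPPORTED_FORMATS set (which assumes csv, json, and jsonl were supported before this PR), are illustrative assumptions rather than the PR's actual code.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Assumed set of formats after this change: pre-existing csv/json/jsonl plus
# the new parquet/avro. The real operator may define this differently.
SUPPORTED_FORMATS = {"csv", "json", "jsonl", "parquet", "avro"}


def split_gcs_uri(output_path: str) -> tuple[str, str] | None:
    """Return (bucket, object_name) for a gs:// URI, or None for local paths."""
    parsed = urlparse(output_path)
    if parsed.scheme != "gs":
        return None  # Not a GCS URI: treat it as a local file path.
    bucket = parsed.netloc
    object_name = parsed.path.lstrip("/")
    if not bucket or not object_name:
        raise ValueError(f"Expected gs://bucket/path, got {output_path!r}")
    return bucket, object_name


def validate_output_format(output_format: str) -> str:
    """Normalize the requested format to lowercase and reject unknown values."""
    fmt = output_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported output_format {output_format!r}; "
            f"expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return fmt
```

Returning None for non-GCS paths keeps the existing local-file behavior as the default branch, so only gs:// destinations would need the GCS connection settings (gcp_conn_id, gcs_impersonation_chain).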