[Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog #2674
Conversation
…ing support for clustering, table comments, and tags.
…and foreign key constraints, and table options. Update SQL generation for table alterations to include comments and tags.
… hints, along with examples for using the databricks_adapter. Enhance clarity on applying hints for resource metadata and constraints.
…s adapter. Improve logging for table options during ALTER TABLE operations and streamline SQL generation for constraints.
…OREIGN KEY constraints. Introduce create_indexes option in DatabricksClientConfiguration and update related classes to handle index creation logic. Add new package-lock.yml for dbt_transform examples.
…r enforcing PRIMARY KEY and FOREIGN KEY constraints on tables.
fix markdown tip
@bayees this looks good! Are you able to fix the linting errors and add a basic test, like we do e.g. for the BigQuery adapter? Also, we can take over this PR since we are allowed to push to your branch. Ping us with your plan.
…ents and table comments. Clean up unnecessary whitespace in the code for better readability.
@rudolfix Linting is done. I am not sure about the test. Can you point me to the specific BigQuery test?
Hey! BigQuery tests are in:

```python
def test_bigquery_partition_by_integer(
    destination_config: DestinationTestConfiguration,
) -> None:
    pipeline = destination_config.setup_pipeline(f"bigquery_{uniq_id()}", dev_mode=True)

    @dlt.resource(
        columns={"some_int": {"data_type": "bigint", "partition": True, "nullable": False}},
    )
    def demo_resource() -> Iterator[Dict[str, int]]:
        for i in range(10):
            yield {
                "some_int": i,
            }

    @dlt.source(max_table_nesting=0)
    def demo_source() -> DltResource:
        return demo_resource

    pipeline.run(demo_source())

    with pipeline.sql_client() as c:
        with c.execute_query(
            "SELECT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.PARTITIONS WHERE partition_id IS NOT"
            " NULL);"
        ) as cur:
            has_partitions = cur.fetchone()[0]
            assert isinstance(has_partitions, bool)
            assert has_partitions
```

Note that:

```python
with pipeline.sql_client() as c:
    databricks = c.native_connection
```

gives you access to the native Databricks client. You could place your tests in there. Regarding credentials:

```toml
[destination.databricks.credentials]
server_hostname = "adb-8001321225760611.11.azuredatabricks.net"
http_path = "/sql/protocolv1/o/8001321225760611/0124-183359-ghw1vo3b"
access_token = "..."
catalog = "dlt_ci"
client_id = "..."
client_secret = "..."
```

One more thing: we broke our devel :/ so you'd need to merge it into your branch.
@rudolfix
@bayees here's PR #2791 that allows adding statements after table create/alter. you should override In the meantime we updated our dependency system to
…ng and improve table update SQL generation. Removed redundant primary key constraint logic and added a new method for generating foreign key constraints post table update.
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
… hints. Introduced detailed descriptions for `column_comment` and `column_tags` to improve usability and clarity for users implementing schema migrations.
…. Updated SQL statements for table and column comments, as well as tags, to ensure proper handling of special characters. Refactored to use qualified table names for improved accuracy in SQL commands.
…"Suported" to "Supported" and updated references to `column_comment` and `table_comment` for consistency. Improved formatting of hints section for better readability.
…id table and column tags, table comments, and cluster types. Introduced parameterized tests for various invalid inputs and ensured proper exception raising. Enhanced test coverage for special character handling in comments and tags.
…` option for enforcing PRIMARY KEY and FOREIGN KEY constraints, and add a reference to the Databricks adapter section for additional hints.
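The special-character handling these commits describe comes down to escaping single quotes inside SQL string literals before embedding comments and tags in statements. A minimal sketch (the helper names are hypothetical, not the PR's actual code):

```python
def escape_sql_literal(text: str) -> str:
    # Standard SQL escaping: double any single quote inside a string literal.
    return text.replace("'", "''")


def table_comment_sql(qualified_table_name: str, comment: str) -> str:
    # Databricks SQL form: COMMENT ON TABLE <table> IS '<comment>'
    return f"COMMENT ON TABLE {qualified_table_name} IS '{escape_sql_literal(comment)}'"


def table_tags_sql(qualified_table_name: str, tags: dict) -> str:
    # Databricks SQL form: ALTER TABLE <table> SET TAGS ('key' = 'value', ...)
    pairs = ", ".join(
        f"'{escape_sql_literal(key)}' = '{escape_sql_literal(value)}'"
        for key, value in tags.items()
    )
    return f"ALTER TABLE {qualified_table_name} SET TAGS ({pairs})"
```

Using a qualified name such as `catalog.schema.table` keeps the statements unambiguous when the connection's current catalog differs from the target.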
Hey @burnash. Thank you for the review. I have implemented the changes suggested. Please take a look at it and let me know if there are additional changes needed.
Databricks supports the following table hints:

- `description` - Uses the description to add a comment to the table. This can also be done by using the adapter method `table_comment`.
> by using the adapter method `table_comment`.
@bayees could you clarify if you mean "adapter method table_comment" or "adapter parameter table_comment"?
Databricks supports the following column hints:

- `primary_key` - adds a primary key constraint to the column in Unity Catalog.
- `description` - adds a description to the column. This can also be done by using the adapter method `table_comment`.
Same here
Looks very good, @bayees, thank you for the documentation and tests. Have a couple of minor comments, please take a look.
This was the recommendation from @rudolfix. Do you have input here?
Thanks @bayees for this PR. From what I understand, this PR doesn't cover the possibility of marking a column as a partition, adding z-order for a column, or liquid clustering. This would be a relevant feature to add, particularly for teams that care about query optimization and cost control. I don't think it's a good idea to add that feature in this PR, but it might be worth tracking it in another issue and then making other code changes related to it. WDYT @burnash?
@bayees please take a look at the documentation comments as well. Let me know if you need help with fixing mypy on CI. We'd need to make sure the tests pass before we merge the PR.
@nicor88 thanks for the suggestion. Agreed, it's best to open a new related issue for this.
Hi @bayees @christian-bay-backstage, just following up to see if you'd like to continue working on this PR or if you'd prefer that I take it from here. Thanks again for your effort!

I am currently on vacation, so if the only things to change are the phrasing in the documentation and fixing mypy, then please feel free to fix them. Otherwise I will look into it in a week or two.
Minor docs adjustments for clarity on adapter parameters.
@bayees thank you very much for the contribution.
…a hints (#2674) Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
Description
Create comments and tags for tables and columns in Unity Catalog tables. Also add primary key and foreign key constraints using the `primary_key` and `references` hints.
I have added the `create_indexes` option to enable primary and foreign keys while keeping the change backwards compatible.
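In configuration this might look like the following sketch (the `[destination.databricks]` section follows dlt's usual destination config layout; treat the exact key placement as an assumption):

```toml
[destination.databricks]
# opt in to PRIMARY KEY / FOREIGN KEY constraint creation;
# leaving it unset keeps the previous behaviour
create_indexes = true
```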
There is a new Databricks adapter to use the new `x-databricks-cluster`, `x-databricks-table-comment`, `x-databricks-table-tags`, `x-databricks-column-comment`, and `x-databricks-column-tags` hints.
The table and column comments use the `description` hints when those are available, but the hints from the Databricks adapter override them when given.
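That precedence can be sketched as a small resolution rule (illustrative only; the real lookup lives in the destination code, using the hint keys listed above):

```python
from typing import Optional


def resolve_column_comment(column: dict) -> Optional[str]:
    # The adapter-set x-databricks-column-comment hint, when present,
    # overrides the generic description hint.
    adapter_hint = column.get("x-databricks-column-comment")
    return adapter_hint if adapter_hint is not None else column.get("description")
```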
Related Issues
Additional Context