[Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog #2674

bayees · 2025-05-23T11:24:32Z

Description

Creates comments and tags for tables and columns in Unity Catalog. Also adds primary key and foreign key constraints based on the primary_key and references hints.

I have added the create_indexes option, which must be enabled for primary and foreign key constraints to be applied. This keeps the change backwards compatible.
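A minimal sketch of the opt-in behavior described above, assuming a simplified column dictionary with a `primary_key` flag and assuming the option defaults to off. The function name `primary_key_clause` and the constraint naming scheme are hypothetical and are not taken from the dlt codebase; only the `create_indexes` option and the `primary_key` hint come from this PR.

```python
from typing import Dict, List, Optional


def primary_key_clause(
    table_name: str,
    columns: Dict[str, Dict[str, bool]],
    create_indexes: bool = False,
) -> Optional[str]:
    """Return a PRIMARY KEY constraint clause, or None when constraints are disabled."""
    # With the flag off (the backwards-compatible default assumed here), emit nothing.
    if not create_indexes:
        return None
    # Collect every column that carries the primary_key hint, in declaration order.
    pk_columns: List[str] = [
        name for name, hints in columns.items() if hints.get("primary_key")
    ]
    if not pk_columns:
        return None
    # Hypothetical naming scheme: <table>_pk.
    return f"CONSTRAINT {table_name}_pk PRIMARY KEY ({', '.join(pk_columns)})"


columns = {"id": {"primary_key": True}, "name": {}}
print(primary_key_clause("users", columns))  # flag off: no constraint
print(primary_key_clause("users", columns, create_indexes=True))
```

Existing pipelines that never set the flag would see no change in the generated DDL under this scheme, which is the point of making the constraints opt-in.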

There is a new Databricks adapter to use the new x-databricks-cluster, x-databricks-table-comment, x-databricks-table-tags, x-databricks-column-comment, x-databricks-column-tags hints.

Table and column comments are taken from the description hints when those are available, but hints given through the Databricks adapter override them.
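The precedence rule above can be sketched as a small lookup, assuming hints are stored in a flat dictionary keyed by hint name. `resolve_table_comment` is a hypothetical helper for illustration and not the adapter's actual code; the hint names `description` and `x-databricks-table-comment` are the ones listed in this PR.

```python
from typing import Any, Dict, Optional

# Hint name introduced by this PR for adapter-provided table comments.
TABLE_COMMENT_HINT = "x-databricks-table-comment"


def resolve_table_comment(table_hints: Dict[str, Any]) -> Optional[str]:
    """Pick the comment to apply to a table, or None if no hint is set."""
    # The adapter-specific hint wins whenever it is present.
    adapter_comment = table_hints.get(TABLE_COMMENT_HINT)
    if adapter_comment is not None:
        return adapter_comment
    # Otherwise fall back to the generic description hint, if any.
    return table_hints.get("description")


hints = {
    "description": "Generic description",
    "x-databricks-table-comment": "Comment set through the adapter",
}
print(resolve_table_comment(hints))
```

The same lookup would apply per column, with `x-databricks-column-comment` in place of the table-level hint.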

Related Issues

Closes [Databricks] Add comments and tags hints for table and column #2625

Additional Context


netlify · 2025-05-23T11:24:37Z

✅ Deploy Preview for dlt-hub-docs ready!

Name	Link
🔨 Latest commit	`f1d93ab`
🔍 Latest deploy log	https://app.netlify.com/projects/dlt-hub-docs/deploys/6878c376dc24cb000847c079
😎 Deploy Preview	https://deploy-preview-2674--dlt-hub-docs.netlify.app

rudolfix · 2025-06-02T10:52:43Z

@bayees this looks good! Are you able to fix the linting errors and add a basic test, like we do e.g. for the BigQuery adapter? We can also take over this PR, as we are allowed to push to your branch. Ping us with your plan.


bayees · 2025-06-02T15:22:22Z

@rudolfix Linting is done.

I am not sure about the test. Can you point to the specific Bigquery test?

rudolfix · 2025-06-05T19:15:36Z

Hey! BigQuery tests are in tests/load/bigquery/test_bigquery_table_builder.py. Here is an example test checking that partitioning is right:

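from typing import Dict, Iterator

import dlt
from dlt.common.utils import uniq_id
from dlt.extract import DltResource

from tests.load.utils import DestinationTestConfiguration


# destination_config is supplied by the test parametrization in the dlt test suite
def test_bigquery_partition_by_integer(
    destination_config: DestinationTestConfiguration,
) -> None:
    pipeline = destination_config.setup_pipeline(f"bigquery_{uniq_id()}", dev_mode=True)

    # the "partition" column hint is what the test verifies
    @dlt.resource(
        columns={"some_int": {"data_type": "bigint", "partition": True, "nullable": False}},
    )
    def demo_resource() -> Iterator[Dict[str, int]]:
        for i in range(10):
            yield {
                "some_int": i,
            }

    @dlt.source(max_table_nesting=0)
    def demo_source() -> DltResource:
        return demo_resource

    pipeline.run(demo_source())

    # query the destination metadata to confirm that partitions were created
    with pipeline.sql_client() as c:
        with c.execute_query(
            "SELECT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.PARTITIONS WHERE partition_id IS NOT"
            " NULL);"
        ) as cur:
            has_partitions = cur.fetchone()[0]
            assert isinstance(has_partitions, bool)
            assert has_partitions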

note that:

    with pipeline.sql_client() as c:
        databricks = c.native_connection

gives you access to the native Databricks client.

You could place your tests in tests/load/databricks/test_databricks_adapter.py. We do not need tests as comprehensive as the BigQuery ones, but a basic check that hints are translated into the right properties in Unity Catalog would be cool.

Regarding credentials: you can place them in tests/.dlt/secrets.toml and the tests will find them. Here are ours (without secrets). I assume you have access to the cluster?

[destination.databricks.credentials]
server_hostname = "adb-8001321225760611.11.azuredatabricks.net"
http_path = "/sql/protocolv1/o/8001321225760611/0124-183359-ghw1vo3b"
access_token = "..."
catalog = "dlt_ci"
client_id = "..."
client_secret = "..."

One more thing: we broke our devel :/ so you'd need to merge it into your branch.

bayees · 2025-06-18T17:50:05Z

@rudolfix
I added the unit test for the new Databricks adapter, but I found a non-deterministic issue. When adding foreign keys to the tables, the constraint requires the related table with the primary key to exist. Is there any method for controlling the order of table creation?

rudolfix · 2025-06-19T16:11:52Z

@bayees here's PR #2791, which allows adding statements after table create/alter. You should override _get_table_post_update_sql and move the generation of the FOREIGN KEY ... REFERENCES clauses there. We'll merge this PR asap and then you can merge it into your branch.

In the meantime we updated our dependency system to uv and we have new test workflows on ci. So overall things are better and much faster but please take a look at our CONTRIBUTING guide. Thanks!
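The post-update approach suggested above can be illustrated with a two-phase statement builder: every table is created first, and foreign key constraints are added afterwards, so the order in which tables are declared no longer matters. This is a sketch under stated assumptions, not the signature or body of `_get_table_post_update_sql`; `build_statements`, the tuple shape for foreign keys, and the constraint naming are all hypothetical.

```python
from typing import Dict, List, Tuple

# (column, referenced_table, referenced_column): a simplified, hypothetical shape
ForeignKey = Tuple[str, str, str]


def build_statements(
    tables: Dict[str, List[str]],
    foreign_keys: Dict[str, List[ForeignKey]],
) -> List[str]:
    """Emit all CREATE TABLE statements first, then all FOREIGN KEY constraints."""
    statements: List[str] = []
    # Phase 1: create every table, including its primary key definition.
    for name, definitions in tables.items():
        statements.append(f"CREATE TABLE {name} ({', '.join(definitions)})")
    # Phase 2: add foreign keys only after every referenced table exists.
    for name, keys in foreign_keys.items():
        for column, ref_table, ref_column in keys:
            statements.append(
                f"ALTER TABLE {name} ADD CONSTRAINT {name}_{column}_fk "
                f"FOREIGN KEY ({column}) REFERENCES {ref_table} ({ref_column})"
            )
    return statements


# "orders" is declared before "users" on purpose: the result is still valid.
tables = {
    "orders": ["id BIGINT NOT NULL", "user_id BIGINT"],
    "users": ["id BIGINT NOT NULL", "CONSTRAINT users_pk PRIMARY KEY (id)"],
}
foreign_keys = {"orders": [("user_id", "users", "id")]}
for statement in build_statements(tables, foreign_keys):
    print(statement)
```

Deferring the constraints removes the dependency on creation order, which is what made the adapter test non-deterministic.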


bayees · 2025-06-20T05:56:55Z

@rudolfix Updated the fork to reflect the changes in #2791. Waiting for it to be merged.

Nice with uv. 🥇

burnash · 2025-06-20T11:13:32Z

@bayees thanks so much. I've reviewed #2791. As soon as CI passes we'll merge it and I'll review this PR.


…id table and column tags, table comments, and cluster types. Introduced parameterized tests for various invalid inputs and ensured proper exception raising. Enhanced test coverage for special character handling in comments and tags.


bayees · 2025-06-27T05:56:10Z

Hey @burnash. Thank you for the review. I have implemented the changes suggested.

Please take a look at it and let me know if there are additional changes needed

burnash · 2025-07-09T22:37:13Z

docs/website/docs/dlt-ecosystem/destinations/databricks.md

+
+Databricks supports the following table hints:
+
+- `description` - Uses the description to add comment to the table. This can also be done by using the adapter method `table_comment`.


by using the adapter method table_comment.

@bayees could you clarify if you mean "adapter method table_comment" or "adapter parameter table_comment"?

burnash · 2025-07-09T22:37:27Z

docs/website/docs/dlt-ecosystem/destinations/databricks.md

+Databricks supports the following column hints:
+
+- `primary_key` - adds a primary key constraint to the column in Unity Catalog. 
+- `description` - adds a description to the column. This can also be done by using the adapter method `table_comment`.


dlt/destinations/impl/databricks/configuration.py

burnash

Looks very good, @bayees, thank you for the documentation and tests. Have a couple of minor comments, please take a look.

bayees · 2025-07-10T04:06:56Z

This was the recommendation from @rudolfix. Do you have input here?

burnash · 2025-07-10T08:28:18Z

This was the recommendation from @rudolfix.

@bayees just to clarify, which of the comments you're referring to?

nicor88 · 2025-07-10T09:30:16Z

Thanks @bayees for this PR. From what I understand, this PR doesn't cover marking a column as a partition, adding z-order for a column, or liquid clustering. These would be relevant features to add, especially for teams that care about query optimization and cost control.

I don't think it is a good idea to add the above features in this PR, but it might be worth tracking them in another issue or in separate code changes. WDYT @burnash?

burnash · 2025-07-10T15:07:56Z

@bayees please take a look at documentation comments as well. Let me know if you need help with fixing mypy on CI. We'd need to make sure the tests pass before we merge the PR.

burnash · 2025-07-10T15:09:11Z

@nicor88 thanks for the suggestion. Agreed, it's best to open a new related issue for this.

burnash · 2025-07-15T10:05:59Z

Hi @bayees @christian-bay-backstage, just following up to see if you'd like to continue working on this PR or if you'd prefer that I take it from here. Thanks again for your effort!

bayees · 2025-07-15T16:36:31Z

Hi @bayees @christian-bay-backstage, just following up to see if you'd like to continue working on this PR or if you'd prefer that I take it from here. Thanks again for your effort!

I am currently on vacation, so if the only things to change are the phrasing in the documentation and the mypy fix, then please feel free to fix them. Otherwise I will look into it in a week or two.


burnash

@bayees thank you very much for the contribution.


bayees added 10 commits May 18, 2025 06:16

add databricks adapter to destinations list

8fe1016

Implement Databricks adapter for data preparation and loading, includ…

dbbdcbf

…ing support for clustering, table comments, and tags.

Enhance Databricks adapter with support for column comments, primary …

9e43590

…and foreign key constraints, and table options. Update SQL generation for table alterations to include comments and tags.

Update Databricks documentation to include supported column and table…

62db898

… hints, along with examples for using the databricks_adapter. Enhance clarity on applying hints for resource metadata and constraints.

Refactor primary key and foreign key constraint handling in Databrick…

6680113

…s adapter. Improve logging for table options during ALTER TABLE operations and streamline SQL generation for constraints.

Enhance Databricks adapter configuration to support PRIMARY KEY and F…

9bb9c06

…OREIGN KEY constraints. Introduce create_indexes option in DatabricksClientConfiguration and update related classes to handle index creation logic. Add new package-lock.yml for dbt_transform examples.

Merge branch 'dlt-hub:devel' into exp/add-databricks-metadata

efcbd00

Remove package-lock.yml for dbt_transform examples, cleaning up unuse…

38e75dd

…d dependencies.

Remove package-lock.yml from dbt_transform examples to eliminate unus…

0a30998

…ed dependencies.

Update Databricks documentation to include create_indexes option fo…

01fe4ac

…r enforcing PRIMARY KEY and FOREIGN KEY constraints on tables.

bayees changed the title ~~Exp/add databricks metadata~~ [Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog May 23, 2025

bayees added 2 commits May 23, 2025 13:35

Update databricks.md

30e5901

Merge pull request #1 from bayees/patch-1

b713f85

fix markdown tip

bayees added 2 commits June 2, 2025 16:08

Refactor Databricks adapter to improve SQL generation for column comm…

ccd13ed

…ents and table comments. Clean up unnecessary whitespace in the code for better readability.

Merge branch 'dlt-hub:devel' into exp/add-databricks-metadata

9be9553

Added test and cleaned

8e45af8

rudolfix assigned rudolfix and burnash and unassigned rudolfix Jun 16, 2025

burnash added the enhancement New feature or request label Jun 16, 2025

bayees added 2 commits June 20, 2025 07:41

Merge branch 'dlt-hub:devel' into exp/add-databricks-metadata

3e3c093

Refactor DatabricksClient to streamline foreign key constraint handli…

cffdb45

…ng and improve table update SQL generation. Removed redundant primary key constraint logic and added a new method for generating foreign key constraints post table update.

bayees and others added 8 commits June 26, 2025 08:34

Update docs/website/docs/dlt-ecosystem/destinations/databricks.md

59ec01e

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>

Update docs/website/docs/dlt-ecosystem/destinations/databricks.md

79ca320

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>

Merge branch 'dlt-hub:devel' into exp/add-databricks-metadata

befa217

Enhance Databricks adapter documentation by adding support for column…

a1a90e1

… hints. Introduced detailed descriptions for `column_comment` and `column_tags` to improve usability and clarity for users implementing schema migrations.

Enhance DatabricksClient SQL generation by escaping comments and tags…

2509de9

…. Updated SQL statements for table and column comments, as well as tags, to ensure proper handling of special characters. Refactored to use qualified table names for improved accuracy in SQL commands.

Fix typos and enhance clarity in Databricks documentation. Corrected …

5038768

…"Suported" to "Supported" and updated references to `column_comment` and `table_comment` for consistency. Improved formatting of hints section for better readability.

Update Databricks documentation to clarify the use of `create_indexes…

c35b093

…` option for enforcing PRIMARY KEY and FOREIGN KEY constraints, and add a reference to the Databricks adapter section for additional hints.

burnash reviewed Jul 9, 2025


dlt/destinations/impl/databricks/configuration.py

burnash reviewed Jul 9, 2025


nicor88 mentioned this pull request Jul 11, 2025

[Databricks destination] Support for table optimization techniques in Databricks: partitioning, liquid clustering, data-skipping #2863

Closed

burnash added 4 commits July 16, 2025 15:31

Fix mypy & format

7ffda6f

Merge branch 'devel' into exp/add-databricks-metadata

19106c9

update Databricks adapter to use new typing for column hints and schemas

ffcd20d

Minor docs adjustments for clarity on adapter parameters.

Merge branch 'devel' into exp/add-databricks-metadata

f1d93ab

burnash self-requested a review July 17, 2025 09:34

burnash approved these changes Jul 18, 2025


burnash merged commit 526a630 into dlt-hub:devel Jul 18, 2025
81 of 83 checks passed

zilto pushed a commit that referenced this pull request Jul 22, 2025

[databricks] Addi comment, tags and pk/fk constraints configurable vi…

04bbef5

…a hints (#2674) Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>

bayees deleted the exp/add-databricks-metadata branch September 21, 2025 17:42



5 participants
